Re: Turning off HOT/Cleanup sometimes
От | Greg Stark |
---|---|
Тема | Re: Turning off HOT/Cleanup sometimes |
Дата | |
Msg-id | CAM-w4HNs9+yQ4sLZ3xYkB8ODXRH_qM3O_O4SUzq5MdRb2OYyNA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Turning off HOT/Cleanup sometimes (Bruce Momjian <bruce@momjian.us>) |
Ответы |
Re: Turning off HOT/Cleanup sometimes
Re: Turning off HOT/Cleanup sometimes |
Список | pgsql-hackers |
On Mon, Apr 20, 2015 at 8:48 PM, Bruce Momjian <bruce@momjian.us> wrote: > >> But if the entire table is very hot, I think that that is just another of way >> of saying that autovacuum is horribly misconfigured. I think the purpose of > > Well, we have to assume there are many misconfigured configurations --- > autovacuum isn't super-easy to configure, so we can't just blame the > user if this makes things worse. In fact, page pruning was designed > spefically for cases where autovacuum wasn't running our couldn't keep > up. Well autovacuum isn't currently considering HOT pruning part of its job at all. It's hard to call it "misconfigured" when there's literally *no* way to configure it "correctly". If you update less than autovacuum_vacuum_scale_factor fraction of the table and then never update another row autovacuum will never run. Ever. Every select will forevermore need to follow hot chains on that table. Until eventually transaction wraparound forces a vacuum on that table if that ever happens. Possibly autovacuum could be adjusted to count how many selects are happening on the table and decide to vacuum it when the cost of the selects following the dead tuples is balanced by the cost of doing a vacuum. But that's not something included in the design of autovacuum today. The original design of tuple storage was aimed at optimizing the steady state where most tuples were not recently updated. It guaranteed that except for tuples that were in the process of being updated or were recently updated a tuple read didn't have to read the CLOG, didn't have to follow any chains, didn't have to do any I/O or other work other than to read the bits on the tuple itself. When a tuple is updated it's put into a state where everyone who comes along has to do extra work but as soon as practical the hint bits get set and that extra work stops. We had similar discussions about setting hint bits in the past. I'm not sure why HOT pruning is the focus now because I actually think hint bit setting is a larger source of I/O in innocent looking selects even today. And it's a major headache, people are always being surprised that their selects cause lots of I/O and slow down dramatically after a big update or data load has finished. It's characterized as "why is the database writing everything twice" (and saying it's actually writing everything three times doesn't make people feel better). In the new age of checksums with hint bit logging I wonder if it's even a bigger issue. It occurs to me that generating these dirty pages isn't really that expensive individually. It's only that there's a sudden influx of a large number of dirty pages that causes them to get translated immediately into filesystem I/O. Perhaps we should dirty pages on hint bit updates and do HOT pruning only to the extent it can be done without causing I/O. Of course it's hard to tell that in advance but maybe something like "if the current buffer had to be fetched and caused a dirty buffer to be evicted then skip hot pruning and don't dirty it for any hint bit updates" would at least mean that once the select fills up its share of buffers with dirty buffers it stops dirtying more. It would dirty pages only as fast as bgwriter or checkpoints manage to write them out. That sounds a bit weird but I think the right solution should have that combination of properties. It should guarantee that hint bits get set and hot chains pruned within some length of time but that no one select causes a storm of dirty buffers that then need to be flushed to disk. -- greg
В списке pgsql-hackers по дате отправления: