Proposal: Another attempt at vacuum improvements
From | Pavan Deolasee |
---|---|
Subject | Proposal: Another attempt at vacuum improvements |
Date | |
Msg-id | BANLkTimiRMwUabpZXk+J3gh5QCLy0qVzVg@mail.gmail.com |
Replies | Re: Proposal: Another attempt at vacuum improvements; Re: Proposal: Another attempt at vacuum improvements |
List | pgsql-hackers |
Hi All,
Some of the ideas regarding vacuum improvements were discussed here:
A recent thread was started by Robert Haas, but I don't know if we brought that one to a logical conclusion either.
This was brought up again by Robert Haas in a discussion with Tom and me during PGCon, and we agreed there are a few things we can do to make vacuum more performant. One of the things Tom mentioned is that vacuum today is not aware that it's a periodic operation, and there might be ways to take advantage of that.
So the idea is to separate the index vacuum (removing index pointers to dead tuples) from the heap vacuum. When we do a heap vacuum (either by HOT pruning or by regular vacuum), we can spool the dead line pointers somewhere. To avoid hot spots during normal processing, the spooling can be done periodically, like stats collection. One obvious choice for spooling dead line pointers is a relation fork. The index vacuum will be kicked off periodically depending on the number of spooled dead line pointers. When that happens, it will remove all index pointers pointing to those dead line pointers and then forget the spooled line pointers.
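To make the spooling and the trigger condition concrete, here is a minimal C sketch. It assumes an in-memory array as a stand-in for the proposed relation fork, a fixed count threshold for kicking off the index vacuum, and a toy "index" that is just an array of heap TIDs. `Tid`, `DeadItemSpool`, `spool_dead_item`, `index_vacuum_due` and `index_vacuum` are hypothetical names for illustration, not PostgreSQL APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heap TID: (block number, line pointer offset). */
typedef struct {
    uint32_t block;
    uint16_t offset;
} Tid;

/* Hypothetical in-memory stand-in for the proposed dead-line-pointer fork. */
#define SPOOL_CAPACITY 1024
typedef struct {
    Tid items[SPOOL_CAPACITY];
    size_t count;
} DeadItemSpool;

/* Append one dead line pointer produced by HOT pruning or heap vacuum.
 * Returns false if the spool is full. */
static bool spool_dead_item(DeadItemSpool *spool, Tid tid)
{
    if (spool->count == SPOOL_CAPACITY)
        return false;
    spool->items[spool->count++] = tid;
    return true;
}

/* Assumed trigger: start an index vacuum once `threshold` items are spooled. */
static bool index_vacuum_due(const DeadItemSpool *spool, size_t threshold)
{
    return spool->count >= threshold;
}

/* Linear lookup; a real design would use a sorted list or bitmap. */
static bool tid_is_spooled(const DeadItemSpool *spool, Tid tid)
{
    for (size_t i = 0; i < spool->count; i++)
        if (spool->items[i].block == tid.block &&
            spool->items[i].offset == tid.offset)
            return true;
    return false;
}

/* Toy index vacuum: drop every index entry that points at a spooled dead
 * line pointer, then forget the spool. The dead line pointers themselves
 * stay in the heap. Returns the number of index entries removed. */
static size_t index_vacuum(Tid *index, size_t *nindex, DeadItemSpool *spool)
{
    size_t before = *nindex, kept = 0, removed = 0;

    for (size_t i = 0; i < before; i++) {
        if (tid_is_spooled(spool, index[i]))
            removed++;
        else
            index[kept++] = index[i];   /* compact surviving entries */
    }
    assert(kept + removed == before);
    *nindex = kept;
    spool->count = 0;                   /* spooled line pointers are forgotten */
    return removed;
}
```

A real implementation would persist the spool, write it in periodic batches as described above, and avoid the linear lookup used here.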
The dead line pointers themselves will be removed whenever a heap page is later vacuumed, either as part of HOT pruning or the next heap vacuum. We would need some mechanism, though, to know that the index pointers to the existing dead line pointers have been vacuumed and that it's safe to remove them now. Maybe we can track the last operation that generated a dead line pointer in the page using an LSN in the page header, and also keep track of the LSN of the last successful index vacuum. If the index vacuum LSN is greater than the LSN in the page header, we can safely remove the existing dead line pointers. I am deliberately not suggesting how to track the index vacuum LSN, since my last proposal to do something similar through a pg_class column was shot down by Tom :-)
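The LSN rule can be sketched as follows. The sketch assumes a hypothetical per-page field `dead_lp_lsn` holding the LSN of the last operation that created a dead line pointer on the page, and an index vacuum LSN obtained by some unspecified means (the proposal deliberately leaves that open). `Lsn`, `PageSketch`, `mark_dead`, `can_reclaim_dead` and `reclaim_dead` are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;   /* stand-in for a WAL position */

typedef enum { LP_STATE_UNUSED, LP_STATE_NORMAL, LP_STATE_DEAD } LpState;

#define MAX_LP 8
typedef struct {
    Lsn dead_lp_lsn;    /* hypothetical header field: last dead-LP-creating op */
    LpState lp[MAX_LP]; /* line pointer states */
    int nlp;            /* number of line pointers in use */
} PageSketch;

/* Pruning turned line pointer `off` into a dead stub at WAL position `lsn`. */
static void mark_dead(PageSketch *page, int off, Lsn lsn)
{
    assert(off >= 0 && off < page->nlp);
    page->lp[off] = LP_STATE_DEAD;
    if (lsn > page->dead_lp_lsn)
        page->dead_lp_lsn = lsn;
}

/* The proposed rule: dead line pointers may go once the last successful
 * index vacuum is newer than the page's dead-LP LSN. */
static bool can_reclaim_dead(const PageSketch *page, Lsn index_vacuum_lsn)
{
    return index_vacuum_lsn > page->dead_lp_lsn;
}

/* Mark all dead line pointers unused if safe; returns how many were freed. */
static int reclaim_dead(PageSketch *page, Lsn index_vacuum_lsn)
{
    if (!can_reclaim_dead(page, index_vacuum_lsn))
        return 0;
    int n = 0;
    for (int i = 0; i < page->nlp; i++) {
        if (page->lp[i] == LP_STATE_DEAD) {
            page->lp[i] = LP_STATE_UNUSED;
            n++;
        }
    }
    return n;
}
```

Because there is only one LSN per page, a page that gains a new dead line pointer after an index vacuum cannot reclaim any of its dead line pointers, including older ones, until the next index vacuum. The rule is also only sound if the index vacuum LSN is chosen so that every dead line pointer created before it had already been spooled when the index vacuum read the spool; with periodic, batched spooling that is a real constraint, and the sketch does not model it.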
In a nutshell, what I am suggesting is to do heap and index vacuuming independently. The heap will be vacuumed either by HOT pruning or by a periodic heap vacuum, and the dead line pointers will be collected. An index vacuum will remove the index pointers to those dead line pointers. At some later point, the dead line pointers themselves will be removed, either as part of a retail or a complete heap vacuum. It's not clear if it's useful, but a single index vacuum can follow multiple heap vacuums, or vice versa.
Another advantage of this technique is that we could then support start/stop heap vacuum, vacuuming a range of blocks at a time, or even vacuuming only those blocks that are already cached in the buffer cache. This is just hand-waving at this point, but it seems possible.
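A resumable, range-limited heap vacuum could look roughly like this sketch. It assumes a hypothetical cursor that remembers the next block to visit, a per-call budget so the work can be stopped and restarted, and a filter callback standing in for a check such as "is this block already in the buffer cache". `BlockNumber` here is a local typedef, and `VacuumCursor`, `vacuum_block_range`, `toy_is_cached` and `toy_vacuum_block` are illustrative names; the toy callbacks merely pretend that even-numbered blocks are cached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/* Hypothetical resumable cursor over a block range [next_block, end_block). */
typedef struct {
    BlockNumber next_block;   /* first block not yet visited */
    BlockNumber end_block;    /* exclusive end of the requested range */
} VacuumCursor;

typedef bool (*BlockFilter)(BlockNumber blk, void *arg);
typedef void (*BlockWork)(BlockNumber blk, void *arg);

/* Visit at most `budget` blocks starting at the cursor. Blocks rejected by
 * `filter` (e.g. not cached) are skipped but still count against the budget.
 * Returns true once the whole range has been visited. */
static bool vacuum_block_range(VacuumCursor *cur, unsigned budget,
                               BlockFilter filter, BlockWork work, void *arg)
{
    assert(cur->next_block <= cur->end_block);
    while (budget > 0 && cur->next_block < cur->end_block) {
        BlockNumber blk = cur->next_block++;
        budget--;
        if (filter == NULL || filter(blk, arg))
            work(blk, arg);
    }
    return cur->next_block == cur->end_block;
}

/* Toy stand-in for a buffer-cache lookup: pretend even blocks are cached. */
static bool toy_is_cached(BlockNumber blk, void *arg)
{
    (void) arg;
    return blk % 2 == 0;
}

/* Toy per-block work: just count how many blocks were "vacuumed". */
static void toy_vacuum_block(BlockNumber blk, void *arg)
{
    (void) blk;
    (*(unsigned *) arg)++;
}
```

Counting skipped blocks against the budget keeps each call bounded even when most of the range is not cached; the cursor is all the state needed to resume later.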
Suggestions, comments and criticism are all welcome, but please don't shoot down the idea based on implementation details: I have really not spent time on those yet, so it will be easy to find holes and corner cases. Those can be worked out if we believe something like this will be useful.
Thanks,
Pavan