Re: Block level parallel vacuum WIP
From | Masahiko Sawada |
---|---|
Subject | Re: Block level parallel vacuum WIP |
Date | |
Msg-id | CAD21AoDn6YUya9ar0=s92Li9N=Zmiq+dhWtkD8UuEOV3xLn8gw@mail.gmail.com |
In reply to | Re: Block level parallel vacuum WIP (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
On Tue, Aug 23, 2016 at 10:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Aug 23, 2016 at 7:02 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> I'd like to propose block level parallel VACUUM.
>> This feature makes VACUUM possible to use multiple CPU cores.
>
> Great. This is something that I have thought about, too. Andres and
> Heikki recommended it as a project to me a few PGCons ago.
>
>> As for PoC, I implemented parallel vacuum so that each worker
>> processes both 1 and 2 phases for particular block range.
>> Suppose we vacuum 1000 blocks table with 4 workers, each worker
>> processes 250 consecutive blocks in phase 1 and then reclaims dead
>> tuples from heap and indexes (phase 2).
>> To use visibility map efficiency, each worker scan particular block
>> range of relation and collect dead tuple locations.
>> After each worker finished task, the leader process gathers these
>> vacuum statistics information and update relfrozenxid if possible.
>
> This doesn't seem like a good design, because it adds a lot of extra
> index scanning work. What I think you should do is:
>
> 1. Use a parallel heap scan (heap_beginscan_parallel) to let all
> workers scan in parallel. Allocate a DSM segment to store the control
> structure for this parallel scan plus an array for the dead tuple IDs
> and a lock to protect the array.
>
> 2. When you finish the heap scan, or when the array of dead tuple IDs
> is full (or very nearly full?), perform a cycle of index vacuuming.
> For now, have each worker process a separate index; extra workers just
> wait. Perhaps use the condition variable patch that I posted
> previously to make the workers wait. Then resume the parallel heap
> scan, if not yet done.
>
> Later, we can try to see if there's a way to have multiple workers
> work together to vacuum a single index. But the above seems like a
> good place to start.

Thank you for the advice. That is what I had considered as an
alternative design; I will rework the patch along these lines.

>> I also changed the buffer lock infrastructure so that multiple
>> processes can wait for cleanup lock on a buffer.
>
> You won't need this if you proceed as above, which is probably a good thing.

Right.

>> And the new GUC parameter vacuum_parallel_workers controls the number
>> of vacuum workers.
>
> I suspect that for autovacuum there is little reason to use parallel
> vacuum, since most of the time we are trying to slow vacuum down, not
> speed it up. I'd be inclined, for starters, to just add a PARALLEL
> option to the VACUUM command, for when people want to speed up
> parallel vacuums. Perhaps
>
> VACUUM (PARALLEL 4) relation;
>
> ...could mean to vacuum the relation with the given number of workers, and:
>
> VACUUM (PARALLEL) relation;
>
> ...could mean to vacuum the relation in parallel with the system
> choosing the number of workers - 1 worker per index is probably a good
> starting formula, though it might need some refinement.

That looks convenient. For autovacuum, I was thinking we could manage
the number of parallel workers per table with a storage parameter, like:

ALTER TABLE relation SET (parallel_vacuum_workers = 2);

Regards,

--
Masahiko Sawada
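As a concrete illustration of the shared state in step 1 above, here is a minimal sketch of the dead-tuple portion of such a DSM segment, assuming the 9.6-era parallel heap scan API; the struct and field names are hypothetical, not taken from any posted patch:

```c
#include "postgres.h"
#include "storage/itemptr.h"
#include "storage/s_lock.h"

/*
 * Hypothetical layout for the shared dead-tuple array of step 1.  The DSM
 * segment would carry the ParallelHeapScanDescData consumed by
 * heap_beginscan_parallel(), followed by one of these; the spinlock
 * serializes workers appending the TIDs they find during the heap scan.
 */
typedef struct LVDeadTuples
{
    slock_t     mutex;              /* protects the two counters below */
    int         num_dead_tuples;    /* entries used in dead_tuples[] */
    int         max_dead_tuples;    /* allocated length of dead_tuples[] */
    ItemPointerData dead_tuples[FLEXIBLE_ARRAY_MEMBER]; /* collected TIDs */
} LVDeadTuples;
```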
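The per-worker control flow implied by steps 1 and 2 might then look like the sketch below; tuple_is_dead(), record_dead_tuple(), dead_tuples_nearly_full(), and run_index_vacuum_cycle() are hypothetical helpers standing in for the real vacuum logic:

```c
/* Pseudocode sketch of the worker loop from steps 1 and 2 above.  The
 * scan descriptor comes from heap_beginscan_parallel(), so the workers
 * divide up the heap blocks among themselves as they iterate. */
static void
parallel_lazy_scan_worker(HeapScanDesc pscan, LVDeadTuples *shared)
{
    HeapTuple   tuple;

    while ((tuple = heap_getnext(pscan, ForwardScanDirection)) != NULL)
    {
        if (tuple_is_dead(tuple))                       /* hypothetical */
            record_dead_tuple(shared, &tuple->t_self);  /* takes shared->mutex */

        if (dead_tuples_nearly_full(shared))
        {
            /*
             * Pause the heap scan and drain the array: each worker claims
             * one index to vacuum, workers beyond the number of indexes
             * wait (e.g. on the proposed condition variable), and everyone
             * resumes the heap scan once the cycle completes.
             */
            run_index_vacuum_cycle(shared);
        }
    }
}
```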