Re: B-tree parent pointer and checkpoints
От | Bruce Momjian |
---|---|
Тема | Re: B-tree parent pointer and checkpoints |
Дата | |
Msg-id | 201103102050.p2AKoTr03570@momjian.us обсуждение исходный текст |
Ответ на | Re: B-tree parent pointer and checkpoints (Bruce Momjian <bruce@momjian.us>) |
Ответы |
Re: B-tree parent pointer and checkpoints
(Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
|
Список | pgsql-hackers |
Bruce Momjian wrote: > > Has this been addressed? I see we have with this commit: 9de3aa65f01fb51cbc725e8508ea233e4e92c46c --------------------------------------------------------------------------- > > --------------------------------------------------------------------------- > > Heikki Linnakangas wrote: > > On 13.11.2010 00:34, Greg Stark wrote: > > > On Fri, Nov 12, 2010 at 7:20 PM, Heikki Linnakangas > > > <heikki.linnakangas@enterprisedb.com> wrote: > > >> I think we can work around that with a small modification to the page split > > >> algorithm. In a nutshell, when the child page is split, put a flag on the > > >> left half indicating that the rightlink must always be followed, regardless > > >> of the NSN. When the downlink is inserted to the parent, clear the flag. > > >> Setting and clearing of these flags need to be performed during WAL replay > > >> as well. > > > > > > Does this not cause duplicate results? Or does GIST already have to be > > > prepared to deal with duplicate results? > > > > The GiST search algorithm avoids duplicate results by remembering the > > LSN on the parent page when it follows a downlink. The split currently > > happens like this: > > > > 0. (the child page is locked) > > 1. The parent page is locked. > > 2. The child page is split. The original page becomes the left half, and > > new buffers are allocated for the right halves. > > 3. The downlink is inserted on the parent page (and the original > > downlink is updated to reflect only the keys that stayed on the left > > page). While keeping the child pages locked, the NSN field on the > > children are updated with the new LSN of the parent page. > > > > To avoid duplicates, when a scan looks at the child page, it needs to > > know if it saw the parent page before or after the downlink was > > inserted. If it saw it before, the scan needs to follow the rightlink to > > the right half, otherwise it will follow the downlink as usual (if it > > matched). The scan checks that by comparing the LSN it saw on the parent > > page with the NSN on the child page. If parent LSN < NSN, we saw the > > parent before the downlink was inserted. > > > > Now, the problem with crash recovery is that the above algorithm depends > > on the split to keep the parent and child locked until the downlink is > > inserted in the parent. If you crash between steps 2 and 3, the locks > > are gone. If a later insert then updates the parent page, because of a > > split on some unrelated child page, that will bump the LSN of the parent > > above the NSN on the child. Scans will see that the parent LSN > child > > NSN, and will no longer follow the rightlink. > > > > And the fix for that is to set a flag on the child page indicating that > > rightlink has to be always followed regardless of the LSN/NSN, because > > the downlink hasn't been inserted yet. When the downlink is inserted, > > the flag is cleared and we rely on the existing LSN/NSN mechanism to > > avoid duplicate results. > > > > -- > > Heikki Linnakangas > > EnterpriseDB http://www.enterprisedb.com > > > > -- > > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > > To make changes to your subscription: > > http://www.postgresql.org/mailpref/pgsql-hackers > > -- > Bruce Momjian <bruce@momjian.us> http://momjian.us > EnterpriseDB http://enterprisedb.com > > + It's impossible for everything to be true. + > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
В списке pgsql-hackers по дате отправления: