Re: Improving btree performance through specializing by key shape, take 2
От | Matthias van de Meent |
---|---|
Тема | Re: Improving btree performance through specializing by key shape, take 2 |
Дата | |
Msg-id | CAEze2Wi8=4FZHSn0e8orOjEmj01Ugf4heASUGKcnMURPEDwn8A@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Improving btree performance through specializing by key shape, take 2 (Peter Geoghegan <pg@bowt.ie>) |
Ответы |
Re: Improving btree performance through specializing by key shape, take 2
(Peter Geoghegan <pg@bowt.ie>)
|
Список | pgsql-hackers |
On Tue, 19 Sept 2023 at 03:56, Peter Geoghegan <pg@bowt.ie> wrote: > > On Mon, Sep 18, 2023 at 6:29 PM Peter Geoghegan <pg@bowt.ie> wrote: > > I also have significant doubts about your scheme for avoiding > > invalidating the bounds of the page based on its high key matching the > > parent's separator. The subtle dynamic prefix compression race > > condition that I was worried about was one caused by page deletion. > > But page deletion doesn't change the high key at all (it does that for > > the deleted page, but that's hardly relevant). So how could checking > > the high key possibly help? > > To be clear, page deletion does what I described here (it does an > in-place update of the downlink to the deleted page, so the same pivot > tuple now points to its right sibling, which is our page of concern), > in addition to fully removing the original pivot tuple whose downlink > originally pointed to our page of concern. This is why page deletion > makes the key space "move to the right", very much like a page split > would. I am still aware of this issue, and I think we've discussed it in detail earlier. I think it does not really impact this patchset. Sure, I can't use dynamic prefix compression to its full potential, but I still do get serious performance benefits: FULL KEY _bt_compare calls: 'Optimal' full-tree DPT: average O(3) Paged DPT (this patch): average O(2 * height) ... without HK opt: average O(3 * height) Current: O(log2(n)) Single-attribute compares: 'Optimal' full-tree DPT: O(log(N)) Paged DPT (this patch): O(log(N)) Current: 0 (or, O(log(N) * natts)) So, in effect, this patch moves most compare operations to the level of only one or two full key compare operations per page (on average). I use "on average": on a sorted array with values ranging from potentially minus infinity to positive infinity, it takes on average 3 compares before a binary search can determine the bounds of the keyspace it has still to search. If one side's bounds is already known, it takes on average 2 compare operations before these bounds are known. > IMV it would be better if it made the key space "move to the left" > instead, which would make page deletion close to the exact opposite of > a page split -- that's what the Lanin & Shasha paper does (sort of). > If you have this symmetry, then things like dynamic prefix compression > are a lot simpler. > > ISTM that the only way that a scheme like yours could work, assuming > that making page deletion closer to Lanin & Shasha is not going to > happen, is something even more invasive than that: it might work if > you had a page low key (not just a high key) on every page. Note that the "dynamic prefix compression" is currently only active on the page level. True, the patch does carry over _bt_compare's prefix result for the high key on the child page, but we do that only if the highkey is actually an exact copy of the right separator on the parent page. This carry-over opportunity is extremely likely to happen, because the high key generated in _bt_split is then later inserted on the parent page. The only case where it could differ is in concurrent page deletions. That is thus a case of betting a few cycles to commonly save many cycles (memcmp vs _bt_compare full key compare. Again, we do not actually skip a prefix on the compare call of the P_HIGHKEY tuple, nor for the compares of the midpoints unless we've found a tuple on the page that compares as smaller than the search key. > You'd have > to compare the lower bound separator key from the parent (which might > itself be the page-level low key for the parent) to the page low key. > That's not a serious suggestion; I'm just pointing out that you need > to be able to compare like with like for a canary condition like this > one, and AFAICT there is no lightweight practical way of doing that > that is 100% robust. True, if we had consistent LOWKEYs on pages, that'd make this job much easier: the prefix could indeed be carried over in full. But that's not currently the case for the nbtree code, and this is the next best thing, as it also has the benefit of working with all currently supported physical formats of btree indexes. Kind regards, Matthias van de Meent
В списке pgsql-hackers по дате отправления: