Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
From: Peter Geoghegan
Subject: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Date:
Msg-id: CAH2-WzkfK=JVHjkd17TLDvsFb6psduTt5WYiT8dg+-UFc+rSSQ@mail.gmail.com
In reply to: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons (Peter Geoghegan <pg@bowt.ie>)
Responses:
  Re: Making all nbtree entries unique by having heap TIDs participate in comparisons (Peter Geoghegan <pg@bowt.ie>)
  Re: Making all nbtree entries unique by having heap TIDs participate in comparisons (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
  Re: Making all nbtree entries unique by having heap TIDs participate in comparisons (Alexander Korotkov <a.korotkov@postgrespro.ru>)
List: pgsql-hackers
Attached is v5, which significantly simplifies the _bt_findsplitloc() logic. It's now a great deal easier to follow. It would be helpful if someone could do code-level review of the overhauled _bt_findsplitloc(). That's the most important part of the patch. It involves relatively subjective trade-offs between total effort spent during a page split, space utilization, and avoiding "false sharing" (I call the situation where a range of duplicate values straddles two leaf pages unnecessarily "false sharing", since it forces subsequent index scans to visit two pages rather than just one, even when that's avoidable).

This version has slightly improved performance, especially for cases where an index gets bloated without any garbage being generated. With the UK land registry data [1], an index on (county, city, locality) is shrunk by just over 18% by the new logic (I recall that it was shrunk by ~15% in an earlier version). In concrete terms, it goes from being 1.288 GiB on master to being 1.054 GiB with v5 of the patch. This is mostly because the patch intelligently packs duplicate-filled pages together tightly (in particular, it avoids "getting tired"), but also because it makes pivots less restrictive about where leaf tuples can go. I still manage to shrink the largest TPC-C and TPC-H indexes by at least 5% following an initial load performed by successive INSERTs. Those are unique indexes, so the benefits are certainly not limited to cases involving many duplicates.

3 modes
-------

My new approach is to teach _bt_findsplitloc() 3 distinct modes of operation: regular/default mode, many duplicates mode, and single value mode. The higher-level split code always asks for a default mode call to _bt_findsplitloc(), so that's always where we start. For leaf page splits, _bt_findsplitloc() will occasionally call itself recursively in either many duplicates mode or single value mode. This happens when the default strategy doesn't work out.
* Default mode almost does what we do already, but remembers the top n candidate split points, sorted by the delta between left and right post-split free space, rather than just looking for the overall lowest-delta split point. We then make a second pass over the temp array of "acceptable" split points that considers the needs of suffix truncation.

* Many duplicates mode is used when we fail to find a "distinguishing" split point in regular mode, but have determined that it's possible to get one if a new, exhaustive search is performed. We go to great lengths to avoid having to append a heap TID to the new left page high key -- that's what I mean by "distinguishing". We're particularly concerned here with false sharing by subsequent point lookup index scans.

* Single value mode is used when we see that even many duplicates mode would be futile, as the leaf page is already *entirely* full of logical duplicates. Single value mode isn't exhaustive, since there is clearly nothing to exhaustively search for. Instead, it packs together as many tuples as possible on the right side of the split. Since heap TIDs sort in descending order, this is very much like a "leftmost" split that tries to free most of the space on the left side, and pack most of the page contents on the right side. Except that it's leftmost, and in particular is leftmost among pages full of logical duplicates (as opposed to being leftmost/rightmost among pages on an entire level of the tree, as with the traditional rightmost 90:10 split thing).

Other changes
-------------

* I now explicitly use fillfactor in the manner of a rightmost split to get the single value mode behavior. I call these types of splits (rightmost and single value mode splits) "weighted" splits in the patch. This is much more consistent with our existing conventions than my previous approach.

* Improved approach to inexpensively determining how effective suffix truncation will be for a given candidate split point.
I no longer naively probe the contents of index tuples to do char comparisons. Instead, I use a tuple descriptor to get offsets to each attribute in each tuple in turn, then call datumIsEqual() to determine whether they're equal. This is almost as good as a full scan key comparison. It actually seems to be a bit faster, and it also takes care of INCLUDE indexes without special care (no need to worry about probing non-key attributes and reaching a faulty conclusion about which split point helps with suffix truncation).

I still haven't managed to add pg_upgrade support, but that's my next step. I am more or less happy with the substance of the patch in v5, and feel that I can now work backwards towards figuring out the best way to deal with on-disk compatibility. It shouldn't be too hard -- most of the effort will involve coming up with a good test suite.

[1] https://wiki.postgresql.org/wiki/Sample_Databases

--
Peter Geoghegan