Обсуждение: Switching to 64-bit Bitmapsets
As many of you will know, part of my planned work for PG12 is to further improve the performance of querying partitioned tables. Amit Langote is doing quite a bit of work on the planner side of things to remove the per-partition overhead to reduce that to just for partitions that survive partition pruning. I've mostly turned my attention to tuning the executor. Back in PG11, we got faster partition pruning and also run-time partition pruning. The pruning code uses Bitmapsets to communicate which partitions survived partition pruning. When many partitions exist, these Bitmapsets can become quite large, quite a bit larger than Bitmapsets were designed to be. A comment in bitmapset.c mentions: * A bitmap set can represent any set of nonnegative integers, although * it is mainly intended for sets where the maximum value is not large, * say at most a few hundred. I imagine the reason for this comment is that we must allocate storage space for all bits below the more significant set bit. 64-bit sets don't change that, but they do speed up various operations that are commonly performed on Bitmapsets: * Reduces the number of memory allocations by half for incrementally built sets. * Faster skipping of 0 words. This improves bms_num_members, bms_next_member, bms_prev_member for sparsely populated sets. * Sets can be copied more quickly with bms_copy() * Improvements to various other operations; bms_overlap, bms_nonempty_difference, bms_singleton_member, to name a few. If I patch master with [1] to delay the locking of partitions until after run-time pruning, then I'm able to see a small gain in performance with a 10k partition partitioned table. Currently, the improvement is only around 1%, but at the moment the TPS of the query is just around 900 tps. I have local patches which push this to around 25k tps, so that 1% overhead becomes much larger in that case. I tried to design a benchmark that exercises a large Bitmapset. The 32-bit version (master) will use 313 bitmapwords to store this and the patched version uses just 157 words. create table listp (a int) partition by list(a); select 'create table listp'||x::text || ' partition of listp for values in('||x::text||');' from generate_Series(1,10000)x; \gexec select.sql: \set p_a 5000 select * from listp where a = :p_a and a > (select 0); Patched: $ pgbench -n -f select.sql -T 60 -M prepared postgres tps 901.570528 976.756117 906.503725 932.156778 942.404224 960.889453 962.468714 907.756816 940.468028 906.351704 978.957677 Unpatched: $ pgbench -n -f select.sql -T 60 -M prepared postgres tps 903.4159 941.853886 913.956495 949.313422 907.267227 964.118809 941.43603 952.701788 916.757846 929.572794 891.220856 917.235864 Average increase = 0.89%, median = 1.17% I've attached the trivial patch which implements 64-bit Bitmapsets. One caveat about this may be that it will likely slow performance for 32-bit machines. My current thinking about that is that such a platform is likely not that common a target for the latest version of PostgreSQL. Even mobile phones and Raspberry PIs are all 64-bit these days. However, I doubt it would take much more effort to maintain using 32-bit sets on 32-bit machines. If someone feels strongly about that then I can adjust the patch to allow that. I'm going to add this to the January commitfest. [1] https://commitfest.postgresql.org/21/1897/ -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
David Rowley <david.rowley@2ndquadrant.com> writes: > I've attached the trivial patch which implements 64-bit Bitmapsets. > One caveat about this may be that it will likely slow performance for > 32-bit machines. My current thinking about that is that such a > platform is likely not that common a target for the latest version of > PostgreSQL. Yeah, I think we've been optimizing for 64-bit platforms for awhile. I'm not prepared to abandon "it works on 32-bit", but I don't see a reason to sacrifice 64-bit performance to improve the 32-bit case. > However, I doubt it would take much more effort to maintain > using 32-bit sets on 32-bit machines. If someone feels strongly about > that then I can adjust the patch to allow that. Hm, are you thinking of making BITS_PER_BITMAPWORD match sizeof(Pointer) or something like that? That seems like a good compromise from here. regards, tom lane
On Thu, 20 Dec 2018 at 17:50, Tom Lane <tgl@sss.pgh.pa.us> wrote: > David Rowley <david.rowley@2ndquadrant.com> writes: > > However, I doubt it would take much more effort to maintain > > using 32-bit sets on 32-bit machines. If someone feels strongly about > > that then I can adjust the patch to allow that. > > Hm, are you thinking of making BITS_PER_BITMAPWORD match sizeof(Pointer) > or something like that? That seems like a good compromise from here. Yeah, something along those lines. I've implemented that in the attached. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
David Rowley <david.rowley@2ndquadrant.com> writes: > On Thu, 20 Dec 2018 at 17:50, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Hm, are you thinking of making BITS_PER_BITMAPWORD match sizeof(Pointer) >> or something like that? That seems like a good compromise from here. > Yeah, something along those lines. I've implemented that in the attached. Pushed with some fiddling with the comment. I wasn't excited about the test case you offered --- on HEAD, it pretty much all devolves to file access operations (probably, checking the current length of all the child relations). Instead I experimented with an old test case I had for a complex-to-plan query, and got these results: HEAD, asserts off: pgbench reports: latency average = 11704.992 ms tps = 0.085434 (including connections establishing) tps = 0.085437 (excluding connections establishing) Top hits according to perf: + 65.42% 65.16% 158499 postmaster postgres [.] match_eclasses_to_foreign_key_col + 6.85% 6.82% 16634 postmaster postgres [.] bms_overlap + 6.25% 6.22% 15173 postmaster postgres [.] generate_join_implied_equalities_for_ecs + 3.91% 3.88% 9465 postmaster postgres [.] make_canonical_pathkey + 3.04% 3.01% 7351 postmaster postgres [.] have_relevant_eclass_joinclause + 2.06% 2.05% 4992 postmaster postgres [.] bms_is_subset + 1.19% 1.18% 2887 postmaster postgres [.] equal + 0.95% 0.95% 2309 postmaster postgres [.] get_eclass_for_sort_expr + 0.90% 0.90% 2189 postmaster postgres [.] add_paths_to_joinrel With patch: latency average = 10741.595 ms tps = 0.093096 (including connections establishing) tps = 0.093099 (excluding connections establishing) + 69.03% 68.76% 178278 postmaster postgres [.] match_eclasses_to_foreign_key_col + 5.85% 5.82% 15138 postmaster postgres [.] generate_join_implied_equalities_for_ecs + 4.58% 4.55% 11842 postmaster postgres [.] bms_overlap + 4.38% 4.36% 11338 postmaster postgres [.] make_canonical_pathkey + 2.77% 2.75% 7155 postmaster postgres [.] have_relevant_eclass_joinclause + 1.30% 1.29% 3364 postmaster postgres [.] equal + 1.26% 1.25% 3261 postmaster postgres [.] bms_is_subset + 1.06% 1.05% 2722 postmaster postgres [.] get_eclass_for_sort_expr + 0.70% 0.70% 1813 postmaster postgres [.] add_paths_to_joinrel Ignoring the, um, elephant in the room, this shows a pretty clear win for the performance of bms_overlap and bms_is_subset, confirming the thesis that this change is worthwhile. I'll start a separate thread about match_eclasses_to_foreign_key_col. regards, tom lane
On Fri, 21 Dec 2018 at 06:26, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Pushed with some fiddling with the comment. Great. Thanks! > I wasn't excited about the test case you offered --- on HEAD, it pretty > much all devolves to file access operations (probably, checking the > current length of all the child relations). Instead I experimented > with an old test case I had for a complex-to-plan query, and got these > results: oh meh. I forgot to mention I was running with plan_cache_mode = force_generic_plan and max_parallel_workers_per_gather = 0. If you tried without those then you'd have seen a massively different result. Your test seems better, regardless. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services