Thread: increasing collapse_limits?
Hello

Actually we had to solve an issue with a slow SELECT. The problem was the low value of join_collapse_limit. Can we increase the default for this value? I checked a fairly complex query, and the planner needed about 200 ms for join_collapse_limit = 16, so something around 12 could work well.

Regards

Pavel Stehule
Pavel Stehule <pavel.stehule@gmail.com> writes:
> Actually we had to solve an issue with a slow SELECT. The problem was
> the low value of join_collapse_limit. Can we increase the default for
> this value? I checked a fairly complex query, and the planner needed
> about 200 ms for join_collapse_limit = 16, so something around 12
> could work well.

I'd like to see a rather larger survey of cases before changing that. Also, the amount of memory consumed is at least as large a concern here as runtime.

			regards, tom lane
On Apr 30, 2011, at 7:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> Actually we had to solve an issue with a slow SELECT. The problem was
>> the low value of join_collapse_limit. Can we increase the default for
>> this value? I checked a fairly complex query, and the planner needed
>> about 200 ms for join_collapse_limit = 16, so something around 12
>> could work well.
>
> I'd like to see a rather larger survey of cases before changing that.
> Also, the amount of memory consumed is at least as large a concern here
> as runtime.

I seem to remember that I was the last one to suggest raising these limits, and someone demonstrated rather convincingly that for certain classes of queries that would cause really big problems.

...Robert
Robert Haas <robertmhaas@gmail.com> writes:
> I seem to remember that I was the last one to suggest raising these
> limits, and someone demonstrated rather convincingly that for certain
> classes of queries that would cause really big problems.

You proposed removing the collapse limits altogether, but that crashed and burned pretty quickly --- see the archives from 2009, eg here

http://archives.postgresql.org/pgsql-hackers/2009-07/msg00358.php
http://archives.postgresql.org/pgsql-hackers/2009-07/msg00947.php
http://archives.postgresql.org/pgsql-hackers/2009-11/msg00306.php

I'm not opposed to raising the limits somewhat, but I'd like to see a more thorough case made for what to raise them to. In principle there are k! join orders for a k-way join problem, which means that raising the limit from 8 to 12 could result in a 10000-fold increase in planner runtime and memory consumption. In practice, because of the heuristic that we avoid considering clauseless joins if possible, most queries don't see growth rates that bad --- it would require a query in which every relation is linked to every other relation by a join clause. But that *can* happen (remember that clauses generated by transitive equality do count). So there needs to be some attention paid to both average and worst case behaviors.

Raising them to 10 would only impose a worst case 100-fold growth, which is not as scary as 10000-fold, so maybe we should consider that as an intermediate step. Don't know how much difference that would make in the real world though.

It also occurs to me to wonder if we could adjust the limit on-the-fly based on noticing whether or not the query is prone to worst-case behavior, ie how dense is the join connection graph. Right now it'd be difficult to do that with any reliability, though, because we don't look for equivalence classes until after we've fixed our attention on a particular join subproblem.

			regards, tom lane
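[Editor's note: the growth figures quoted above are easy to check, since the worst-case join-order space for a k-way join is k! and the ratio between two limits is a falling factorial. A quick arithmetic sketch, not part of the planner:]

```python
from math import factorial

def worstcase_growth(old_limit: int, new_limit: int) -> int:
    """Ratio of worst-case join-order counts when raising the collapse limit."""
    return factorial(new_limit) // factorial(old_limit)

# Raising 8 -> 12: 12 * 11 * 10 * 9 = 11880, roughly the "10000-fold" figure.
print(worstcase_growth(8, 12))  # 11880

# Raising 8 -> 10: 10 * 9 = 90, roughly the "100-fold" figure.
print(worstcase_growth(8, 10))  # 90
```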
On Sat, Apr 30, 2011 at 9:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> - it would require a query in which
> every relation is linked to every other relation by a join clause.
> But that *can* happen (remember that clauses generated by transitive
> equality do count).

It sounds like you're describing precisely a "star schema" join, which isn't an uncommon design pattern at all.

--
greg
On 01/05/11 11:53, Greg Stark wrote:
> On Sat, Apr 30, 2011 at 9:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> - it would require a query in which
>> every relation is linked to every other relation by a join clause.
>> But that *can* happen (remember that clauses generated by transitive
>> equality do count).
>
> It sounds like you're describing precisely a "star schema" join which
> isn't an uncommon design pattern at all.

Nice example here:

http://archives.postgresql.org/pgsql-bugs/2011-04/msg00100.php

Strictly it is only a 'star-like' query, as the foreign key references go the opposite way from a true star. However, it illustrates the planner memory growth well (1.1G on 32-bit, 1.7G on 64-bit systems).

A point I didn't mention is that the memory use is quite dependent on the choice of "word" values for the "AND keyword = 'word'" clause - the text example had 6 all the same. Setting them all different (even after adjusting the data so that there *was* a number of matching rows to find) resulted in significantly less memory consumed (I can dig up some examples if it might be interesting).

Cheers

Mark
Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
> On 01/05/11 11:53, Greg Stark wrote:
>> On Sat, Apr 30, 2011 at 9:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> - it would require a query in which
>>> every relation is linked to every other relation by a join clause.
>>> But that *can* happen (remember that clauses generated by transitive
>>> equality do count).
>
>> It sounds like you're describing precisely a "star schema" join which
>> isn't an uncommon design pattern at all.

A normal star schema doesn't really do this, because the join conditions are generally on different columns of the central fact table. However...

> Nice example here:
> http://archives.postgresql.org/pgsql-bugs/2011-04/msg00100.php
> Strictly only a 'star-like' query as the foreign key references go the
> opposite way from a true star. However it illustrates the planner memory
> growth well (1.1G on 32-bit, 1.7G on 64-bit systems).
>
> A point I didn't mention is that the memory use is quite dependent on
> the choice of "word" values for the "AND keyword = 'word'" clause - the
> text example had 6 all the same. Setting them all different (even after
> adjusting the data so that there *was* a number of matching rows to find)
> resulted in significantly less memory consumed (I can dig up some
> examples if it might be interesting).

Yeah. What you have there is that n.nodeid is equated to columns of six other tables, so those seven tables form a group in which every table can be joined directly to every other (because of transitive deduction of equality clauses). So it's kinda bad already. But then, if the kwN tables have "keyword" all equated to the same constant (and thus to each other), that's another group of six tables that can all be joined directly to each other. So that results in a large increase in the number of join sequences that will get explored.

			regards, tom lane
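[Editor's note: a small count makes the blow-up described above concrete. Equating n.nodeid with columns of six other tables turns a star (hub joins to each spoke) into a clique (every pair directly joinable); the table names below are hypothetical stand-ins mirroring the linked example:]

```python
from itertools import combinations

tables = ["n", "kw1", "kw2", "kw3", "kw4", "kw5", "kw6"]

# In a true star, only the hub table joins to each spoke: n - 1 edges.
star_edges = len(tables) - 1

# With transitive equality, every pair of tables is directly joinable,
# so the join graph has C(n, 2) edges.
clique_edges = len(list(combinations(tables, 2)))

print(star_edges, clique_edges)  # 6 21
```

With 21 join edges instead of 6, the search has far more usable join sequences to enumerate at every level, which is where the planner time and memory go.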
Hello

The slow query is just simple, something like:

SELECT ... FROM a LEFT JOIN b ON .. LEFT JOIN c ON .. LEFT JOIN d ON .. LEFT JOIN e ON .. WHERE e.x = number

The slow query plan:

explain analyze select * from v_vypis_parcel_puvodni where par_id = 1396907206
---------------------------
"Nested Loop Left Join (cost=4043.95..12777.12 rows=1 width=415) (actual time=46813.256..47130.773 rows=1 loops=1)"
" Join Filter: (budovy.id = parcely.bud_id)"
" -> Nested Loop Left Join (cost=0.00..27.42 rows=1 width=262) (actual time=0.311..0.634 rows=1 loops=1)"
" Join Filter: (katastr_uzemi.kod = parcely.katuze_kod)"
" -> Nested Loop Left Join (cost=0.00..20.55 rows=1 width=212) (actual time=0.282..0.301 rows=1 loops=1)"
" -> Nested Loop Left Join (cost=0.00..12.26 rows=1 width=208) (actual time=0.162..0.175 rows=1 loops=1)"
" Join Filter: (parcely.zdpaze_kod = zdroje_parcel_ze.kod)"
" -> Nested Loop Left Join (cost=0.00..11.19 rows=1 width=145) (actual time=0.148..0.159 rows=1 loops=1)"
" Join Filter: (d_pozemku.kod = parcely.drupoz_kod)"
" -> Nested Loop Left Join (cost=0.00..9.94 rows=1 width=140) (actual time=0.099..0.104 rows=1 loops=1)"
" Join Filter: (zp_vyuziti_poz.kod = parcely.zpvypa_kod)"
" -> Index Scan using par_pk on parcely (cost=0.00..8.31 rows=1 width=84) (actual time=0.037..0.040 rows=1 loops=1)"
" Index Cond: (id = 1396907206::numeric)"
" -> Seq Scan on zp_vyuziti_poz (cost=0.00..1.28 rows=28 width=70) (actual time=0.005..0.023 rows=28 loops=1)"
" -> Seq Scan on d_pozemku (cost=0.00..1.11 rows=11 width=19) (actual time=0.023..0.033 rows=11 loops=1)"
" -> Seq Scan on zdroje_parcel_ze (cost=0.00..1.03 rows=3 width=70) (actual time=0.004..0.006 rows=3 loops=1)"
" -> Index Scan using tel_pk on telesa (cost=0.00..8.28 rows=1 width=15) (actual time=0.112..0.116 rows=1 loops=1)"
" Index Cond: (parcely.tel_id = public.telesa.id)"
" -> Seq Scan on katastr_uzemi (cost=0.00..4.72 rows=172 width=54) (actual time=0.019..0.160 rows=172 loops=1)"
" -> Hash Left Join (cost=4043.95..11787.52 rows=76968 width=164) (actual time=19827.669..47069.869 rows=77117 loops=1)"
" Hash Cond: (budovy.typbud_kod = t_budov.kod)"
" -> Hash Left Join (cost=4042.82..10728.08 rows=76968 width=141) (actual time=19827.625..46938.954 rows=77117 loops=1)"
" Hash Cond: (budovy.caobce_kod = casti_obci.kod)"
" -> Hash Left Join (cost=4028.14..9827.78 rows=76968 width=46) (actual time=19826.622..46824.288 rows=77117 loops=1)"
" Hash Cond: (budovy.id = casti_budov.bud_id)"
" -> Hash Left Join (cost=4015.38..8850.54 rows=76968 width=33) (actual time=19825.627..46710.476 rows=76968 loops=1)"
" Hash Cond: (budovy.tel_id = public.telesa.id)"
" -> Seq Scan on budovy (cost=0.00..1903.68 rows=76968 width=40) (actual time=0.031..86.709 rows=76968 loops=1)"
" -> Hash (cost=2214.17..2214.17 rows=103617 width=15) (actual time=19691.650..19691.650 rows=103617 loops=1)"
" -> Seq Scan on telesa (cost=0.00..2214.17 rows=103617 width=15) (actual time=0.015..96.548 rows=103617 loops=1)"
" -> Hash (cost=9.79..9.79 rows=238 width=28) (actual time=0.937..0.937 rows=238 loops=1)"
" -> Hash Left Join (cost=1.14..9.79 rows=238 width=28) (actual time=0.104..0.699 rows=238 loops=1)"
" Hash Cond: (casti_budov.typbud_kod = t_bud_ii.kod)"
" -> Seq Scan on casti_budov (cost=0.00..5.38 rows=238 width=25) (actual time=0.030..0.201 rows=238 loops=1)"
" -> Hash (cost=1.06..1.06 rows=6 width=17) (actual time=0.032..0.032 rows=6 loops=1)"
" -> Seq Scan on t_budov t_bud_ii (cost=0.00..1.06 rows=6 width=17) (actual time=0.008..0.014 rows=6 loops=1)"
" -> Hash (cost=12.20..12.20 rows=198 width=103) (actual time=0.940..0.940 rows=198 loops=1)"
" -> Hash Left Join (cost=4.50..12.20 rows=198 width=103) (actual time=0.255..0.698 rows=198 loops=1)"
" Hash Cond: (casti_obci.obce_kod = obce.kod)"
" -> Seq Scan on casti_obci (cost=0.00..4.98 rows=198 width=58) (actual time=0.004..0.126 rows=198 loops=1)"
" -> Hash (cost=3.11..3.11 rows=111 width=53) (actual time=0.206..0.206 rows=111 loops=1)"
" -> Seq Scan on obce (cost=0.00..3.11 rows=111 width=53) (actual time=0.010..0.105 rows=111 loops=1)"
" -> Hash (cost=1.06..1.06 rows=6 width=17) (actual time=0.019..0.019 rows=6 loops=1)"
" -> Seq Scan on t_budov (cost=0.00..1.06 rows=6 width=17) (actual time=0.002..0.006 rows=6 loops=1)"
"Total runtime: 47131.739 ms"

-----------------------------------------------------------

A fast query plan:

set enable_hashjoin to on;
set work_mem to '1MB';
set JOIN_COLLAPSE_LIMIT to 12;
set geqo_threshold to 12;
explain analyze select * from v_vypis_parcel_puvodni where par_id = 1396907206

"Nested Loop Left Join (cost=13.90..4930.90 rows=1 width=415) (actual time=298.456..365.421 rows=1 loops=1)"
" Join Filter: (katastr_uzemi.kod = parcely.katuze_kod)"
" -> Nested Loop Left Join (cost=13.90..4923.96 rows=1 width=365) (actual time=298.408..365.142 rows=1 loops=1)"
" -> Nested Loop Left Join (cost=13.90..4923.68 rows=1 width=320) (actual time=298.402..365.134 rows=1 loops=1)"
" -> Nested Loop Left Join (cost=13.90..4923.40 rows=1 width=270) (actual time=298.396..365.127 rows=1 loops=1)"
" -> Nested Loop Left Join (cost=13.90..4923.02 rows=1 width=277) (actual time=298.364..365.091 rows=1 loops=1)"
" Join Filter: (d_pozemku.kod = parcely.drupoz_kod)"
" -> Nested Loop Left Join (cost=13.90..4921.77 rows=1 width=272) (actual time=298.341..365.063 rows=1 loops=1)"
" Join Filter: (zp_vyuziti_poz.kod = parcely.zpvypa_kod)"
" -> Nested Loop Left Join (cost=13.90..4920.14 rows=1 width=216) (actual time=298.291..365.011 rows=1 loops=1)"
" -> Nested Loop Left Join (cost=13.90..4911.85 rows=1 width=212) (actual time=298.260..364.977 rows=1 loops=1)"
" Join Filter: (parcely.zdpaze_kod = zdroje_parcel_ze.kod)"
" -> Nested Loop Left Join (cost=13.90..4910.78 rows=1 width=149) (actual time=298.236..364.953 rows=1 loops=1)"
" Join Filter: (budovy.id = parcely.bud_id)"
" -> Index Scan using par_pk on parcely (cost=0.00..8.31 rows=1 width=84) (actual time=0.027..0.031 rows=1 loops=1)"
" Index Cond: (id = 1396907206::numeric)"
" -> Hash Left Join (cost=13.90..3940.37 rows=76968 width=76) (actual time=0.873..307.146 rows=77117 loops=1)"
" Hash Cond: (budovy.typbud_kod = t_budov.kod)"
" -> Hash Left Join (cost=12.76..2880.92 rows=76968 width=53) (actual time=0.852..183.112 rows=77117 loops=1)"
" Hash Cond: (budovy.id = casti_budov.bud_id)"
" -> Seq Scan on budovy (cost=0.00..1903.68 rows=76968 width=40) (actual time=0.033..53.484 rows=76968 loops=1)"
" -> Hash (cost=9.79..9.79 rows=238 width=28) (actual time=0.806..0.806 rows=238 loops=1)"
" -> Hash Left Join (cost=1.14..9.79 rows=238 width=28) (actual time=0.036..0.612 rows=238 loops=1)"
" Hash Cond: (casti_budov.typbud_kod = t_bud_ii.kod)"
" -> Seq Scan on casti_budov (cost=0.00..5.38 rows=238 width=25) (actual time=0.002..0.159 rows=238 loops=1)"
" -> Hash (cost=1.06..1.06 rows=6 width=17) (actual time=0.020..0.020 rows=6 loops=1)"
" -> Seq Scan on t_budov t_bud_ii (cost=0.00..1.06 rows=6 width=17) (actual time=0.004..0.010 rows=6 loops=1)"
" -> Hash (cost=1.06..1.06 rows=6 width=17) (actual time=0.013..0.013 rows=6 loops=1)"
" -> Seq Scan on t_budov (cost=0.00..1.06 rows=6 width=17) (actual time=0.001..0.005 rows=6 loops=1)"
" -> Seq Scan on zdroje_parcel_ze (cost=0.00..1.03 rows=3 width=70) (actual time=0.002..0.004 rows=3 loops=1)"
" -> Index Scan using tel_pk on telesa (cost=0.00..8.28 rows=1 width=15) (actual time=0.021..0.022 rows=1 loops=1)"
" Index Cond: (parcely.tel_id = public.telesa.id)"
" -> Seq Scan on zp_vyuziti_poz (cost=0.00..1.28 rows=28 width=70) (actual time=0.003..0.020 rows=28 loops=1)"
" -> Seq Scan on d_pozemku (cost=0.00..1.11 rows=11 width=19) (actual time=0.002..0.007 rows=11 loops=1)"
" -> Index Scan using tel_pk on telesa (cost=0.00..0.37 rows=1 width=15) (actual time=0.026..0.028 rows=1 loops=1)"
" Index Cond: (budovy.tel_id = public.telesa.id)"
" -> Index Scan using caob_pk on casti_obci (cost=0.00..0.27 rows=1 width=58) (actual time=0.002..0.002 rows=0 loops=1)"
" Index Cond: (casti_obci.kod = budovy.caobce_kod)"
" -> Index Scan using ob_pk on obce (cost=0.00..0.27 rows=1 width=53) (actual time=0.002..0.002 rows=0 loops=1)"
" Index Cond: (casti_obci.obce_kod = obce.kod)"
" -> Seq Scan on katastr_uzemi (cost=0.00..4.72 rows=172 width=54) (actual time=0.003..0.104 rows=172 loops=1)"
"Total runtime: 365.709 ms"

-----------------------------------------------------------
/********************
 *** hashjoin off
 *********************/
set enable_hashjoin to off;
set work_mem to '1MB';
set JOIN_COLLAPSE_LIMIT to 12;
set geqo_threshold to 12;
explain analyze select * from v_vypis_parcel_puvodni where par_id = 1396907206
--------------------------------------------------
"Nested Loop Left Join (cost=12.13..14400.33 rows=1 width=415) (actual time=44.065..44.332 rows=1 loops=1)"
" Join Filter: (katastr_uzemi.kod = parcely.katuze_kod)"
" -> Nested Loop Left Join (cost=12.13..14393.39 rows=1 width=365) (actual time=44.034..44.061 rows=1 loops=1)"
" -> Nested Loop Left Join (cost=12.13..14393.03 rows=1 width=320) (actual time=44.031..44.057 rows=1 loops=1)"
" -> Nested Loop Left Join (cost=12.13..14392.75 rows=1 width=270) (actual time=44.027..44.051 rows=1 loops=1)"
" -> Nested Loop Left Join (cost=12.13..14392.37 rows=1 width=277) (actual time=44.015..44.037 rows=1 loops=1)"
" -> Nested Loop Left Join (cost=12.13..14384.07 rows=1 width=273) (actual time=44.002..44.021 rows=1 loops=1)"
" Join Filter: (parcely.zdpaze_kod = zdroje_parcel_ze.kod)"
" -> Nested Loop Left Join (cost=12.13..14383.00 rows=1 width=210) (actual time=43.992..44.010 rows=1 loops=1)"
" Join Filter: (zp_vyuziti_poz.kod = parcely.zpvypa_kod)"
" -> Merge Right Join (cost=12.13..14381.37 rows=1 width=154) (actual time=43.947..43.963 rows=1 loops=1)"
" Merge Cond: (budovy.id = parcely.bud_id)"
" -> Merge Left Join (cost=1.07..14179.37 rows=76968 width=76) (actual time=0.083..41.551 rows=3135 loops=1)"
" Merge Cond: (budovy.id = casti_budov.bud_id)"
" -> Nested Loop Left Join (cost=1.07..13892.10 rows=76968 width=43) (actual time=0.052..36.810 rows=3134 loops=1)"
" Join Filter: (t_budov.kod = budovy.typbud_kod)"
" -> Index Scan using bud_pk on budovy (cost=0.00..3500.36 rows=76968 width=40) (actual time=0.023..2.970 rows=3134 loops=1)"
" -> Materialize (cost=1.07..1.13 rows=6 width=17) (actual time=0.001..0.004 rows=6 loops=3134)"
" -> Seq Scan on t_budov (cost=0.00..1.06 rows=6 width=17) (actual time=0.006..0.012 rows=6 loops=1)"
" -> Materialize (cost=0.00..91.87 rows=238 width=28) (actual time=0.025..0.044 rows=3 loops=1)"
" -> Nested Loop Left Join (cost=0.00..89.49 rows=238 width=28) (actual time=0.023..0.039 rows=3 loops=1)"
" -> Index Scan using i_casti_budov_budid on casti_budov (cost=0.00..22.79 rows=238 width=25) (actual time=0.010..0.012 rows=3 loops=1)"
" -> Index Scan using tbud_pk on t_budov t_bud_ii (cost=0.00..0.27 rows=1 width=17) (actual time=0.005..0.005 rows=1 loops=3)"
" Index Cond: (t_bud_ii.kod = casti_budov.typbud_kod)"
" -> Sort (cost=9.57..9.58 rows=1 width=89) (actual time=0.050..0.051 rows=1 loops=1)"
" Sort Key: parcely.bud_id"
" Sort Method: quicksort Memory: 25kB"
" -> Nested Loop Left Join (cost=0.00..9.56 rows=1 width=89) (actual time=0.034..0.040 rows=1 loops=1)"
" Join Filter: (d_pozemku.kod = parcely.drupoz_kod)"
" -> Index Scan using par_pk on parcely (cost=0.00..8.31 rows=1 width=84) (actual time=0.012..0.014 rows=1 loops=1)"
" Index Cond: (id = 1396907206::numeric)"
" -> Seq Scan on d_pozemku (cost=0.00..1.11 rows=11 width=19) (actual time=0.002..0.008 rows=11 loops=1)"
" -> Seq Scan on zp_vyuziti_poz (cost=0.00..1.28 rows=28 width=70) (actual time=0.003..0.020 rows=28 loops=1)"
" -> Seq Scan on zdroje_parcel_ze (cost=0.00..1.03 rows=3 width=70) (actual time=0.002..0.004 rows=3 loops=1)"
" -> Index Scan using tel_pk on telesa (cost=0.00..8.28 rows=1 width=15) (actual time=0.010..0.011 rows=1 loops=1)"
" Index Cond: (parcely.tel_id = public.telesa.id)"
" -> Index Scan using tel_pk on telesa (cost=0.00..0.37 rows=1 width=15) (actual time=0.009..0.010 rows=1 loops=1)"
" Index Cond: (budovy.tel_id = public.telesa.id)"
" -> Index Scan using caob_pk on casti_obci (cost=0.00..0.27 rows=1 width=58) (actual time=0.001..0.001 rows=0 loops=1)"
" Index Cond: (casti_obci.kod = budovy.caobce_kod)"
" -> Index Scan using ob_pk on obce (cost=0.00..0.35 rows=1 width=53) (actual time=0.001..0.001 rows=0 loops=1)"
" Index Cond: (casti_obci.obce_kod = obce.kod)"
" -> Seq Scan on katastr_uzemi (cost=0.00..4.72 rows=172 width=54) (actual time=0.003..0.109 rows=172 loops=1)"
"Total runtime: 44.593 ms"

Regards

Pavel Stehule

2011/4/30 Tom Lane <tgl@sss.pgh.pa.us>:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I seem to remember that I was the last one to suggest raising these
>> limits, and someone demonstrated rather convincingly that for certain
>> classes of queries that would cause really big problems.
>
> You proposed removing the collapse limits altogether, but that crashed
> and burned pretty quickly --- see the archives from 2009, eg here
> http://archives.postgresql.org/pgsql-hackers/2009-07/msg00358.php
> http://archives.postgresql.org/pgsql-hackers/2009-07/msg00947.php
> http://archives.postgresql.org/pgsql-hackers/2009-11/msg00306.php
>
> I'm not opposed to raising the limits somewhat, but I'd like to see a
> more thorough case made for what to raise them to. In principle there
> are k! join orders for a k-way join problem, which means that raising
> the limit from 8 to 12 could result in a 10000-fold increase in planner
> runtime and memory consumption. In practice, because of the heuristic
> that we avoid considering clauseless joins if possible, most queries
> don't see growth rates that bad --- it would require a query in which
> every relation is linked to every other relation by a join clause.
> But that *can* happen (remember that clauses generated by transitive
> equality do count). So there needs to be some attention paid to both
> average and worst case behaviors.
>
> Raising them to 10 would only impose a worst case 100-fold growth,
> which is not as scary as 10000-fold, so maybe we should consider
> that as an intermediate step. Don't know how much difference that
> would make in the real world though.
>
> It also occurs to me to wonder if we could adjust the limit on-the-fly
> based on noticing whether or not the query is prone to worst-case
> behavior, ie how dense is the join connection graph. Right now it'd be
> difficult to do that with any reliability, though, because we don't look
> for equivalence classes until after we've fixed our attention on a
> particular join subproblem.
>
>			regards, tom lane
>
Pavel,

> Actually we had to solve an issue with a slow SELECT. The problem was
> the low value of join_collapse_limit. Can we increase the default for
> this value? I checked a fairly complex query, and the planner needed
> about 200 ms for join_collapse_limit = 16, so something around 12
> could work well.

I'm not comfortable with increasing the default yet. While folks on dedicated good hardware can handle a collapse of 10-12 joins, a lot of people are running PostgreSQL on VMs these days whose real CPU power is no better than a Pentium IV's. Also, if you're doing OLTP queries on small tables, spending 20ms planning a query is unreasonably slow in a way it is not for a DW query.

It does make a reasonable piece of advice for those tuning for DW, though. I'll add it to my list.

Speaking of which, what happened to replacing GEQO with Simulated Annealing? Where did that project go?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco
On Apr 30, 2011, at 10:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It also occurs to me to wonder if we could adjust the limit on-the-fly
> based on noticing whether or not the query is prone to worst-case
> behavior, ie how dense is the join connection graph.

I've had this thought - or a similar one - before also. I am not sure how to make it work mechanically, but I think it would be tremendous if we could make it work. For most people, my previous naive suggestion (remove the limit entirely) would actually work fine, BUT if you hit the problem cases then even a small increase is too much. So I don't really think increasing the limit will eliminate the need for manual fiddling - what we really need is a more accurate measure of complexity than "number of tables".

...Robert
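[Editor's note: one way to picture the density idea being discussed is to score the join connection graph by the fraction of table pairs linked by a join clause, and relax the collapse limit only for sparse (chain- or tree-shaped) queries. The threshold and formula below are invented purely for illustration; the real planner has no such knob:]

```python
def join_graph_density(num_rels: int, num_join_edges: int) -> float:
    """Fraction of possible table pairs that are connected by a join clause."""
    possible_pairs = num_rels * (num_rels - 1) // 2
    return num_join_edges / possible_pairs if possible_pairs else 0.0

def effective_collapse_limit(num_rels: int, num_join_edges: int,
                             base: int = 8, relaxed: int = 12) -> int:
    # Sparse graphs (e.g. a chain of LEFT JOINs) are cheap to search
    # exhaustively, so allow a larger subproblem; clique-like graphs
    # (the transitive-equality worst case) keep the conservative cap.
    if join_graph_density(num_rels, num_join_edges) < 0.5:
        return relaxed
    return base

print(effective_collapse_limit(12, 11))  # chain of 12 tables -> 12
print(effective_collapse_limit(7, 21))   # 7-way clique -> 8
```

As Tom notes upthread, the catch is that the equivalence classes that densify the graph are only discovered after the planner has already committed to a join subproblem, so the density is not reliably known at the time this decision would have to be made.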
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

Joshua Berkus wrote:
> I'm not comfortable with increasing the default, yet. While folks on
> dedicated good hardware can handle a collapse of 10-12 joins, a lot
> of people are running PostgreSQL on VMs these days whose real CPU
> power is no better than a Pentium IV.

Really? First, I don't think that's true: the average CPU power is much higher than that. Second, this sounds like the ol' "tune it for a toaster" trap, where we never make improvements to the defaults because someone, somewhere, might *gasp* use Postgres on an underpowered server.

> Also, if you're doing OLTP queries on small tables, spending 20ms
> planning a query is unreasonably slow in a way it is not for a
> DW query.

Again, seriously? Do you have numbers to back that up? I could see not going to 16 right away, but who would honestly have a problem with going to 10? I agree with Tom: let's bump this up a little bit and see what happens. My guess is that we won't see a single post in which we advise people to drop it down from 10 to 8. Personally, I'd like to see them go to 12, as that's the best sweet spot I've seen in the field, but I'll take 10 first. :)

Tom Lane asked re setting to 10:
> Don't know how much difference that would make in the real world though.

I've seen a handful of cases that have benefited from 10, but many more* that benefited from 12 (*okay, a larger handful anyway; it's not like I have to adjust it too often).

- --
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201105012153
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAk2+DqsACgkQvJuQZxSWSshRfQCgzX5JlnCmKTndA7WcF/mt0Kpk
b30AoLKrVKMm0rbZNNhgVjt/Xne4NDpj
=0deF
-----END PGP SIGNATURE-----
On 01/05/11 21:16, Joshua Berkus wrote:
> Speaking of which, what happened to replacing GEQO with Simulated
> Annealing? Where did that project go?

It stayed on github (https://github.com/wulczer/saio) and stagnated a bit after I got my degree. It's at the top of my list of things to pick up after the summer (or maybe even during the summer).

Cheers,
Jan