Discussion: [HACKERS] Regression in join selectivity estimations when using foreign keys
I've been analyzing a reported regression case between a 9.5 plan and a 9.6 plan. I tracked this down to the foreign key join selectivity code, specifically the use_smallest_selectivity code which is applied to outer joins where the referenced table is on the outer side of the join.

A vastly simplified example case is:

create table fkest (a int, b int, c int unique, primary key(a,b));
create table fkest1 (a int, b int, primary key(a,b));

insert into fkest select x/10, x%10, x from generate_series(1,400) x;
insert into fkest1 select x/10, x%10 from generate_series(1,400) x;

alter table fkest1 add constraint fkest1_a_b_fkey foreign key (a,b) references fkest;

analyze fkest;
analyze fkest1;

explain (costs on) select * from fkest f
left join fkest1 f1 on f.a = f1.a and f.b = f1.b
left join fkest1 f2 on f.a = f2.a and f.b = f2.b
left join fkest1 f3 on f.a = f3.a and f.b = f3.b
where f.c = 1;

                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=24.15..41.89 rows=996 width=36)
   Hash Cond: ((f.a = f3.a) AND (f.b = f3.b))
   ->  Hash Left Join  (cost=12.15..28.36 rows=100 width=28)
         Hash Cond: ((f.a = f2.a) AND (f.b = f2.b))
         ->  Nested Loop Left Join  (cost=0.15..16.21 rows=10 width=20)
               ->  Seq Scan on fkest f  (cost=0.00..8.00 rows=1 width=12)
                     Filter: (c = 1)
               ->  Index Only Scan using fkest1_pkey on fkest1 f1  (cost=0.15..8.17 rows=1 width=8)
                     Index Cond: ((a = f.a) AND (b = f.b))
         ->  Hash  (cost=6.00..6.00 rows=400 width=8)
               ->  Seq Scan on fkest1 f2  (cost=0.00..6.00 rows=400 width=8)
   ->  Hash  (cost=6.00..6.00 rows=400 width=8)
         ->  Seq Scan on fkest1 f3  (cost=0.00..6.00 rows=400 width=8)
(13 rows)

-- now drop the foreign key to simulate the behaviour pre-9.6
alter table fkest1 drop constraint fkest1_a_b_fkey;

explain (costs on) select * from fkest f
left join fkest1 f1 on f.a = f1.a and f.b = f1.b
left join fkest1 f2 on f.a = f2.a and f.b = f2.b
left join fkest1 f3 on f.a = f3.a and f.b = f3.b
where f.c = 1;

                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.44..32.62 rows=1 width=36)
   ->  Nested Loop Left Join  (cost=0.29..24.41 rows=1 width=28)
         ->  Nested Loop Left Join  (cost=0.15..16.21 rows=1 width=20)
               ->  Seq Scan on fkest f  (cost=0.00..8.00 rows=1 width=12)
                     Filter: (c = 1)
               ->  Index Only Scan using fkest1_pkey on fkest1 f1  (cost=0.15..8.17 rows=1 width=8)
                     Index Cond: ((a = f.a) AND (b = f.b))
         ->  Index Only Scan using fkest1_pkey on fkest1 f2  (cost=0.15..8.17 rows=1 width=8)
               Index Cond: ((a = f.a) AND (b = f.b))
   ->  Index Only Scan using fkest1_pkey on fkest1 f3  (cost=0.15..8.17 rows=1 width=8)
         Index Cond: ((a = f.a) AND (b = f.b))
(11 rows)

The problem is that the use_smallest_selectivity code in get_foreign_key_join_selectivity() is calculating 1/10 instead of 1/100 for each join level. Really it should be multiplying the selectivities instead of taking the minimum, though even that does not seem correct. That bad estimate propagates up the join tree until some pretty insane estimates come back.

The following is the offending code:

foreach(cell, removedlist)
{
    RestrictInfo *rinfo = (RestrictInfo *) lfirst(cell);
    Selectivity csel;

    csel = clause_selectivity(root, (Node *) rinfo, 0, jointype, sjinfo);
    thisfksel = Min(thisfksel, csel);
}

This code is only applied to outer joins. I've studied this a bit and, generally, I'm unsure why we're treating outer joins in a special way at all.
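For illustration, the multiplication I'm talking about would look something like this. This is only an untested sketch of the idea, assuming thisfksel is initialised to 1.0 rather than being seeded with the smallest value so far; and as I said above, even this doesn't seem entirely correct:

foreach(cell, removedlist)
{
    RestrictInfo *rinfo = (RestrictInfo *) lfirst(cell);
    Selectivity csel;

    /* combine the per-clause selectivities under the independence assumption */
    csel = clause_selectivity(root, (Node *) rinfo, 0, jointype, sjinfo);
    thisfksel *= csel;
}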
Some comments indicate that this is in regard to NULLs, but I'm not sure why we have special code for outer joins and none for INNER joins. It's quite easy to show how this regresses in general, with regard to NULLs, on INNER joins:

create table part (partid int primary key);
create table sale (saleid int primary key, partid int references part);

insert into part select generate_series(1,1000);
insert into sale select x, NULL from generate_series(1,1000) x;

analyze part;
analyze sale;

-- using foreign key join estimations
explain analyze select * from part p inner join sale s on p.partid = s.partid;

                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=27.50..55.12 rows=1000 width=12) (actual time=0.733..0.733 rows=0 loops=1)
   Hash Cond: (s.partid = p.partid)
   ->  Seq Scan on sale s  (cost=0.00..15.00 rows=1000 width=8) (actual time=0.018..0.332 rows=1000 loops=1)
   ->  Hash  (cost=15.00..15.00 rows=1000 width=4) (actual time=0.336..0.336 rows=1000 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 44kB
         ->  Seq Scan on part p  (cost=0.00..15.00 rows=1000 width=4) (actual time=0.014..0.132 rows=1000 loops=1)
 Planning time: 0.468 ms
 Execution time: 0.787 ms
(8 rows)

-- simulate pre-9.6 behaviour by dropping the fkey
alter table sale drop constraint sale_partid_fkey;

-- normal join estimations
explain analyze select * from part p inner join sale s on p.partid = s.partid;

                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=27.50..55.12 rows=1 width=12) (actual time=0.546..0.546 rows=0 loops=1)
   Hash Cond: (s.partid = p.partid)
   ->  Seq Scan on sale s  (cost=0.00..15.00 rows=1000 width=8) (actual time=0.014..0.102 rows=1000 loops=1)
   ->  Hash  (cost=15.00..15.00 rows=1000 width=4) (actual time=0.387..0.387 rows=1000 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 44kB
         ->  Seq Scan on part p  (cost=0.00..15.00 rows=1000 width=4) (actual time=0.009..0.110 rows=1000 loops=1)
 Planning time: 0.278 ms
 Execution time: 0.572 ms

I've attached fixes, based on master, for both of these cases. Patches:

0001: A minimal fix for the reported regression. It may cause further regressions, since it does nothing to account for NULL-valued columns in outer joins, but at least it is consistent with how INNER joins are estimated with regard to NULLs.

0002: Adds further code to apply the nullfrac from pg_statistic. I've given this a round of testing and I can't find any case where it gives worse estimates than the standard join estimation you see when you drop the foreign key, but I'd like to challenge someone else to give it a go and see if they can... Not because I'm super confident, more because I want this to be correct.

Thoughts?

-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
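[For concreteness, the 0002 idea of discounting NULLs could be sketched along the following lines inside the FK selectivity code. This is an illustrative sketch only, not the attached patch; fkrelid, fkattnum and fkselec are hypothetical names standing for the referencing relation, the referencing column and the FK-derived selectivity.]

    HeapTuple   statsTuple;

    /* fetch pg_statistic entry for the referencing column */
    statsTuple = SearchSysCache3(STATRELATTINH,
                                 ObjectIdGetDatum(fkrelid),   /* hypothetical */
                                 Int16GetDatum(fkattnum),     /* hypothetical */
                                 BoolGetDatum(false));
    if (HeapTupleIsValid(statsTuple))
    {
        Form_pg_statistic stats = (Form_pg_statistic) GETSTRUCT(statsTuple);

        /* NULLs in the referencing column can never satisfy the FK's clauses */
        fkselec *= 1.0 - stats->stanullfrac;
        ReleaseSysCache(statsTuple);
    }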
Attachments
Re: [HACKERS] Regression in join selectivity estimations when using foreign keys
From:
David Rowley
Date:
On 18 May 2017 at 20:28, David Rowley <david.rowley@2ndquadrant.com> wrote:
> A vastly simplified example case is:
>
> create table fkest (a int, b int, c int unique, primary key(a,b));
> create table fkest1 (a int, b int, primary key(a,b));
>
> insert into fkest select x/10, x%10, x from generate_series(1,400) x;
> insert into fkest1 select x/10, x%10 from generate_series(1,400) x;
>
> alter table fkest1 add constraint fkest1_a_b_fkey foreign key (a,b)
> references fkest;
>
> analyze fkest;
> analyze fkest1;
>
> explain (costs on) select * from fkest f
> left join fkest1 f1 on f.a = f1.a and f.b = f1.b
> left join fkest1 f2 on f.a = f2.a and f.b = f2.b
> left join fkest1 f3 on f.a = f3.a and f.b = f3.b
> where f.c = 1;

I should have shown the EXPLAIN ANALYZE of this instead:

                                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=24.15..41.89 rows=996 width=36) (actual time=0.430..0.463 rows=1 loops=1)
   Hash Cond: ((f.a = f3.a) AND (f.b = f3.b))
   ->  Hash Left Join  (cost=12.15..28.36 rows=100 width=28) (actual time=0.255..0.288 rows=1 loops=1)
         Hash Cond: ((f.a = f2.a) AND (f.b = f2.b))
         ->  Nested Loop Left Join  (cost=0.15..16.21 rows=10 width=20) (actual time=0.046..0.079 rows=1 loops=1)
               ->  Seq Scan on fkest f  (cost=0.00..8.00 rows=1 width=12) (actual time=0.013..0.045 rows=1 loops=1)
                     Filter: (c = 1)
                     Rows Removed by Filter: 399
               ->  Index Only Scan using fkest1_pkey on fkest1 f1  (cost=0.15..8.17 rows=1 width=8) (actual time=0.031..0.031 rows=1 loops=1)
                     Index Cond: ((a = f.a) AND (b = f.b))
                     Heap Fetches: 1
         ->  Hash  (cost=6.00..6.00 rows=400 width=8) (actual time=0.180..0.180 rows=400 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 24kB
               ->  Seq Scan on fkest1 f2  (cost=0.00..6.00 rows=400 width=8) (actual time=0.006..0.041 rows=400 loops=1)
   ->  Hash  (cost=6.00..6.00 rows=400 width=8) (actual time=0.162..0.162 rows=400 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 24kB
         ->  Seq Scan on fkest1 f3  (cost=0.00..6.00 rows=400 width=8) (actual time=0.010..0.040 rows=400 loops=1)
 Planning time: 0.409 ms
 Execution time: 0.513 ms
(19 rows)

in which you can clearly see the poor estimate propagating up the plan tree.
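To put rough numbers on it (these are my approximations; the planner's ndistinct estimates here work out to about 40 distinct values of a and 10 of b), each additional left join multiplies the row estimate by

    400 * min(1/40, 1/10) = 400 * (1/40) = 10

under the current minimum-selectivity logic, whereas multiplying the two clause selectivities would give

    400 * (1/40) * (1/10) = 1

per level. That is exactly the roughly 10x growth per join level visible above (1 -> 10 -> 100 -> 996) where the estimate should stay at about 1.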
If we add another left join the final estimate is even worse:

explain analyze select * from fkest f
left join fkest1 f1 on f.a = f1.a and f.b = f1.b
left join fkest1 f2 on f.a = f2.a and f.b = f2.b
left join fkest1 f3 on f.a = f3.a and f.b = f3.b
left join fkest1 f4 on f.a = f4.a and f.b = f4.b
where f.c = 1;

                                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=36.15..69.06 rows=9915 width=44) (actual time=0.535..0.569 rows=1 loops=1)
   Hash Cond: ((f.a = f4.a) AND (f.b = f4.b))
   ->  Hash Left Join  (cost=24.15..41.89 rows=996 width=36) (actual time=0.371..0.404 rows=1 loops=1)
         Hash Cond: ((f.a = f3.a) AND (f.b = f3.b))
         ->  Hash Left Join  (cost=12.15..28.36 rows=100 width=28) (actual time=0.208..0.241 rows=1 loops=1)
               Hash Cond: ((f.a = f2.a) AND (f.b = f2.b))
               ->  Nested Loop Left Join  (cost=0.15..16.21 rows=10 width=20) (actual time=0.029..0.062 rows=1 loops=1)
                     ->  Seq Scan on fkest f  (cost=0.00..8.00 rows=1 width=12) (actual time=0.014..0.047 rows=1 loops=1)
                           Filter: (c = 1)
                           Rows Removed by Filter: 399
                     ->  Index Only Scan using fkest1_pkey on fkest1 f1  (cost=0.15..8.17 rows=1 width=8) (actual time=0.012..0.012 rows=1 loops=1)
                           Index Cond: ((a = f.a) AND (b = f.b))
                           Heap Fetches: 1
               ->  Hash  (cost=6.00..6.00 rows=400 width=8) (actual time=0.168..0.168 rows=400 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 24kB
                     ->  Seq Scan on fkest1 f2  (cost=0.00..6.00 rows=400 width=8) (actual time=0.008..0.043 rows=400 loops=1)
         ->  Hash  (cost=6.00..6.00 rows=400 width=8) (actual time=0.156..0.156 rows=400 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 24kB
               ->  Seq Scan on fkest1 f3  (cost=0.00..6.00 rows=400 width=8) (actual time=0.006..0.035 rows=400 loops=1)
   ->  Hash  (cost=6.00..6.00 rows=400 width=8) (actual time=0.155..0.155 rows=400 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 24kB
         ->  Seq Scan on fkest1 f4  (cost=0.00..6.00 rows=400 width=8) (actual time=0.004..0.034 rows=400 loops=1)
 Planning time: 0.864 ms
 Execution time: 0.698 ms

-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes:
> I've been analyzing a reported regression case between a 9.5 plan and
> a 9.6 plan. I tracked this down to the foreign key join selectivity
> code, specifically the use_smallest_selectivity code which is applied
> to outer joins where the referenced table is on the outer side of the
> join.
> ...
> I've attached fixes, based on master, for both of these cases.

I'm entirely unconvinced by this patch --- it seems to simply be throwing away a lot of logic. Notably, it lobotomizes the FK code altogether for semi/antijoin cases, but you've not shown any example that even involves such joins, so what's the argument for doing that? Also, the reason we had the use_smallest_selectivity code in the first place was that we didn't believe the FK-based selectivity could be applied safely to outer-join cases, so simply deciding that it's OK to apply it after all seems insufficiently justified.

Or in short, exhibiting one counterexample to the existing code is not a sufficient argument for changing things this much. You need to give an argument why this is the right thing to do instead.

Stepping back a bit, it seems like the core thing the FK selectivity code was meant to do was to prevent us from underestimating the selectivity of multiple-clause join conditions through a false assumption of clause independence. The problem with the use_smallest_selectivity code is that it's overestimating selectivity, but that doesn't mean that we want to go all the way back to the old way of doing things. I wonder if we could get any traction in these dubious cases by computing the product, instead of the minimum, of the clause selectivities (thus matching the old estimate) and then taking the greater of that and the FK-based selectivity.

			regards, tom lane
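[In outline, the product-then-greater idea suggested here might look like the following, reusing the names from the snippet quoted earlier and taking fkselec as a hypothetical name for the FK-derived selectivity; a sketch for illustration, not actual PostgreSQL code.]

    Selectivity prodsel = 1.0;

    foreach(cell, removedlist)
    {
        RestrictInfo *rinfo = (RestrictInfo *) lfirst(cell);

        /* product of per-clause selectivities: the pre-9.6 style estimate */
        prodsel *= clause_selectivity(root, (Node *) rinfo, 0, jointype, sjinfo);
    }
    /* never estimate below what the foreign key implies */
    thisfksel = Max(prodsel, fkselec);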
Re: [HACKERS] Regression in join selectivity estimations when using foreign keys
From:
David Rowley
Date:
On 21 May 2017 at 07:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm entirely unconvinced by this patch --- it seems to simply be throwing
> away a lot of logic. Notably it lobotomizes the FK code altogether for
> semi/antijoin cases, but you've not shown any example that even involves
> such joins, so what's the argument for doing that? Also, the reason
> we had the use_smallest_selectivity code in the first place was that we
> didn't believe the FK-based selectivity could be applied safely to
> outer-join cases, so simply deciding that it's OK to apply it after all
> seems insufficiently justified.

Originally I looked at just multiplying the selectivities in use_smallest_selectivity, but on further thought I couldn't see why we shouldn't use foreign keys to assist with selectivity estimations when ref_is_outer is true. We still have a very high probability that the outer rel contains a tuple to match each of the inner rel's tuples (because the inner references the outer). The only difference between OUTER- and INNER-typed joins is that OUTER will produce a bunch of NULL rows in place of non-matched inner rows. That part seems to be handled in calc_joinrel_size_estimate() by code such as:

if (nrows < outer_rows)
    nrows = outer_rows;

We could do much better than what we have today by also adding outer_rows - (the n_distinct of the inner rows on the referencing Vars), to account for those unmatched rows (a rough sketch of what I mean follows below). However, I'd say that's not for back-patching, although it may be especially worthwhile now that we have multi-variate ndistinct in PG10.

> Or in short, exhibiting one counterexample to the existing code is not
> a sufficient argument for changing things this much. You need to give
> an argument why this is the right thing to do instead.
>
> Stepping back a bit, it seems like the core thing the FK selectivity code
> was meant to do was to prevent us from underestimating selectivity of
> multiple-clause join conditions through a false assumption of clause
> independence. The problem with the use_smallest_selectivity code is that
> it's overestimating selectivity, but that doesn't mean that we want to go
> all the way back to the old way of doing things. I wonder if we could get
> any traction in these dubious cases by computing the product, instead of
> minimum, of the clause selectivities (thus matching the old estimate) and
> then taking the greater of that and the FK-based selectivity.

My concern with going back to selectivity multiplication was exactly that I didn't want to return to the original undesired behaviour of underestimation when the conditions are correlated. I'm unsure why taking the greater is any better than just using the foreign key selectivity. It seems senseless to use clause-based selectivities at all when we've proved the foreign key exists -- there will be no unmatched inner tuples.

The reason I dropped the non-singleton join handling for SEMI and ANTI is that I couldn't see how we could make any improvement over the original join estimation code for this case. Perhaps I'm assuming too much about how get_foreign_key_join_selectivity() is used by doing this? I'm assuming that leaving the clause list intact and referring the whole case to calc_joinrel_size_estimate() is okay to do.

The reason I added the nullfrac code in 0002 was that I was concerned about regressing OUTER join cases where the nullfrac currently works because clause_selectivity() is used. Although, that case already regressed between 9.5 and 9.6 for INNER joins with a non-zero nullfrac on the join condition.
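As for that refinement to the outer join row estimate, roughly what I have in mind is this. It's only an untested sketch with made-up variable names (outer_rows, inner_rows, and inner_ndistinct standing for the distinct count on the referencing Vars):

/*
 * With the FK proving that every inner (referencing) tuple has a match
 * on the outer (referenced) side, a left join should emit one row per
 * inner tuple, plus a NULL-extended row for each outer tuple that is
 * never referenced.
 */
double  unmatched_outer = outer_rows - inner_ndistinct;

nrows = inner_rows + Max(unmatched_outer, 0.0);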
In short: if we replace the use_smallest_selectivity code with

fkselec *= clauselist_selectivity(root, removedlist, 0, jointype, sjinfo);

then I'm worried about regressing back to the old underestimations. The extended statistics code in PG10's version of clauselist_selectivity() will ignore these join conditions, so nothing can be done to assist the underestimations by adding extended stats yet.

I also just noticed that I don't think I've got the ANTI join cases correct in the patch I sent. I'll look at that now.

-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Regression in join selectivity estimations when using foreign keys
From:
David Rowley
Date:
On 22 May 2017 at 16:10, David Rowley <david.rowley@2ndquadrant.com> wrote:
> I also just noticed that I don't think I've got ANTI join cases
> correct in the patch I sent. I'll look at that now.

I've attached an updated patch. This one is much less invasive than my original attempt.

There are two fundamental changes here:

1) OUTER joins now use the foreign key as proof that the join condition must match.

2) The nullfrac-based selectivity for NULL-valued columns in OUTER joins is no longer taken into account. This is now consistent with INNER joins, which might not be perfect, but it's less surprising. If this is a problem then we can consider applying something like my 0002 patch above; however, that can mean that NULLs are double-counted if there are any other strict clauses which are not part of the foreign key constraint, so that idea is not perfect either.

In addition to those two things, the poor selectivity estimation from my original email is also fixed.

-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 22 May 2017 at 16:10, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> I also just noticed that I don't think I've got ANTI join cases
>> correct in the patch I sent. I'll look at that now.

> I've attached an updated patch. This one is much less invasive than my
> original attempt.

Sorry for not dealing with this sooner --- I put it on the back burner for PGCon, and just now got back to it.

First off, I agree that it was probably a mistake for this code to special-case LEFT/FULL joins at all. eqjoinsel() does not do so; it relies on the join-type correction applied later by calc_joinrel_size_estimate(). Since that correction is also downstream of this code, we should be able to do the same here, and indeed may be double-counting somehow if we don't.

What that leaves is that we're using the "smallest per-column selectivity" hack only for SEMI/ANTI joins where we can't really get anything helpful from knowledge of the FK. What your patch does is to fall back on the traditional clauselist_selectivity calculation for the relevant clauses. But we'd be better off ignoring the FK altogether and leaving those clauses to be processed later. That saves some cycles; it might allow those clauses to be used more fruitfully with a different FK; and even if that doesn't happen, it's better to let clauselist_selectivity see as many clauses at once as possible. So I whacked the patch around to do it like that and pushed it.

I'm not totally satisfied that there isn't any case where the smallest-selectivity hack is appropriate. In the example you're showing here, the FK columns are independent, so we get more or less the right answer with or without the FK. But in some quick testing I could not produce a counterexample proving that that heuristic is helpful; so for now let's can it.

Thanks, and sorry again for the delay.

			regards, tom lane
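[In outline, the "ignore the FK and leave the clauses for later" behaviour described here amounts to something like the following inside get_foreign_key_join_selectivity()'s loop over foreign keys. This is a deliberately oversimplified sketch of the idea as described in the mail, not the actual committed code.]

    /*
     * For the SEMI/ANTI cases where knowledge of the FK gives us nothing
     * helpful, skip this FK and leave its clauses in the worklist, so that
     * clauselist_selectivity() later sees them all together and they remain
     * available to be matched against a different FK.
     */
    if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
        continue;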
Re: [HACKERS] Regression in join selectivity estimations when using foreign keys
From:
David Rowley
Date:
On 20 June 2017 at 07:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm not totally satisfied that there isn't any case where the smallest
> selectivity hack is appropriate. In the example you're showing here,
> the FK columns are independent, so we get more or less the right
> answer with or without the FK. But in some quick testing I could not
> produce a counterexample proving that that heuristic is helpful;
> so for now let's can it.
>
> Thanks, and sorry again for the delay.

Many thanks for taking the time on this.

-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services