Re: BUG #13614: Optimizer choosing
От | Matthew Bellew |
---|---|
Тема | Re: BUG #13614: Optimizer choosing |
Дата | |
Msg-id | loom.20151014T055051-633@post.gmane.org обсуждение исходный текст |
Ответ на | BUG #13614: Optimizer choosing "Nested Loop" due to bad estimate of "rows=1" (matthewb@labkey.com) |
Список | pgsql-bugs |
This issue has been affecting a lot of our customers lately, (perhaps it has become even more common to use nested loop joins in 9.4.5?). I have created a repro to show the problem. Given two table A joined to B and a filter of say selectivity Asel% on A and Bsel% on B. It is very optimistic to assume that the selectivity of the rows in A join B is Asel * Bsel. But this seems to be what Postgres assumes. A very conservative estimate would be that the join selectivity is min(Asel,Bsel). The conservative estimate is probably not ideal, but is less likely to lead to a pathological plan (one that is orders of magnitude worse than if no filters had been applied at all), than using the very optimistic estimate. Just guessing that the geometric mean of the optimistic and pessimistic estimate. sqrt(min(Asel,Bsel) * (Asel*Bsel)) might be close to a good compromise. Regardless, or the exact estimation it seems clear that a) the current estimate for joins over filtered tables it way too optmistic b) the estimate should not be allowed to float down to 1, 1 should only be used to mean exactly 1, any other estimate should be capped at some sort of minimum to help avoid choosing joins that degrade badly if the estimate is wrong. SAMPLE 1 - no UNIQUE indexes, on my machine despite estimates, the optimizer chooses MERGE JOINS and results are very fast. DROP TABLE IF EXISTS A; DROP TABLE IF EXISTS B; DROP TABLE IF EXISTS C; DROP TABLE IF EXISTS D; DROP TABLE IF EXISTS E; SELECT * INTO A FROM ( SELECT seq, 100000-seq as rev, (seq+111) % 100000 as rot, rand, rand%10 as ten, rand%100 as hundred, rand%1000 as thousand, md5(random()::text) AS descr FROM (SELECT generate_series(1,100000) AS seq, cast(floor(random()*10000) as int4) as rand) _ ) _; --CREATE UNIQUE INDEX ON A (seq); --CREATE UNIQUE INDEX ON A (rev); --CREATE UNIQUE INDEX ON A (rot); CREATE INDEX ON A (hundred); SELECT * INTO B FROM A; --CREATE UNIQUE INDEX ON B (seq); --CREATE UNIQUE INDEX ON B (rev); --CREATE UNIQUE INDEX ON B (rot); CREATE INDEX ON B (hundred); SELECT * INTO C FROM A; --CREATE UNIQUE INDEX ON C (seq); --CREATE UNIQUE INDEX ON C (rev); --CREATE UNIQUE INDEX ON C (rot); CREATE INDEX ON C (hundred); SELECT * INTO D FROM A; --CREATE UNIQUE INDEX ON D (seq); --CREATE UNIQUE INDEX ON D (rev); --CREATE UNIQUE INDEX ON D (rot); CREATE INDEX ON D (hundred); SELECT * INTO E FROM A; --CREATE UNIQUE INDEX ON E (seq); --CREATE UNIQUE INDEX ON E (rev); --CREATE UNIQUE INDEX ON E (rot); CREATE INDEX ON E (hundred); SELECT _a.seq, _b.rand, _c.ten, _d.hundred, _e.descr FROM (SELECT * FROM A WHERE hundred in (1,2,3,4,5)) _a JOIN (SELECT * FROM B WHERE hundred in (1,2,3,4,5)) _b ON _a.seq = _b.seq JOIN (SELECT * FROM C WHERE hundred in (1,2,3,4,5)) _c ON _b.rev = _c.rev JOIN (SELECT * FROM D WHERE hundred in (1,2,3,4,5)) _d ON _c.seq = _d.seq JOIN (SELECT * FROM E WHERE hundred in (1,2,3,4,5)) _e ON _d.rot = _e.rot SAMPLE 2- with UNIQUE indexes, on my machine the optimizer starts using NESTED LOOPS once the row estimate gets to 1. This case isn't tragically bad (worse than SAMPLE 1), but we hit pathological cases in the wild that are exceedingly hard to avoid. DROP TABLE IF EXISTS A; DROP TABLE IF EXISTS B; DROP TABLE IF EXISTS C; DROP TABLE IF EXISTS D; DROP TABLE IF EXISTS E; SELECT * INTO A FROM ( SELECT seq, 100000-seq as rev, (seq+111) % 100000 as rot, rand, rand%10 as ten, rand%100 as hundred, rand%1000 as thousand, md5(random()::text) AS descr FROM (SELECT generate_series(1,100000) AS seq, cast(floor(random()*10000) as int4) as rand) _ ) _; CREATE UNIQUE INDEX ON A (seq); CREATE UNIQUE INDEX ON A (rev); CREATE UNIQUE INDEX ON A (rot); CREATE INDEX ON A (hundred); SELECT * INTO B FROM A; CREATE UNIQUE INDEX ON B (seq); CREATE UNIQUE INDEX ON B (rev); CREATE UNIQUE INDEX ON B (rot); CREATE INDEX ON B (hundred); SELECT * INTO C FROM A; CREATE UNIQUE INDEX ON C (seq); CREATE UNIQUE INDEX ON C (rev); CREATE UNIQUE INDEX ON C (rot); CREATE INDEX ON C (hundred); SELECT * INTO D FROM A; CREATE UNIQUE INDEX ON D (seq); CREATE UNIQUE INDEX ON D (rev); CREATE UNIQUE INDEX ON D (rot); CREATE INDEX ON D (hundred); SELECT * INTO E FROM A; CREATE UNIQUE INDEX ON E (seq); CREATE UNIQUE INDEX ON E (rev); CREATE UNIQUE INDEX ON E (rot); CREATE INDEX ON E (hundred); SELECT _a.seq, _b.rand, _c.ten, _d.hundred, _e.descr FROM (SELECT * FROM A WHERE hundred in (1,2,3,4,5)) _a JOIN (SELECT * FROM B WHERE hundred in (1,2,3,4,5)) _b ON _a.seq = _b.seq JOIN (SELECT * FROM C WHERE hundred in (1,2,3,4,5)) _c ON _b.rev = _c.rev JOIN (SELECT * FROM D WHERE hundred in (1,2,3,4,5)) _d ON _c.seq = _d.seq JOIN (SELECT * FROM E WHERE hundred in (1,2,3,4,5)) _e ON _d.rot = _e.rot
В списке pgsql-bugs по дате отправления: