Обсуждение: BUG #6359: excessively inlining subquery leads to slow queries
The following bug has been logged on the website: Bug reference: 6359 Logged by: Maksym Boguk Email address: maxim.boguk@gmail.com PostgreSQL version: 9.1.2 Operating system: Ubuntu linux Description:=20=20=20=20=20=20=20=20 Sometime Postgres inline subrequest even if it produce slower plan (and that slow plan have higher actual cost than non-inlined plan): test case: drop table if exists t1; drop table if exists t2; create table t1 as select id from generate_series(1,1) as g(id); create table t2 as select id from generate_series(1,1000) as g(id); alter table t1 add primary key (id); alter table t2 add primary key (id); analyze t1; analyze t2; --fast non-inlined plan explain (verbose, analyze) select id, =20=20=20 t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_= id+t2_id+t2_id+t2_id+t2_id FROM ( select t1.id, (select t2.id from t2 where t2.id=3Dt1.id) as t2_id from t1 offset 0 ) as t; =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20 QUERY PLAN ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= ------------------------------------------------------------------- Subquery Scan on t (cost=3D0.00..0.65 rows=3D1 width=3D8) (actual time=3D0.066..0.069 rows=3D1 loops=3D1) Output: t.id, ((((((((((((((((t.t2_id + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) -> Limit (cost=3D0.00..0.60 rows=3D1 width=3D4) (actual time=3D0.053..= 0.056 rows=3D1 loops=3D1) Output: t1.id, ((SubPlan 1)) -> Seq Scan on public.t1 (cost=3D0.00..0.60 rows=3D1 width=3D4) = (actual time=3D0.052..0.053 rows=3D1 loops=3D1) Output: t1.id, (SubPlan 1) SubPlan 1 -> Index Scan using t2_pkey on public.t2 (cost=3D0.00..0= .49 rows=3D1 width=3D4) (actual time=3D0.025..0.028 rows=3D1 loops=3D1) Output: t2.id Index Cond: (t2.id =3D t1.id) Total runtime: 0.161 ms (11 rows) --slow inlined plan explain (verbose, analyze) select id, =20=20=20 t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_= id+t2_id+t2_id+t2_id+t2_id FROM ( select t1.id, (select t2.id from t2 where t2.id=3Dt1.id) as t2_id from t1 -- offset 0 ) as t; =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20 QUERY PLAN= =20 =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20 =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20 ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= --------------------------------------------------------------------- Seq Scan on public.t1 (cost=3D0.00..8.44 rows=3D1 width=3D4) (actual time=3D0.180..0.181 rows=3D1 loops=3D1) Output: t1.id, (((((((((((((((((SubPlan 1) + (SubPlan 2)) + (SubPlan 3)) + (SubPlan 4)) + (SubPlan 5)) + (SubPlan 6)) + (SubPlan 7)) + (SubPlan 8)) + (SubPlan 9)) + (SubPlan 10)) + (SubPlan 11)) + (SubPlan 12)) + (SubPlan 13)) + (SubPlan 14)) + (SubPlan 15)) + (SubPlan 16)) + (SubPlan 17)) SubPlan 1 -> Index Scan using t2_pkey on public.t2 (cost=3D0.00..0.49 rows=3D1 width=3D4) (actual time=3D0.025..0.028 rows=3D1 loops=3D1) Output: public.t2.id Index Cond: (public.t2.id =3D t1.id) SubPlan 2 -> Index Scan using t2_pkey on public.t2 (cost=3D0.00..0.49 rows=3D1 width=3D4) (actual time=3D0.006..0.007 rows=3D1 loops=3D1) Output: public.t2.id Index Cond: (public.t2.id =3D t1.id) SubPlan 3 -> Index Scan using t2_pkey on public.t2 (cost=3D0.00..0.49 rows=3D1 width=3D4) (actual time=3D0.005..0.006 rows=3D1 loops=3D1) Output: public.t2.id Index Cond: (public.t2.id =3D t1.id) SubPlan 4 -> Index Scan using t2_pkey on public.t2 (cost=3D0.00..0.49 rows=3D1 width=3D4) (actual time=3D0.004..0.005 rows=3D1 loops=3D1) Output: public.t2.id Index Cond: (public.t2.id =3D t1.id) SubPlan 5 -> Index Scan using t2_pkey on public.t2 (cost=3D0.00..0.49 rows=3D1 width=3D4) (actual time=3D0.004..0.004 rows=3D1 loops=3D1) Output: public.t2.id Index Cond: (public.t2.id =3D t1.id) SubPlan 6 -> Index Scan using t2_pkey on public.t2 (cost=3D0.00..0.49 rows=3D1 width=3D4) (actual time=3D0.004..0.004 rows=3D1 loops=3D1) Output: public.t2.id Index Cond: (public.t2.id =3D t1.id) SubPlan 7 -> Index Scan using t2_pkey on public.t2 (cost=3D0.00..0.49 rows=3D1 width=3D4) (actual time=3D0.004..0.004 rows=3D1 loops=3D1) Output: public.t2.id Index Cond: (public.t2.id =3D t1.id) SubPlan 8 -> Index Scan using t2_pkey on public.t2 (cost=3D0.00..0.49 rows=3D1 width=3D4) (actual time=3D0.004..0.005 rows=3D1 loops=3D1) Output: public.t2.id Index Cond: (public.t2.id =3D t1.id) SubPlan 9 -> Index Scan using t2_pkey on public.t2 (cost=3D0.00..0.49 rows=3D1 width=3D4) (actual time=3D0.004..0.005 rows=3D1 loops=3D1) Output: public.t2.id Index Cond: (public.t2.id =3D t1.id) SubPlan 10 -> Index Scan using t2_pkey on public.t2 (cost=3D0.00..0.49 rows=3D1 width=3D4) (actual time=3D0.004..0.005 rows=3D1 loops=3D1) Output: public.t2.id Index Cond: (public.t2.id =3D t1.id) SubPlan 11 -> Index Scan using t2_pkey on public.t2 (cost=3D0.00..0.49 rows=3D1 width=3D4) (actual time=3D0.004..0.005 rows=3D1 loops=3D1) Output: public.t2.id Index Cond: (public.t2.id =3D t1.id) SubPlan 12 -> Index Scan using t2_pkey on public.t2 (cost=3D0.00..0.49 rows=3D1 width=3D4) (actual time=3D0.004..0.005 rows=3D1 loops=3D1) Output: public.t2.id Index Cond: (public.t2.id =3D t1.id) SubPlan 13 -> Index Scan using t2_pkey on public.t2 (cost=3D0.00..0.49 rows=3D1 width=3D4) (actual time=3D0.005..0.006 rows=3D1 loops=3D1) Output: public.t2.id Index Cond: (public.t2.id =3D t1.id) SubPlan 14 -> Index Scan using t2_pkey on public.t2 (cost=3D0.00..0.49 rows=3D1 width=3D4) (actual time=3D0.005..0.006 rows=3D1 loops=3D1) Output: public.t2.id Index Cond: (public.t2.id =3D t1.id) SubPlan 15 -> Index Scan using t2_pkey on public.t2 (cost=3D0.00..0.49 rows=3D1 width=3D4) (actual time=3D0.005..0.006 rows=3D1 loops=3D1) Output: public.t2.id Index Cond: (public.t2.id =3D t1.id) SubPlan 16 -> Index Scan using t2_pkey on public.t2 (cost=3D0.00..0.49 rows=3D1 width=3D4) (actual time=3D0.004..0.005 rows=3D1 loops=3D1) Output: public.t2.id Index Cond: (public.t2.id =3D t1.id) SubPlan 17 -> Index Scan using t2_pkey on public.t2 (cost=3D0.00..0.49 rows=3D1 width=3D4) (actual time=3D0.005..0.006 rows=3D1 loops=3D1) Output: public.t2.id Index Cond: (public.t2.id =3D t1.id) Total runtime: 0.466 ms (71 rows) The inlined plan uses 3x more time and have 10x higher cost. I found that problem in much more longer analytical query where subrequest is slow and complicated.
On Mon, Dec 26, 2011 at 8:50 AM, <maxim.boguk@gmail.com> wrote: > The following bug has been logged on the website: > > Bug reference: =A0 =A0 =A06359 > Logged by: =A0 =A0 =A0 =A0 =A0Maksym Boguk > Email address: =A0 =A0 =A0maxim.boguk@gmail.com > PostgreSQL version: 9.1.2 > Operating system: =A0 Ubuntu linux > Description: > > Sometime Postgres inline subrequest even if it produce slower plan (and t= hat > slow plan have higher actual cost than non-inlined plan): I'm pretty sure this has been discussed before. In the case where a non-trivial expression is going to get inlined in multiple places, it would probably be better to replace it with a placeholder referencing a subplan that evaluates it just once. But, we don't currently try to do that. In practice, this only bites us occasionally, because it's actually fairly rare to write a query that references the same subquery output twice, let alone 17 times. Also, detecting the scenario where we shouldn't inline probably isn't free, so we'd have to weigh the benefit of helping some people quite a bit against the cost of slowing down query planning slightly for everyone else. All in all, it's a complicated problem... so I'd classify this more as an unimplemented feature than a bug. That having been said, I hope we'll get around to improving it at some point. Even though in most cases inlining is a huge win, there are certainly boundary cases where it's not, and this is one of them. --=20 Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company