Обсуждение: Query plan question
Hi, after looking on one of my query I found interesting thing, I was trying to force PG do inner join 2 small tables calcs. necessary aggregates per id and only after that left outer join with the rest. Idea was to calc. all aggregates on small set and do the rest So I was doing something like SELECT .... FROM ( SELECT ( SELECT agr1() .... ), ( SELECT agr2() ... ), .. ) AS pt LEFT OUTER JOIN ..... LEFT OUTER JOIN .... ..... What I see, at least from explain output, PG ignores my desire and calc. aggregate for every row, so in my example 'small' tables # rows 10000 but what I see from plan PG calcs. aggregates for all 10720 rows. What I'm doing wrong, and more interesting question how to force PG do what I want to do, I think I know my data better then PG does ;) P.S Sorry for large post... EXAMPLE: SELECT ........ FROM ( SELECT first, ....., ( SELECT CAST( COUNT(*) AS int4 ) FROM prod.t_pas AS tpa WHERE tpa.kid = p.kid ), ( SELECT CAST( COUNT(*) AS int4 ) FROM prod.t_pinv AS tpi WHERE tpi.kid = p.kid ), ( SELECT CAST( COUNT(*) AS int4 ) FROM prod.t_cit AS ct1 WHERE ct1.kid = p.kid), ( SELECT CAST( COUNT(*) AS int4 ) FROM prod.t_cit AS ct2 WHERE ct2.kid = p.kid ), ( SELECT prod.pgag_list(pncl1.fld) FROM prod.t_pcls AS pncl1 WHERE pncl1.paid = p.kid ), ( SELECT prod.pgag_list(pipc1.fld) FROM prod.t_pics AS pipc1 WHERE pipc1.kid = p.kid ) FROM prod.t_p AS p INNER JOIN t_temp AS t ON p.did = t.did LEFT OUTER JOIN prod.t_pinv AS pi ON p.kid = pi.kid AND pi.orderid = 'S' ) AS pt LEFT OUTER JOIN prod.t_dmp AS pdb ON pt.kid = pdb.kid LEFT OUTER JOIN prod.t_inv AS i ON pt.first = i.invid LEFT OUTER JOIN prod.t_pata AS pa ON pt.kid = pa.kid LEFT OUTER JOIN prod.t_as AS a ON pa.aid = a.aid; Nested Loop (cost=0.00..28286.07 rows=1000 width=183) (actual time=111.80..140707.84 rows=10720 loops=1) -> Nested Loop (cost=0.00..25260.07 rows=1000 width=154) (actual time=50.30..45994.51 rows=10720 loops=1) -> Nested Loop (cost=0.00..19926.91 rows=1000 width=141) (actual time=49.81..42713.54 rows=10000 loops=1) -> Nested Loop (cost=0.00..16900.91 rows=1000 width=115) (actual time=17.41..13394.78 rows=10000 loops=1) -> Nested Loop (cost=0.00..11249.73 rows=1000 width=97) (actual time=2.15..6734.64 rows=10000 loops=1) Join Filter: ("inner".orderid = 'S'::bpchar) -> Nested Loop (cost=0.00..5791.35 rows=1000 width=84) (actual time=1.58..2860.57 rows=10000 loops=1) -> Seq Scan on t_temp t (cost=0.00..20.00 rows=1000 width=20) (actual time=0.06..171.12 rows=10000 loops=1) -> Index Scan using t_pdid on t_pt p (cost=0.00..5.76 rows=1 width=64) (actual time=0.20..0.22 rows=1 loops=10000) Index Cond: (p.did = "outer".did) -> Index Scan using t_pinviid on t_pinv pi (cost=0.00..5.44 rows=1 width=13) (actual time=0.21..0.31 rows=2 loops=10000) Index Cond: ("outer".kid = pi.kid) -> Index Scan using t_dmpid on t_dmp pdb (cost=0.00..5.64 rows=1 width=18) (actual time=0.58..0.60 rows=1 loops=10000) Index Cond: ("outer".kid = pdb.kid) -> Index Scan using t_invid on t_inv i (cost=0.00..3.01 rows=1 width=26) (actual time=2.83..2.85 rows=1 loops=10000) Index Cond: ("outer".iid = i.iid) -> Index Scan using t_pasid on t_pas pa (cost=0.00..5.32 rows=1 width=13) (actual time=0.21..0.23 rows=1 loops=10000) Index Cond: ("outer".kid = pa.kid) -> Index Scan using t_asid on t_as a (cost=0.00..3.01 rows=1 width=29) (actual time=0.64..0.66 rows=1 loops=10720) Index Cond: ("outer".aid = a.aid) SubPlan -> Aggregate (cost=5.32..5.32 rows=1 width=0) (actual time=0.17..0.17 rows=1 loops=10720) -> Index Scan using t_pasid on t_pas tpa (cost=0.00..5.32 rows=1 width=0) (actual time=0.11..0.13 rows=1 loops=10720) Index Cond: (kid = $0) -> Aggregate (cost=5.45..5.45 rows=1 width=0) (actual time=0.25..0.25 rows=1 loops=10720) -> Index Scan using t_pinviid on t_pinv tpi (cost=0.00..5.44 rows=1 width=0) (actual time=0.13..0.21 rows=2 loops=10720) Index Cond: (kid = $0) -> Aggregate (cost=8.72..8.72 rows=1 width=0) (actual time=0.33..0.33 rows=1 loops=10720) -> Index Scan using t_citid on t_cit ct1 (cost=0.00..8.67 rows=19 width=0) (actual time=0.19..0.28 rows=4 loops=10720) Index Cond: (kid = $0) -> Aggregate (cost=78.60..78.60 rows=1 width=0) (actual time=0.33..0.33 rows=1 loops=10720) -> Index Scan using t_cdid on t_cit ct2 (cost=0.00..78.55 rows=19 width=0) (actual time=0.20..0.30 rows=1 loops=10720) Index Cond: (did = $1) -> Aggregate (cost=4.51..4.51 rows=1 width=11) (actual time=0.22..0.22 rows=1 loops=10720) -> Index Scan using t_pclsid on t_pcls pncl1 (cost=0.00..4.51 rows=1 width=11) (actual time=0.13..0.14 rows=0 loops=10720) Index Cond: (kid = $0) -> Aggregate (cost=5.93..5.93 rows=1 width=12) (actual time=0.44..0.44 rows=1 loops=10720) -> Index Scan using t_picsid2 on t_pics pipc1 (cost=0.00..5.93 rows=1 width=12) (actual time=0.20..0.24 rows=2 loops=10720) Index Cond: (kid = $0)
"Maksim Likharev" <mlikharev@aurigin.com> writes: > I was trying to force PG do inner join 2 small tables calcs. necessary > aggregates per id > and only after that left outer join with the rest. AFAICS you're contorting your query to force the evaluation order. Why are you complaining that PG follows what you told it to do? I'm having a hard time figuring out exactly what the query's intent is. What does it look like when you express it in the simplest way possible, with minimum use of subselects? regards, tom lane
Hi Tom, basically I complaining that PG does not do what I told to do or was hoping to do. What I was hopping to do: join temp table and main table + evaluate all aggregates for small subset, in my case subset is not so small 10000 rows and after that join to other tables that will effectively grow resultset size, due to one to many relations. But what I see PG calculates all aggregates for the final resultset in my case 10720, but that could be up to x5 times more. In order to be more clear let's consider following simplification 4 tables DOCS, REVIEWERS, REVISIONS AND OTHER [docid] is primary key in DOCS, DOCS one to many for REVIVERS, REVISIONS, OTHER I want following output: [docid], [reviewrs cnt], [revisions cnt], otherfields... So what could be simpler than SELECT .... FROM( SELECT docid, stuff, .... ( SELECT count(...) FROM REVIEWERS ), ( SELECT count(...) FROM REVISIONS ) FROM DOCS ) AS t LEFT OUTER JOIN OTHER .... I want result set [t] to be evaluated first. I reality I have 4/6 aggregate to calc. and 3/4 tables to join. -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: Wednesday, June 25, 2003 7:53 PM To: Maksim Likharev Cc: GENERAL Subject: Re: [GENERAL] Query plan question "Maksim Likharev" <mlikharev@aurigin.com> writes: > I was trying to force PG do inner join 2 small tables calcs. necessary > aggregates per id > and only after that left outer join with the rest. AFAICS you're contorting your query to force the evaluation order. Why are you complaining that PG follows what you told it to do? I'm having a hard time figuring out exactly what the query's intent is. What does it look like when you express it in the simplest way possible, with minimum use of subselects? regards, tom lane
"Maksim Likharev" <mlikharev@aurigin.com> writes: > basically I complaining that PG does not do what I told to do or > was hoping to do. Okay, now I get the point: you want to prevent the "pt" sub-select from being flattened into the outer query. 7.3.1 through 7.3.3 will actually do what you want (they won't flatten a sub-select that has any sub-selects in its output list) but we got a lot of flak for that and 7.4 will go back to the prior behavior. In most scenarios it's a win for the planner to flatten wherever possible. Probably the easiest way to handle it is to insert a DISTINCT or LIMIT clause in the sub-select; that will unconditionally keep the planner from flattening the sub-select. For example, ... FROM prod.t_p AS p INNER JOIN t_temp AS t ON p.did = t.did LEFT OUTER JOIN prod.t_pinv AS pi ON p.kid = pi.kid AND pi.orderid = 'S' -- hack to keep this separate from outer plan: OFFSET 0 ) AS pt LEFT OUTER JOIN prod.t_dmp AS pdb ON pt.kid = pdb.kid ... I don't foresee any future planner changes that would be likely to bypass a LIMIT/OFFSET clause. regards, tom lane
Thanks Tom, works, have to test performance.... -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: Thursday, June 26, 2003 7:36 AM To: Maksim Likharev Cc: GENERAL Subject: Re: [GENERAL] Query plan question "Maksim Likharev" <mlikharev@aurigin.com> writes: > basically I complaining that PG does not do what I told to do or > was hoping to do. Okay, now I get the point: you want to prevent the "pt" sub-select from being flattened into the outer query. 7.3.1 through 7.3.3 will actually do what you want (they won't flatten a sub-select that has any sub-selects in its output list) but we got a lot of flak for that and 7.4 will go back to the prior behavior. In most scenarios it's a win for the planner to flatten wherever possible. Probably the easiest way to handle it is to insert a DISTINCT or LIMIT clause in the sub-select; that will unconditionally keep the planner from flattening the sub-select. For example, ... FROM prod.t_p AS p INNER JOIN t_temp AS t ON p.did = t.did LEFT OUTER JOIN prod.t_pinv AS pi ON p.kid = pi.kid AND pi.orderid = 'S' -- hack to keep this separate from outer plan: OFFSET 0 ) AS pt LEFT OUTER JOIN prod.t_dmp AS pdb ON pt.kid = pdb.kid ... I don't foresee any future planner changes that would be likely to bypass a LIMIT/OFFSET clause. regards, tom lane