Обсуждение: Performance of query (fwd)
Hmmm, it's been a couple of days, and this still hasn't appeared on the mailing list as far as I can tell. This is a resend. ---------- Forwarded message ---------- Date: Mon, 9 Jun 2003 00:23:38 -0400 (EDT) From: Edmund Dengler <edmundd@eSentire.com> To: pgsql-general@postgresql.org Subject: Performance of query Ok, now I am really confused. I have a query that runs really slow, yet when I code it up as a function it runs very fast! The optimizer seems to not evaluate a good method for performing the query. Background: This is a database based off of Snort. There is a base table called <event> (with <sid>=sensor ID, <cid>=event ID, <timestamp>=when event collected, <signature>=which event was recognized), and a number of depending tables for various bits and pieces of network packet data (<iphdr>=IP header info, <tcphdr>=TCP header info, <data>=packet payload) and <signature> (what event). Notes: Running 7.3.3 Database is around 20GB on disk. => select version(); version --------------------------------------------------------------------- PostgreSQL 7.3.3 on i386-unknown-openbsd3.0, compiled by GCC 2.95.3 (1 row) => \d event Table "public.event" Column | Type | Modifiers -----------+--------------------------+----------- sid | integer | not null cid | bigint | not null signature | integer | not null timestamp | timestamp with time zone | not null Indexes: event_pkey primary key btree (sid, cid), event_pull_idx unique btree ("timestamp", sid, cid), signature_idx btree (signature), timestamp_idx btree ("timestamp") Triggers: event_summary => \d signature Table "public.signature" Column | Type | Modifiers --------------+---------+---------------------------------------------------------- sig_id | integer | not null default nextval('"signature_sig_id_seq"'::text) <<...more columns...>> Indexes: signature_pkey primary key btree (sig_id), sig_class_id btree (sig_class_id), sig_name_idx btree (sig_name) Rest of the tables are primary keyed with (sid, cid). For example, => \d iphdr Table "public.iphdr" Column | Type | Modifiers ----------+----------+----------- sid | integer | not null cid | bigint | not null <<...more columns...>> Indexes: iphdr_pkey primary key btree (sid, cid), ip_dst_idx btree (ip_dst), ip_src_idx btree (ip_src) Triggers: iphdr_summary If I run the query: explain analyze select event.sid, event.cid, event.timestamp, signature.sig_name, iphdr.ip_src, iphdr.ip_dst, icmphdr.icmp_type, icmphdr.icmp_code, icmphdr.icmp_csum, icmphdr.icmp_id, icmphdr.icmp_seq, udphdr.udp_sport, udphdr.udp_dport, udphdr.udp_len, udphdr.udp_csum, tcphdr.tcp_sport, tcphdr.tcp_dport, tcphdr.tcp_seq, tcphdr.tcp_ack, tcphdr.tcp_off, tcphdr.tcp_res, tcphdr.tcp_flags, tcphdr.tcp_win, tcphdr.tcp_csum, tcphdr.tcp_urp, sensor.hostname, sensor.interface, data.data_payload from ( select * from event where timestamp > (select now() - '2 hours'::interval) and exists (select 1 from hack_pull_sid where sid = event.sid) ) as event left join signature on signature.sig_id = event.signature left join iphdr on iphdr.sid = event.sid and iphdr.cid = event.cid left join icmphdr on icmphdr.sid = event.sid and icmphdr.cid = event.cid left join udphdr on udphdr.sid = event.sid and udphdr.cid = event.cid left join tcphdr on tcphdr.sid = event.sid and tcphdr.cid = event.cid left join sensor on sensor.sid = event.sid left join data on data.sid = event.sid and data.cid = event.cid order by timestamp desc, event.sid desc, event.cid desc ; (note: <hack_pull_sid> is a table of SIDs I am interested in so that I avoid the issues with IN) I get the following output: -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=1712181.78..1712776.52 rows=237893 width=853) (actual time=163484.81..163484.97 rows=129 loops=1) Sort Key: public.event."timestamp", public.event.sid, public.event.cid InitPlan -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.04..0.04 rows=1 loops=1) -> Merge Join (cost=1471148.73..1551631.57 rows=237893 width=853) (actual time=152693.86..163484.26 rows=129 loops=1) Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid)) -> Merge Join (cost=728456.82..802284.20 rows=237893 width=258) (actual time=33652.33..39050.11 rows=129 loops=1) Merge Cond: ("outer".sid = "inner".sid) -> Merge Join (cost=728454.78..798713.57 rows=237893 width=226) (actual time=33635.79..39031.06 rows=129loops=1) Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid)) -> Merge Join (cost=728454.78..736750.59 rows=237893 width=172) (actual time=22767.67..24071.84rows=129loops=1) Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid)) -> Merge Join (cost=728454.78..731219.48 rows=237893 width=144) (actual time=22164.44..22681.23rows=129 loops=1) Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid)) -> Sort (cost=701134.84..701729.57 rows=237893 width=116) (actual time=15343.87..15344.05rows=129 loops=1) Sort Key: public.event.sid, public.event.cid -> Merge Join (cost=590896.58..671684.37 rows=237893 width=116) (actual time=10857.65..15343.57rows=129 loops=1) Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid)) -> Index Scan using iphdr_pkey on iphdr (cost=0.00..73422.89 rows=1432042width=28) (actual time=23.19..13810.58 rows=687918 loops=1) -> Sort (cost=590896.58..591491.31 rows=237893 width=88) (actual time=101.79..101.95rows=129 loops=1) Sort Key: public.event.sid, public.event.cid -> Hash Join (cost=185.07..563108.55 rows=237893 width=88) (actual time=97.42..101.39rows=129 loops=1) Hash Cond: ("outer".signature = "inner".sig_id) -> Index Scan using timestamp_idx on event (cost=0.00..558165.62rows=237893 width=24) (actual time=0.18..3.05 rows=129 loops=1) Index Cond: ("timestamp" > $0) Filter: (subplan) SubPlan -> Seq Scan on hack_pull_sid (cost=0.00..1.15 rows=1 width=0)(actual time=0.01..0.01 rows=1 loops=171) Filter: (sid = $1) -> Hash (cost=167.26..167.26 rows=7126 width=64) (actual time=97.12..97.12rows=0 loops=1) -> Seq Scan on signature (cost=0.00..167.26 rows=7126width=64)(actual time=0.01..80.86 rows=7126 loops=1) -> Sort (cost=27319.94..27897.87 rows=231171 width=28) (actual time=6606.52..7073.38 rows=110717loops=1) Sort Key: icmphdr.sid, icmphdr.cid -> Seq Scan on icmphdr (cost=0.00..3784.71 rows=231171 width=28) (actual time=19.81..2082.01rows=231303 loops=1) -> Index Scan using udphdr_pkey on udphdr (cost=0.00..4411.44 rows=115228 width=28) (actualtime=60.23..1269.61 rows=54669 loops=1) -> Index Scan using tcphdr_pkey on tcphdr (cost=0.00..56586.06 rows=1139141 width=54) (actual time=88.72..13878.52rows=484419 loops=1) -> Sort (cost=2.04..2.11 rows=30 width=32) (actual time=16.46..16.64 rows=136 loops=1) Sort Key: sensor.sid -> Seq Scan on sensor (cost=0.00..1.30 rows=30 width=32) (actual time=16.26..16.34 rows=30 loops=1) -> Sort (cost=742691.92..745059.80 rows=947154 width=595) (actual time=108643.60..123322.67 rows=417145 loops=1) Sort Key: data.sid, data.cid -> Seq Scan on data (cost=0.00..51821.54 rows=947154 width=595) (actual time=9.67..54494.61 rows=947324loops=1) Total runtime: 164147.78 msec (43 rows) For some runs, I was getting over 800,000 msec!! Now, this seems very strange for only 129 rows. The inner select returns exactly the 129 rows I am interested in, and now we only need to do a few left joins to get the corresponding rows (which may or may not exist, which is why the LEFT JOIN's). Note that the (<sid>,<cid>) is the primary key for most of these joins, and so the LEFT JOIN should be able to recognize that there will only be one or zero rows matching. In addition, (<sid>,<cid>) is the primary key for the <event> table. Similar discussion with <signature> as <sig_id> is it's primary key. Notes: - The table <event> has 1,427,411 rows - <event>.<timestamp> ranges from 2002-01-19 12:28:29-05 to 2003-06-08 23:40:01-04 - The last 2 hours is a very small range assuming an even distribution - All tables are using 100 bins for statistics - A VACUUM ANALYZE was performed on the entire database before this test As can be seem, the estimator seems to think that 237,893 rows will be returned from the INNER SELECT out of 1,427,411 rows. Can anyone explain how this number could be calculated? Even assuming a straight linear assumption, this should fall into 2hours/(24hours/day*400days) rough estimate which is (very roughly) 0.02% of the table. So, I wrote a function. (Please, no remarks on style, it was a quick hack to compare performance. I am sure it could be made better and more optimized). It simply performs the inner SELECT, and then for each row, does the outer select for the match. CREATE TYPE hack_pull_type AS ( sid int, cid bigint, timestamp timestamp with time zone, sig_name text, ip_src bigint, ip_dst bigint, icmp_type smallint, icmp_code smallint, icmp_csum int, icmp_id int, icmp_seq int, udp_sport int, udp_dport int, udp_len int, udp_csum int, tcp_sport int, tcp_dport int, tcp_seq bigint, tcp_ack bigint, tcp_off smallint, tcp_res smallint, tcp_flags smallint, tcp_win int, tcp_csum int, tcp_urp int, hostname text, interface text, data_payload text ); CREATE OR REPLACE FUNCTION hack_pull_func() RETURNS SETOF hack_pull_type AS ' DECLARE event_rec event%ROWTYPE; -- pull_rec hack_pull_type; pull_rec RECORD; BEGIN FOR event_rec IN SELECT * FROM event WHERE timestamp > (SELECT now() - ''2 hours''::interval) AND EXISTS (SELECT 1 FROM hack_pull_sid WHERE sid = event.sid) ORDER BY timestamp desc, sid desc, cid desc LOOP SELECT event.sid, event.cid, event.timestamp, sig_name, ip_src, ip_dst, icmp_type, icmp_code, icmp_csum, icmp_id, icmp_seq, udp_sport, udp_dport, udp_len, udp_csum, tcp_sport, tcp_dport, tcp_seq, tcp_ack, tcp_off, tcp_res, tcp_flags, tcp_win, tcp_csum, tcp_urp, sensor.hostname, sensor.interface, data.data_payload INTO pull_rec FROM event left join signature on signature.sig_id = event.signature left join iphdr on iphdr.sid = event.sid and iphdr.cid = event.cid left join icmphdr on icmphdr.sid = event.sid and icmphdr.cid = event.cid left join udphdr on udphdr.sid = event.sid and udphdr.cid = event.cid left join tcphdr on tcphdr.sid = event.sid and tcphdr.cid = event.cid left join sensor on sensor.sid = event.sid left join data on data.sid = event.sid and data.cid = event.cid WHERE event.cid = event_rec.cid and event.sid = event_rec.sid LIMIT 1; RETURN NEXT pull_rec; END LOOP; RETURN; END; ' LANGUAGE 'plpgsql'; => explain analyze select * from hack_pull_func(); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Function Scan on hack_pull_func (cost=0.00..12.50 rows=1000 width=238) (actual time=5259.46..5259.62 rows=23 loops=1) Total runtime: 5259.70 msec (2 rows) => explain analyze select * from hack_pull_func(); QUERY PLAN -------------------------------------------------------------------------------------------------------------------- Function Scan on hack_pull_func (cost=0.00..12.50 rows=1000 width=238) (actual time=38.02..38.18 rows=23 loops=1) Total runtime: 38.26 msec (2 rows) (Sorry, the data has changed as time moves forward and I type this email. However, the numbers are reflective of the performance difference I observed during testing). So, what is happening? How come my function (ugly as it may be), is orders of magnitude faster than what should be a clean query in SQL? Why is the estimator so far off the real issue? Is there a way of writing the query to get the good performance? Note that there are a number of other queries I run which exhibit similar performance/estimation issues, and I wonder if I will need to rwrite all of them into custom functions (thanks for the SETOF piece as this makes it possible to write this stuff now). Thanks! Ed
Edmund Dengler <edmundd@eSentire.com> writes: > select * from event > where timestamp > (select now() - '2 hours'::interval) > and exists (select 1 from hack_pull_sid where sid = event.sid) > (note: <hack_pull_sid> is a table of SIDs I am interested in so that I > avoid the issues with IN) I think you're creating more issues than you're avoiding. With the above query, the planner has little chance of guessing how many rows will be retrieved from "event" ... and indeed the EXPLAIN output shows that its guess is off by more than a factor of 1000: > -> Index Scan using timestamp_idx on event (cost=0.00..558165.62 rows=237893 width=24) (actual time=0.18..3.05 rows=129loops=1) With a misestimate of that magnitude at the core of the query, it's unsurprising that all the other plan choices are bad too. But actually I suspect the easiest point of attack is not the EXISTS subquery, but the timestamp comparison. Can you get your application to supply a simple literal constant to compare to the timestamp, viz '2003-06-10 21:44' rather than now()-'2 hours'? The former gives the planner something to compare to its statistics, the latter doesn't. Oh ... you have done an ANALYZE on event reasonably recently, no? regards, tom lane
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Tuesday, June 10, 2003 8:50 PM > To: Edmund Dengler > Cc: pgsql-general@postgresql.org > Subject: Re: [GENERAL] Performance of query (fwd) > > > Edmund Dengler <edmundd@eSentire.com> writes: > > select * from event > > where timestamp > (select now() - '2 hours'::interval) > > and exists (select 1 from hack_pull_sid where sid = event.sid) > > > (note: <hack_pull_sid> is a table of SIDs I am interested > in so that I > > avoid the issues with IN) > > I think you're creating more issues than you're avoiding. > With the above query, the planner has little chance of > guessing how many rows will be retrieved from "event" ... and > indeed the EXPLAIN output shows that its guess is off by more > than a factor of 1000: > > > -> Index Scan using timestamp_idx on event > > (cost=0.00..558165.62 rows=237893 width=24) (actual time=0.18..3.05 > > rows=129 loops=1) > > With a misestimate of that magnitude at the core of the > query, it's unsurprising that all the other plan choices are bad too. > > But actually I suspect the easiest point of attack is not the > EXISTS subquery, but the timestamp comparison. Can you get > your application to supply a simple literal constant to > compare to the timestamp, viz '2003-06-10 21:44' rather than > now()-'2 hours'? The former gives the planner something to > compare to its statistics, the latter doesn't. In a case like that, wouldn't it be worthwhile having the planner actually evaluate the expression? IOW, the planner sees: {column} <rel_op> {expression} And it knows that {column} has a key (any type of key if relop is 'equals' and anything but hashed for all other relational operators) Then go ahead and evaluation expression and directly substitute the answer, unless expression is an aggregate. So {column} <rel_op> {expression} becomes: {column} <rel_op> {CONSTANT} > Oh ... you have done an ANALYZE on event reasonably recently, no?
"Dann Corbit" <DCorbit@connx.com> writes: >> From: Tom Lane [mailto:tgl@sss.pgh.pa.us] >> But actually I suspect the easiest point of attack is not the >> EXISTS subquery, but the timestamp comparison. Can you get >> your application to supply a simple literal constant to >> compare to the timestamp, viz '2003-06-10 21:44' rather than >> now()-'2 hours'? The former gives the planner something to >> compare to its statistics, the latter doesn't. > In a case like that, wouldn't it be worthwhile having the planner > actually evaluate the expression? I've thought about that ... but am not sure whether it wouldn't create as many problems as it solves. What are the consequences when the planner's pre-evaluation yields a different result from what actually happens at runtime? Hardly an unlikely scenario when dealing with stuff like now(). (And for actually constant expressions, say sin(pi), the result already does get folded to a constant. So only the nontrivial cases are left to think about.) In the case of functions with side-effects (think nextval()), the planner *must not* speculatively evaluate the function. We could introduce another function property-flag, like IMMUTABLE or STRICT, to govern this ... but I'm not sure it's worth the trouble. regards, tom lane
Tom Lane wrote: >I've thought about that ... but am not sure whether it wouldn't create >as many problems as it solves. What are the consequences when the >planner's pre-evaluation yields a different result from what actually >happens at runtime? Hardly an unlikely scenario when dealing with >stuff like now(). > But isn't now() supposed to return the same value within the same transaction? Isn't that a guarantee, that it won't change between pre-evaluation and actual execution? > (And for actually constant expressions, say >sin(pi), the result already does get folded to a constant. So only >the nontrivial cases are left to think about.) > > >In the case of functions with side-effects (think nextval()), the >planner *must not* speculatively evaluate the function. We could >introduce another function property-flag, like IMMUTABLE or STRICT, >to govern this ... but I'm not sure it's worth the trouble. > > I must be missing something here - isn't it enough to use the same logic as when deciding whether the function output can be usedfor index scan? In 7.2 conditions like 'where stamp < now()' were refusing to use the index, because now() isn't immutable (cacheable)... I see that it is fixed in 7.3 by adding another flag - STABLE... It looks like the planner is now able to use the index as long as the function is not volatile. Can't the same criteria be applied to that pre-evaluation decision? Dima
Ok, tried it with a constant. No improvements. The estimator is still very high. Regards, Ed explain analyze select event.sid, event.cid, event.timestamp, signature.sig_name, iphdr.ip_src, iphdr.ip_dst, icmphdr.icmp_type, icmphdr.icmp_code, icmphdr.icmp_csum, icmphdr.icmp_id, icmphdr.icmp_seq, udphdr.udp_sport, udphdr.udp_dport, udphdr.udp_len, udphdr.udp_csum, tcphdr.tcp_sport, tcphdr.tcp_dport, tcphdr.tcp_seq, tcphdr.tcp_ack, tcphdr.tcp_off, tcphdr.tcp_res, tcphdr.tcp_flags, tcphdr.tcp_win, tcphdr.tcp_csum, tcphdr.tcp_urp, sensor.hostname, sensor.interface, data.data_payload from ( select * from event where timestamp > '2003-06-11 9:22 EDT'::timestamp and exists (select 1 from hack_pull_sid where sid = event.sid) ) as event left join signature on signature.sig_id = event.signature left join iphdr on iphdr.sid = event.sid and iphdr.cid = event.cid left join icmphdr on icmphdr.sid = event.sid and icmphdr.cid = event.cid left join udphdr on udphdr.sid = event.sid and udphdr.cid = event.cid left join tcphdr on tcphdr.sid = event.sid and tcphdr.cid = event.cid left join sensor on sensor.sid = event.sid left join data on data.sid = event.sid and data.cid = event.cid order by timestamp desc, event.sid desc, event.cid desc ; --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=1710629.85..1711222.69 rows=237136 width=853) (actual time=308403.45..308403.61 rows=139 loops=1) Sort Key: public.event."timestamp", public.event.sid, public.event.cid -> Merge Join (cost=1470394.12..1550828.17 rows=237136 width=853) (actual time=299837.82..308402.64 rows=139 loops=1) Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid)) -> Merge Join (cost=727702.20..801483.40 rows=237136 width=258) (actual time=45098.40..50164.32 rows=139loops=1) Merge Cond: ("outer".sid = "inner".sid) -> Merge Join (cost=727700.17..797923.97 rows=237136 width=226) (actual time=45077.16..50140.32 rows=139loops=1) Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid)) -> Merge Join (cost=727700.17..735968.33 rows=237136 width=172) (actual time=25446.85..26861.14 rows=139loops=1) Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid)) -> Merge Join (cost=727700.17..730447.45 rows=237136 width=144) (actual time=23709.07..24201.98rows=139 loops=1) Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid)) -> Sort (cost=700380.22..700973.06 rows=237136 width=116) (actual time=17821.61..17821.81rows=139 loops=1) Sort Key: public.event.sid, public.event.cid -> Merge Join (cost=590252.70..671029.48 rows=237136 width=116) (actual time=11569.28..17821.25rows=139 loops=1) Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid)) -> Index Scan using iphdr_pkey on iphdr (cost=0.00..73422.89 rows=1432042width=28) (actual time=18.79..16195.02 rows=730166 loops=1) -> Sort (cost=590252.70..590845.54 rows=237136 width=88) (actual time=52.35..52.55rows=139 loops=1) Sort Key: public.event.sid, public.event.cid -> Hash Join (cost=185.07..562558.03 rows=237136 width=88) (actual time=34.28..51.89rows=139 loops=1) Hash Cond: ("outer".signature = "inner".sig_id) -> Index Scan using timestamp_idx on event (cost=0.00..557630.23rows=237136 width=24) (actual time=0.22..16.58 rows=139 loops=1) Index Cond: ("timestamp" > ('2003-06-11 09:22:00'::timestampwithout time zone)::timestamp with time zone) Filter: (subplan) SubPlan -> Seq Scan on hack_pull_sid (cost=0.00..1.15 rows=1 width=0)(actual time=0.01..0.01 rows=0 loops=953) Filter: (sid = $0) -> Hash (cost=167.26..167.26 rows=7126 width=64) (actualtime=33.91..33.91rows=0 loops=1) -> Seq Scan on signature (cost=0.00..167.26 rows=7126 width=64)(actual time=0.04..17.74 rows=7131 loops=1) -> Sort (cost=27319.94..27897.87 rows=231171 width=28) (actual time=5671.28..6088.28 rows=130551loops=1) Sort Key: icmphdr.sid, icmphdr.cid -> Seq Scan on icmphdr (cost=0.00..3784.71 rows=231171 width=28) (actual time=3.87..1278.01rows=232877 loops=1) -> Index Scan using udphdr_pkey on udphdr (cost=0.00..4411.44 rows=115228 width=28) (actualtime=79.77..2532.43 rows=56946 loops=1) -> Index Scan using tcphdr_pkey on tcphdr (cost=0.00..56586.06 rows=1139141 width=54) (actual time=89.40..22157.07rows=504556 loops=1) -> Sort (cost=2.04..2.11 rows=30 width=32) (actual time=21.18..21.40 rows=158 loops=1) Sort Key: sensor.sid -> Seq Scan on sensor (cost=0.00..1.30 rows=30 width=32) (actual time=20.98..21.05 rows=30 loops=1) -> Sort (cost=742691.92..745059.80 rows=947154 width=595) (actual time=246938.33..257244.21 rows=444980 loops=1) Sort Key: data.sid, data.cid -> Seq Scan on data (cost=0.00..51821.54 rows=947154 width=595) (actual time=9.68..198574.76 rows=955002loops=1) Total runtime: 308903.04 msec (41 rows) On Tue, 10 Jun 2003, Tom Lane wrote: > Edmund Dengler <edmundd@eSentire.com> writes: > > select * from event > > where timestamp > (select now() - '2 hours'::interval) > > and exists (select 1 from hack_pull_sid where sid = event.sid) > > > (note: <hack_pull_sid> is a table of SIDs I am interested in so that I > > avoid the issues with IN) > > I think you're creating more issues than you're avoiding. With the > above query, the planner has little chance of guessing how many rows > will be retrieved from "event" ... and indeed the EXPLAIN output shows > that its guess is off by more than a factor of 1000: > > > -> Index Scan using timestamp_idx on event (cost=0.00..558165.62 rows=237893 width=24) (actual time=0.18..3.05rows=129 loops=1) > > With a misestimate of that magnitude at the core of the query, it's > unsurprising that all the other plan choices are bad too. > > But actually I suspect the easiest point of attack is not the EXISTS > subquery, but the timestamp comparison. Can you get your application > to supply a simple literal constant to compare to the timestamp, viz > '2003-06-10 21:44' rather than now()-'2 hours'? The former gives the > planner something to compare to its statistics, the latter doesn't. > > Oh ... you have done an ANALYZE on event reasonably recently, no? > > regards, tom lane >
Dmitry Tkach <dmitry@openratings.com> writes: > Tom Lane wrote: >> I've thought about that ... but am not sure whether it wouldn't create >> as many problems as it solves. What are the consequences when the >> planner's pre-evaluation yields a different result from what actually >> happens at runtime? Hardly an unlikely scenario when dealing with >> stuff like now(). > But isn't now() supposed to return the same value within the same > transaction? What's that have to do with it? Plans have to be good for longer than one transaction --- see prepared statements and plpgsql plan caching. > I must be missing something here - isn't it enough to use the same logic > as when deciding whether the function output can be usedfor index scan? No, see above. If we could, we'd not have bothered to make a distinction between IMMUTABLE and STABLE functions, we'd just allow the planner to fold them all to constants at plan time. (STABLE functions aren't actually guaranteed to hold still even across one transaction, only across one command in a transaction. But that's not really relevant to the planner's problem.) regards, tom lane
Edmund Dengler <edmundd@eSentire.com> writes: > Ok, tried it with a constant. No improvements. The estimator is still very > high. > where timestamp > '2003-06-11 9:22 EDT'::timestamp > -> Index Scan using timestamp_idx on event (cost=0.00..557630.23rows=237136 width=24) (actual time=0.22..16.58 rows=139 loops=1) > Index Cond: ("timestamp" > ('2003-06-11 09:22:00'::timestampwithout time zone)::timestamp with time zone) It would probably help if you cast the constant to the same datatype as the column is (which is evidently timestamp with time zone). Coercions between timestamp and timestamptz are not considered constants, because they depend on SET TIMEZONE. regards, tom lane
Ok, I see now... I did not know that those cached plans survived transaction end... Thanks for clarification. Dima Tom Lane wrote: >Dmitry Tkach <dmitry@openratings.com> writes: > > >>Tom Lane wrote: >> >> >>>I've thought about that ... but am not sure whether it wouldn't create >>>as many problems as it solves. What are the consequences when the >>>planner's pre-evaluation yields a different result from what actually >>>happens at runtime? Hardly an unlikely scenario when dealing with >>>stuff like now(). >>> >>> > > > >>But isn't now() supposed to return the same value within the same >>transaction? >> >> > >What's that have to do with it? Plans have to be good for longer than >one transaction --- see prepared statements and plpgsql plan caching. > > > >>I must be missing something here - isn't it enough to use the same logic >>as when deciding whether the function output can be usedfor index scan? >> >> > >No, see above. If we could, we'd not have bothered to make a >distinction between IMMUTABLE and STABLE functions, we'd just allow the >planner to fold them all to constants at plan time. (STABLE functions >aren't actually guaranteed to hold still even across one transaction, >only across one command in a transaction. But that's not really >relevant to the planner's problem.) > > regards, tom lane > >
Whoops! Thanks! Tried it, and it did return very fast (about a second and a half). Things switched from a "merge join" to a "nested loop". Regards, Ed On Wed, 11 Jun 2003, Tom Lane wrote: > Edmund Dengler <edmundd@eSentire.com> writes: > > Ok, tried it with a constant. No improvements. The estimator is still very > > high. > > > where timestamp > '2003-06-11 9:22 EDT'::timestamp > > > -> Index Scan using timestamp_idx on event (cost=0.00..557630.23rows=237136 width=24) (actual time=0.22..16.58 rows=139 loops=1) > > Index Cond: ("timestamp" > ('2003-06-11 09:22:00'::timestampwithout time zone)::timestamp with time zone) > > It would probably help if you cast the constant to the same datatype > as the column is (which is evidently timestamp with time zone). > Coercions between timestamp and timestamptz are not considered > constants, because they depend on SET TIMEZONE. > > regards, tom lane >
Greetings all! Interesting. This brings up a quick question. If a "prepare" statement is run, and then values are passed in, does the plan stay the same regardless of the values? Or is a second pass made to re-optimize the plan? Regards! Ed On Wed, 11 Jun 2003, Dmitry Tkach wrote: > Ok, I see now... > I did not know that those cached plans survived transaction end... > > Thanks for clarification. > > Dima > > Tom Lane wrote: > > >Dmitry Tkach <dmitry@openratings.com> writes: > > > > > >>Tom Lane wrote: > >> > >> > >>>I've thought about that ... but am not sure whether it wouldn't create > >>>as many problems as it solves. What are the consequences when the > >>>planner's pre-evaluation yields a different result from what actually > >>>happens at runtime? Hardly an unlikely scenario when dealing with > >>>stuff like now(). > >>> > >>> > > > > > > > >>But isn't now() supposed to return the same value within the same > >>transaction? > >> > >> > > > >What's that have to do with it? Plans have to be good for longer than > >one transaction --- see prepared statements and plpgsql plan caching. > > > > > > > >>I must be missing something here - isn't it enough to use the same logic > >>as when deciding whether the function output can be usedfor index scan? > >> > >> > > > >No, see above. If we could, we'd not have bothered to make a > >distinction between IMMUTABLE and STABLE functions, we'd just allow the > >planner to fold them all to constants at plan time. (STABLE functions > >aren't actually guaranteed to hold still even across one transaction, > >only across one command in a transaction. But that's not really > >relevant to the planner's problem.) > > > > regards, tom lane > > > > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html >
Edmund Dengler <edmundd@eSentire.com> writes: > Interesting. This brings up a quick question. If a "prepare" statement is > run, and then values are passed in, does the plan stay the same regardless > of the values? Or is a second pass made to re-optimize the plan? AFAICS, the entire *point* of a prepared statement is that the plan is computed just once for multiple uses. regards, tom lane
Yes, but it does make an impact on decisions. For example, take the one I was having a problem with. Understanding that the optimizer will make certain assumptions about the distribution of values, and use this in the estimator, and then realizing that for a specific set of values, no changes will occurs makes a big impact. You have to now carefully plan and _test_ both ways (using a prepared statement, vs always re-preparing each statement in a loop) to determine where your best performance is. In a perfect world, an optimizer would return a set of plans, or alternatively, provide some bounds of values on when it would be better to replan once the variables are known. Another alternative would be to plan, and if a portion that is executed is vastly off the original estimates, to replan the rest of the statements. For example, the problem with my statement was not the internal select (which, while estimated high, really did run fast), but that the actual number of returned rows was much smaller than the planner had estimated. If there could have been some kind of trigger to basically indicate "hey, this is really, really off the estimate, let's take a moment to replan the rest of the steps", then it would have probably run much better, at the cost of a second pass through the planner. Regards, Ed On Wed, 11 Jun 2003, Tom Lane wrote: > Edmund Dengler <edmundd@eSentire.com> writes: > > Interesting. This brings up a quick question. If a "prepare" statement is > > run, and then values are passed in, does the plan stay the same regardless > > of the values? Or is a second pass made to re-optimize the plan? > > AFAICS, the entire *point* of a prepared statement is that the plan is > computed just once for multiple uses. > > regards, tom lane >