Discussion: Performance of query (fwd)

Performance of query (fwd)

From:
Edmund Dengler
Date:
Hmmm, it's been a couple of days, and this still hasn't appeared on
the mailing list as far as I can tell. This is a resend.

---------- Forwarded message ----------
Date: Mon, 9 Jun 2003 00:23:38 -0400 (EDT)
From: Edmund Dengler <edmundd@eSentire.com>
To: pgsql-general@postgresql.org
Subject: Performance of query

Ok, now I am really confused. I have a query that runs really slowly, yet
when I code it up as a function it runs very fast! The optimizer does not
seem to choose a good plan for the query.

Background: This is a database based on Snort. There is a base table
called <event> (with <sid>=sensor ID, <cid>=event ID, <timestamp>=when the
event was collected, <signature>=which event was recognized), plus a number
of dependent tables for the various bits and pieces of network packet data
(<iphdr>=IP header info, <tcphdr>=TCP header info, <data>=packet payload)
and <signature> (details of each recognized event).

Notes: Running 7.3.3
Database is around 20GB on disk.

=> select version();
                               version
---------------------------------------------------------------------
 PostgreSQL 7.3.3 on i386-unknown-openbsd3.0, compiled by GCC 2.95.3
(1 row)

=> \d event
               Table "public.event"
  Column   |           Type           | Modifiers
-----------+--------------------------+-----------
 sid       | integer                  | not null
 cid       | bigint                   | not null
 signature | integer                  | not null
 timestamp | timestamp with time zone | not null
Indexes: event_pkey primary key btree (sid, cid),
         event_pull_idx unique btree ("timestamp", sid, cid),
         signature_idx btree (signature),
         timestamp_idx btree ("timestamp")
Triggers: event_summary

=> \d signature
                             Table "public.signature"
    Column    |  Type   |                        Modifiers
--------------+---------+----------------------------------------------------------
 sig_id       | integer | not null default nextval('"signature_sig_id_seq"'::text)
<<...more columns...>>
Indexes: signature_pkey primary key btree (sig_id),
         sig_class_id btree (sig_class_id),
         sig_name_idx btree (sig_name)

The rest of the tables have (sid, cid) as their primary key. For example,

=> \d iphdr
      Table "public.iphdr"
  Column  |   Type   | Modifiers
----------+----------+-----------
 sid      | integer  | not null
 cid      | bigint   | not null
<<...more columns...>>
Indexes: iphdr_pkey primary key btree (sid, cid),
         ip_dst_idx btree (ip_dst),
         ip_src_idx btree (ip_src)
Triggers: iphdr_summary

If I run the query:

explain analyze
select
  event.sid, event.cid, event.timestamp, signature.sig_name, iphdr.ip_src, iphdr.ip_dst,
  icmphdr.icmp_type, icmphdr.icmp_code, icmphdr.icmp_csum, icmphdr.icmp_id, icmphdr.icmp_seq,
  udphdr.udp_sport, udphdr.udp_dport, udphdr.udp_len, udphdr.udp_csum,
  tcphdr.tcp_sport, tcphdr.tcp_dport, tcphdr.tcp_seq, tcphdr.tcp_ack, tcphdr.tcp_off,
  tcphdr.tcp_res, tcphdr.tcp_flags, tcphdr.tcp_win, tcphdr.tcp_csum, tcphdr.tcp_urp,
  sensor.hostname, sensor.interface, data.data_payload
from (
  select * from event
  where timestamp > (select now() - '2 hours'::interval)
    and exists (select 1 from hack_pull_sid where sid = event.sid)
) as event
  left join signature on signature.sig_id = event.signature
  left join iphdr on iphdr.sid = event.sid and iphdr.cid = event.cid
  left join icmphdr on icmphdr.sid = event.sid and icmphdr.cid = event.cid
  left join udphdr on udphdr.sid = event.sid and udphdr.cid = event.cid
  left join tcphdr on tcphdr.sid = event.sid and tcphdr.cid = event.cid
  left join sensor on sensor.sid = event.sid
  left join data on data.sid = event.sid and data.cid = event.cid
order by
  timestamp desc, event.sid desc, event.cid desc
;

(note: <hack_pull_sid> is a table of the SIDs I am interested in; I test it
with EXISTS so that I avoid the known performance issues with IN)
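
For comparison, the formulation I am avoiding would be roughly the
following (just a sketch, assuming <hack_pull_sid> has a single <sid>
column):

  select * from event
  where timestamp > (select now() - '2 hours'::interval)
    and sid in (select sid from hack_pull_sid);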

I get the following output:


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1712181.78..1712776.52 rows=237893 width=853) (actual time=163484.81..163484.97 rows=129 loops=1)
   Sort Key: public.event."timestamp", public.event.sid, public.event.cid
   InitPlan
     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.04..0.04 rows=1 loops=1)
   ->  Merge Join  (cost=1471148.73..1551631.57 rows=237893 width=853) (actual time=152693.86..163484.26 rows=129 loops=1)
         Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid))
         ->  Merge Join  (cost=728456.82..802284.20 rows=237893 width=258) (actual time=33652.33..39050.11 rows=129 loops=1)
               Merge Cond: ("outer".sid = "inner".sid)
               ->  Merge Join  (cost=728454.78..798713.57 rows=237893 width=226) (actual time=33635.79..39031.06 rows=129 loops=1)
                     Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid))
                     ->  Merge Join  (cost=728454.78..736750.59 rows=237893 width=172) (actual time=22767.67..24071.84 rows=129 loops=1)
                           Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid))
                           ->  Merge Join  (cost=728454.78..731219.48 rows=237893 width=144) (actual time=22164.44..22681.23 rows=129 loops=1)
                                 Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid))
                                 ->  Sort  (cost=701134.84..701729.57 rows=237893 width=116) (actual time=15343.87..15344.05 rows=129 loops=1)
                                       Sort Key: public.event.sid, public.event.cid
                                       ->  Merge Join  (cost=590896.58..671684.37 rows=237893 width=116) (actual time=10857.65..15343.57 rows=129 loops=1)
                                             Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid))
                                             ->  Index Scan using iphdr_pkey on iphdr  (cost=0.00..73422.89 rows=1432042 width=28) (actual time=23.19..13810.58 rows=687918 loops=1)
                                             ->  Sort  (cost=590896.58..591491.31 rows=237893 width=88) (actual time=101.79..101.95 rows=129 loops=1)
                                                   Sort Key: public.event.sid, public.event.cid
                                                   ->  Hash Join  (cost=185.07..563108.55 rows=237893 width=88) (actual time=97.42..101.39 rows=129 loops=1)
                                                         Hash Cond: ("outer".signature = "inner".sig_id)
                                                         ->  Index Scan using timestamp_idx on event  (cost=0.00..558165.62 rows=237893 width=24) (actual time=0.18..3.05 rows=129 loops=1)
                                                               Index Cond: ("timestamp" > $0)
                                                               Filter: (subplan)
                                                               SubPlan
                                                                 ->  Seq Scan on hack_pull_sid  (cost=0.00..1.15 rows=1 width=0) (actual time=0.01..0.01 rows=1 loops=171)
                                                                       Filter: (sid = $1)
                                                         ->  Hash  (cost=167.26..167.26 rows=7126 width=64) (actual time=97.12..97.12 rows=0 loops=1)
                                                               ->  Seq Scan on signature  (cost=0.00..167.26 rows=7126 width=64) (actual time=0.01..80.86 rows=7126 loops=1)
                                 ->  Sort  (cost=27319.94..27897.87 rows=231171 width=28) (actual time=6606.52..7073.38 rows=110717 loops=1)
                                       Sort Key: icmphdr.sid, icmphdr.cid
                                       ->  Seq Scan on icmphdr  (cost=0.00..3784.71 rows=231171 width=28) (actual time=19.81..2082.01 rows=231303 loops=1)
                           ->  Index Scan using udphdr_pkey on udphdr  (cost=0.00..4411.44 rows=115228 width=28) (actual time=60.23..1269.61 rows=54669 loops=1)
                     ->  Index Scan using tcphdr_pkey on tcphdr  (cost=0.00..56586.06 rows=1139141 width=54) (actual time=88.72..13878.52 rows=484419 loops=1)
               ->  Sort  (cost=2.04..2.11 rows=30 width=32) (actual time=16.46..16.64 rows=136 loops=1)
                     Sort Key: sensor.sid
                     ->  Seq Scan on sensor  (cost=0.00..1.30 rows=30 width=32) (actual time=16.26..16.34 rows=30 loops=1)
         ->  Sort  (cost=742691.92..745059.80 rows=947154 width=595) (actual time=108643.60..123322.67 rows=417145 loops=1)
               Sort Key: data.sid, data.cid
               ->  Seq Scan on data  (cost=0.00..51821.54 rows=947154 width=595) (actual time=9.67..54494.61 rows=947324 loops=1)
 Total runtime: 164147.78 msec
(43 rows)

For some runs, I was getting over 800,000 msec!!

Now, this seems very strange for only 129 rows. The inner select returns
exactly the 129 rows I am interested in, and then we only need a few left
joins to get the corresponding rows (which may or may not exist, which is
why they are LEFT JOINs). Note that (<sid>,<cid>) is the primary key for
most of the joined tables, so each LEFT JOIN should be able to recognize
that there will be only zero or one matching rows. In addition,
(<sid>,<cid>) is the primary key for the <event> table itself. The same
argument applies to <signature>, since <sig_id> is its primary key.

Notes:
- The table <event> has 1,427,411 rows
- <event>.<timestamp> ranges from 2002-01-19 12:28:29-05 to
  2003-06-08 23:40:01-04
- The last 2 hours is a very small range, assuming an even distribution
- All tables are using a statistics target of 100 (see the commands sketched below)
- A VACUUM ANALYZE was performed on the entire database before this test
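
For the record, the statistics target and the analyze were applied along
these lines (shown here for the <timestamp> column only as a sketch; the
actual commands covered the other columns and tables as well):

  alter table event alter column "timestamp" set statistics 100;
  vacuum analyze;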

As can be seen, the estimator thinks that 237,893 of the 1,427,411 rows
will be returned from the inner SELECT. Can anyone explain how this number
could have been calculated? Even assuming a uniform distribution over time,
a rough estimate would be 2 hours / (24 hours/day * ~400 days), which is
(very roughly) 0.02% of the table.
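
Working that sanity check through (the 400 days and the uniform
distribution are only rough assumptions of mine):

  select round(1427411 * 2.0 / (24 * 400)) as expected_rows;
  -- roughly 297 rows, i.e. about 0.02% of the table, nowhere near 237,893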

So, I wrote a function. (Please, no remarks on style; it was a quick hack
to compare performance, and I am sure it could be made better and more
optimized.) It simply performs the inner SELECT and then, for each row,
runs the outer select to fetch the matching joined data.

CREATE TYPE hack_pull_type AS (
  sid int, cid bigint, timestamp timestamp with time zone,
  sig_name text,
  ip_src bigint, ip_dst bigint,
  icmp_type smallint, icmp_code smallint, icmp_csum int, icmp_id int, icmp_seq int,
  udp_sport int, udp_dport int, udp_len int, udp_csum int,
  tcp_sport int, tcp_dport int, tcp_seq bigint, tcp_ack bigint, tcp_off smallint,
  tcp_res smallint, tcp_flags smallint, tcp_win int, tcp_csum int, tcp_urp int,
  hostname text, interface text, data_payload text
);

CREATE OR REPLACE FUNCTION hack_pull_func() RETURNS SETOF hack_pull_type
AS '
DECLARE
  event_rec event%ROWTYPE;
  -- pull_rec hack_pull_type;
  pull_rec RECORD;
BEGIN
  FOR event_rec IN
    SELECT *
    FROM event
    WHERE timestamp > (SELECT now() - ''2 hours''::interval)
      AND EXISTS (SELECT 1 FROM hack_pull_sid WHERE sid = event.sid)
    ORDER BY timestamp desc, sid desc, cid desc
  LOOP
    SELECT
      event.sid, event.cid, event.timestamp,
      sig_name, ip_src, ip_dst,
      icmp_type, icmp_code, icmp_csum, icmp_id, icmp_seq,
      udp_sport, udp_dport, udp_len, udp_csum,
      tcp_sport, tcp_dport, tcp_seq, tcp_ack, tcp_off, tcp_res,
      tcp_flags, tcp_win, tcp_csum, tcp_urp,
      sensor.hostname, sensor.interface, data.data_payload
    INTO pull_rec
    FROM event
      left join signature on signature.sig_id = event.signature
      left join iphdr on iphdr.sid = event.sid and iphdr.cid = event.cid
      left join icmphdr on icmphdr.sid = event.sid and icmphdr.cid = event.cid
      left join udphdr on udphdr.sid = event.sid and udphdr.cid = event.cid
      left join tcphdr on tcphdr.sid = event.sid and tcphdr.cid = event.cid
      left join sensor on sensor.sid = event.sid
      left join data on data.sid = event.sid and data.cid = event.cid
    WHERE
      event.cid = event_rec.cid and event.sid = event_rec.sid
    LIMIT 1;
    RETURN NEXT pull_rec;
  END LOOP;

  RETURN;
END;
' LANGUAGE 'plpgsql';

=> explain analyze select * from hack_pull_func();
                                                     QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
 Function Scan on hack_pull_func  (cost=0.00..12.50 rows=1000 width=238) (actual time=5259.46..5259.62 rows=23 loops=1)
 Total runtime: 5259.70 msec
(2 rows)

=> explain analyze select * from hack_pull_func();
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Function Scan on hack_pull_func  (cost=0.00..12.50 rows=1000 width=238) (actual time=38.02..38.18 rows=23 loops=1)
 Total runtime: 38.26 msec
(2 rows)

(Sorry, the data has changed as time moves forward while I type this email.
However, the numbers reflect the performance difference I observed during
testing.)

So, what is happening? How come my function (ugly as it may be) is orders
of magnitude faster than what should be a clean SQL query? Why is the
estimate so far off the real numbers?

Is there a way of writing the query to get the good performance? Note that
there are a number of other queries I run which exhibit similar
performance/estimation issues, and I wonder if I will need to rewrite all
of them as custom functions (thanks for the SETOF support, as that is what
makes writing this stuff possible now).

Thanks!
Ed




Re: Performance of query (fwd)

From:
Tom Lane
Date:
Edmund Dengler <edmundd@eSentire.com> writes:
>   select * from event
>   where timestamp > (select now() - '2 hours'::interval)
>     and exists (select 1 from hack_pull_sid where sid = event.sid)

> (note: <hack_pull_sid> is a table of SIDs I am interested in so that I
> avoid the issues with IN)

I think you're creating more issues than you're avoiding.  With the
above query, the planner has little chance of guessing how many rows
will be retrieved from "event" ... and indeed the EXPLAIN output shows
that its guess is off by more than a factor of 1000:

>      ->  Index Scan using timestamp_idx on event  (cost=0.00..558165.62 rows=237893 width=24) (actual time=0.18..3.05 rows=129 loops=1)

With a misestimate of that magnitude at the core of the query, it's
unsurprising that all the other plan choices are bad too.

But actually I suspect the easiest point of attack is not the EXISTS
subquery, but the timestamp comparison.  Can you get your application
to supply a simple literal constant to compare to the timestamp, viz
'2003-06-10 21:44' rather than now()-'2 hours'?  The former gives the
planner something to compare to its statistics, the latter doesn't.
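
That is, have the application compute the cutoff itself and interpolate it
as a literal, along the lines of (a sketch):

    select * from event
    where timestamp > '2003-06-10 21:44'
      and exists (select 1 from hack_pull_sid where sid = event.sid)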

Oh ... you have done an ANALYZE on event reasonably recently, no?

            regards, tom lane

Re: Performance of query (fwd)

From:
"Dann Corbit"
Date:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Tuesday, June 10, 2003 8:50 PM
> To: Edmund Dengler
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Performance of query (fwd)
>
>
> Edmund Dengler <edmundd@eSentire.com> writes:
> >   select * from event
> >   where timestamp > (select now() - '2 hours'::interval)
> >     and exists (select 1 from hack_pull_sid where sid = event.sid)
>
> > (note: <hack_pull_sid> is a table of SIDs I am interested
> in so that I
> > avoid the issues with IN)
>
> I think you're creating more issues than you're avoiding.
> With the above query, the planner has little chance of
> guessing how many rows will be retrieved from "event" ... and
> indeed the EXPLAIN output shows that its guess is off by more
> than a factor of 1000:
>
> >      ->  Index Scan using timestamp_idx on event
> > (cost=0.00..558165.62 rows=237893 width=24) (actual time=0.18..3.05
> > rows=129 loops=1)
>
> With a misestimate of that magnitude at the core of the
> query, it's unsurprising that all the other plan choices are bad too.
>
> But actually I suspect the easiest point of attack is not the
> EXISTS subquery, but the timestamp comparison.  Can you get
> your application to supply a simple literal constant to
> compare to the timestamp, viz '2003-06-10 21:44' rather than
> now()-'2 hours'?  The former gives the planner something to
> compare to its statistics, the latter doesn't.

In a case like that, wouldn't it be worthwhile having the planner
actually evaluate the expression?
IOW, the planner sees:
   {column} <rel_op> {expression}
and it knows that {column} has an index (any kind of index if the operator
is equality, and anything but a hash index for the other relational
operators).

Then go ahead and evaluate the expression and directly substitute the
answer, unless the expression is an aggregate.

So {column} <rel_op> {expression} becomes: {column} <rel_op> {CONSTANT}

> Oh ... you have done an ANALYZE on event reasonably recently, no?

Re: Performance of query (fwd)

From:
Tom Lane
Date:
"Dann Corbit" <DCorbit@connx.com> writes:
>> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>> But actually I suspect the easiest point of attack is not the
>> EXISTS subquery, but the timestamp comparison.  Can you get
>> your application to supply a simple literal constant to
>> compare to the timestamp, viz '2003-06-10 21:44' rather than
>> now()-'2 hours'?  The former gives the planner something to
>> compare to its statistics, the latter doesn't.

> In a case like that, wouldn't it be worthwhile having the planner
> actually evaluate the expression?

I've thought about that ... but am not sure whether it wouldn't create
as many problems as it solves.  What are the consequences when the
planner's pre-evaluation yields a different result from what actually
happens at runtime?  Hardly an unlikely scenario when dealing with
stuff like now().  (And for actually constant expressions, say
sin(pi), the result already does get folded to a constant.  So only
the nontrivial cases are left to think about.)

In the case of functions with side-effects (think nextval()), the
planner *must not* speculatively evaluate the function.  We could
introduce another function property-flag, like IMMUTABLE or STRICT,
to govern this ... but I'm not sure it's worth the trouble.

            regards, tom lane

Re: Performance of query (fwd)

From:
Dmitry Tkach
Date:
Tom Lane wrote:

>I've thought about that ... but am not sure whether it wouldn't create
>as many problems as it solves.  What are the consequences when the
>planner's pre-evaluation yields a different result from what actually
>happens at runtime?  Hardly an unlikely scenario when dealing with
>stuff like now().
>
But isn't now() supposed to return the same value within the same
transaction?
Isn't that a guarantee, that it won't change between pre-evaluation and
actual execution?

> (And for actually constant expressions, say
>sin(pi), the result already does get folded to a constant.  So only
>the nontrivial cases are left to think about.)
>
>
>In the case of functions with side-effects (think nextval()), the
>planner *must not* speculatively evaluate the function.  We could
>introduce another function property-flag, like IMMUTABLE or STRICT,
>to govern this ... but I'm not sure it's worth the trouble.
>
>

I must be missing something here - isn't it enough to use the same logic
as when deciding whether the function output can be used for an index scan?
In 7.2, conditions like 'where stamp < now()' refused to use the index,
because now() isn't immutable (cacheable)... I see that this is fixed in
7.3 by adding another flag - STABLE... It looks like the planner is now
able to use the index as long as the function is not volatile.
Can't the same criteria be applied to that pre-evaluation decision?
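
For reference, the distinction is declared when a function is created in
7.3; a sketch (not from the actual schema discussed here):

   -- STABLE: safe to evaluate in an index condition at run time,
   -- but not folded to a constant at plan time.
   -- nextval() is VOLATILE; things like sin() are IMMUTABLE.
   create function two_hours_ago() returns timestamp with time zone
   as 'select now() - ''2 hours''::interval'
   language sql stable;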

Dima


Re: Performance of query (fwd)

From:
Edmund Dengler
Date:
Ok, tried it with a constant. No improvements. The estimator is still very
high.

Regards,
Ed

explain analyze
select
  event.sid, event.cid, event.timestamp, signature.sig_name, iphdr.ip_src, iphdr.ip_dst,
  icmphdr.icmp_type, icmphdr.icmp_code, icmphdr.icmp_csum, icmphdr.icmp_id, icmphdr.icmp_seq,
  udphdr.udp_sport, udphdr.udp_dport, udphdr.udp_len, udphdr.udp_csum,
  tcphdr.tcp_sport, tcphdr.tcp_dport, tcphdr.tcp_seq, tcphdr.tcp_ack, tcphdr.tcp_off,
  tcphdr.tcp_res, tcphdr.tcp_flags, tcphdr.tcp_win, tcphdr.tcp_csum, tcphdr.tcp_urp,
  sensor.hostname, sensor.interface, data.data_payload
from (
  select * from event
  where timestamp > '2003-06-11 9:22 EDT'::timestamp
    and exists (select 1 from hack_pull_sid where sid = event.sid)
) as event
  left join signature on signature.sig_id = event.signature
  left join iphdr on iphdr.sid = event.sid and iphdr.cid = event.cid
  left join icmphdr on icmphdr.sid = event.sid and icmphdr.cid = event.cid
  left join udphdr on udphdr.sid = event.sid and udphdr.cid = event.cid
  left join tcphdr on tcphdr.sid = event.sid and tcphdr.cid = event.cid
  left join sensor on sensor.sid = event.sid
  left join data on data.sid = event.sid and data.cid = event.cid
order by
  timestamp desc, event.sid desc, event.cid desc
;



---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1710629.85..1711222.69 rows=237136 width=853) (actual time=308403.45..308403.61 rows=139 loops=1)
   Sort Key: public.event."timestamp", public.event.sid, public.event.cid
   ->  Merge Join  (cost=1470394.12..1550828.17 rows=237136 width=853) (actual time=299837.82..308402.64 rows=139 loops=1)
         Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid))
         ->  Merge Join  (cost=727702.20..801483.40 rows=237136 width=258) (actual time=45098.40..50164.32 rows=139 loops=1)
               Merge Cond: ("outer".sid = "inner".sid)
               ->  Merge Join  (cost=727700.17..797923.97 rows=237136 width=226) (actual time=45077.16..50140.32 rows=139 loops=1)
                     Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid))
                     ->  Merge Join  (cost=727700.17..735968.33 rows=237136 width=172) (actual time=25446.85..26861.14 rows=139 loops=1)
                           Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid))
                           ->  Merge Join  (cost=727700.17..730447.45 rows=237136 width=144) (actual time=23709.07..24201.98 rows=139 loops=1)
                                 Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid))
                                 ->  Sort  (cost=700380.22..700973.06 rows=237136 width=116) (actual time=17821.61..17821.81 rows=139 loops=1)
                                       Sort Key: public.event.sid, public.event.cid
                                       ->  Merge Join  (cost=590252.70..671029.48 rows=237136 width=116) (actual time=11569.28..17821.25 rows=139 loops=1)
                                             Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid))
                                             ->  Index Scan using iphdr_pkey on iphdr  (cost=0.00..73422.89 rows=1432042 width=28) (actual time=18.79..16195.02 rows=730166 loops=1)
                                             ->  Sort  (cost=590252.70..590845.54 rows=237136 width=88) (actual time=52.35..52.55 rows=139 loops=1)
                                                   Sort Key: public.event.sid, public.event.cid
                                                   ->  Hash Join  (cost=185.07..562558.03 rows=237136 width=88) (actual time=34.28..51.89 rows=139 loops=1)
                                                         Hash Cond: ("outer".signature = "inner".sig_id)
                                                         ->  Index Scan using timestamp_idx on event  (cost=0.00..557630.23 rows=237136 width=24) (actual time=0.22..16.58 rows=139 loops=1)
                                                               Index Cond: ("timestamp" > ('2003-06-11 09:22:00'::timestamp without time zone)::timestamp with time zone)
                                                               Filter: (subplan)
                                                               SubPlan
                                                                 ->  Seq Scan on hack_pull_sid  (cost=0.00..1.15 rows=1 width=0) (actual time=0.01..0.01 rows=0 loops=953)
                                                                       Filter: (sid = $0)
                                                         ->  Hash  (cost=167.26..167.26 rows=7126 width=64) (actual time=33.91..33.91 rows=0 loops=1)
                                                               ->  Seq Scan on signature  (cost=0.00..167.26 rows=7126 width=64) (actual time=0.04..17.74 rows=7131 loops=1)
                                 ->  Sort  (cost=27319.94..27897.87 rows=231171 width=28) (actual time=5671.28..6088.28 rows=130551 loops=1)
                                       Sort Key: icmphdr.sid, icmphdr.cid
                                       ->  Seq Scan on icmphdr  (cost=0.00..3784.71 rows=231171 width=28) (actual time=3.87..1278.01 rows=232877 loops=1)
                           ->  Index Scan using udphdr_pkey on udphdr  (cost=0.00..4411.44 rows=115228 width=28) (actual time=79.77..2532.43 rows=56946 loops=1)
                     ->  Index Scan using tcphdr_pkey on tcphdr  (cost=0.00..56586.06 rows=1139141 width=54) (actual time=89.40..22157.07 rows=504556 loops=1)
               ->  Sort  (cost=2.04..2.11 rows=30 width=32) (actual time=21.18..21.40 rows=158 loops=1)
                     Sort Key: sensor.sid
                     ->  Seq Scan on sensor  (cost=0.00..1.30 rows=30 width=32) (actual time=20.98..21.05 rows=30 loops=1)
         ->  Sort  (cost=742691.92..745059.80 rows=947154 width=595) (actual time=246938.33..257244.21 rows=444980 loops=1)
               Sort Key: data.sid, data.cid
               ->  Seq Scan on data  (cost=0.00..51821.54 rows=947154 width=595) (actual time=9.68..198574.76 rows=955002 loops=1)
 Total runtime: 308903.04 msec
(41 rows)



On Tue, 10 Jun 2003, Tom Lane wrote:

> Edmund Dengler <edmundd@eSentire.com> writes:
> >   select * from event
> >   where timestamp > (select now() - '2 hours'::interval)
> >     and exists (select 1 from hack_pull_sid where sid = event.sid)
>
> > (note: <hack_pull_sid> is a table of SIDs I am interested in so that I
> > avoid the issues with IN)
>
> I think you're creating more issues than you're avoiding.  With the
> above query, the planner has little chance of guessing how many rows
> will be retrieved from "event" ... and indeed the EXPLAIN output shows
> that its guess is off by more than a factor of 1000:
>
> >      ->  Index Scan using timestamp_idx on event  (cost=0.00..558165.62 rows=237893 width=24) (actual time=0.18..3.05 rows=129 loops=1)
>
> With a misestimate of that magnitude at the core of the query, it's
> unsurprising that all the other plan choices are bad too.
>
> But actually I suspect the easiest point of attack is not the EXISTS
> subquery, but the timestamp comparison.  Can you get your application
> to supply a simple literal constant to compare to the timestamp, viz
> '2003-06-10 21:44' rather than now()-'2 hours'?  The former gives the
> planner something to compare to its statistics, the latter doesn't.
>
> Oh ... you have done an ANALYZE on event reasonably recently, no?
>
>             regards, tom lane
>


Re: Performance of query (fwd)

From:
Tom Lane
Date:
Dmitry Tkach <dmitry@openratings.com> writes:
> Tom Lane wrote:
>> I've thought about that ... but am not sure whether it wouldn't create
>> as many problems as it solves.  What are the consequences when the
>> planner's pre-evaluation yields a different result from what actually
>> happens at runtime?  Hardly an unlikely scenario when dealing with
>> stuff like now().

> But isn't now() supposed to return the same value within the same
> transaction?

What's that have to do with it?  Plans have to be good for longer than
one transaction --- see prepared statements and plpgsql plan caching.

> I must be missing something here - isn't it enough to use the same logic
> as when deciding whether the function output can be usedfor index scan?

No, see above.  If we could, we'd not have bothered to make a
distinction between IMMUTABLE and STABLE functions, we'd just allow the
planner to fold them all to constants at plan time.  (STABLE functions
aren't actually guaranteed to hold still even across one transaction,
only across one command in a transaction.  But that's not really
relevant to the planner's problem.)

            regards, tom lane

Re: Performance of query (fwd)

From:
Tom Lane
Date:
Edmund Dengler <edmundd@eSentire.com> writes:
> Ok, tried it with a constant. No improvements. The estimator is still very
> high.

>   where timestamp > '2003-06-11 9:22 EDT'::timestamp

>                                                          ->  Index Scan using timestamp_idx on event  (cost=0.00..557630.23 rows=237136 width=24) (actual time=0.22..16.58 rows=139 loops=1)
>                                                                Index Cond: ("timestamp" > ('2003-06-11 09:22:00'::timestamp without time zone)::timestamp with time zone)

It would probably help if you cast the constant to the same datatype
as the column is (which is evidently timestamp with time zone).
Coercions between timestamp and timestamptz are not considered
constants, because they depend on SET TIMEZONE.
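
In other words, something like (untested):

    where timestamp > '2003-06-11 9:22 EDT'::timestamptz

so that the right-hand side is already a timestamptz constant the planner
can compare directly against the column's statistics.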

            regards, tom lane

Re: Performance of query (fwd)

From:
Dmitry Tkach
Date:
Ok, I see now...
I did not know that those cached plans  survived transaction end...

Thanks for clarification.

Dima

Tom Lane wrote:

>Dmitry Tkach <dmitry@openratings.com> writes:
>
>
>>Tom Lane wrote:
>>
>>
>>>I've thought about that ... but am not sure whether it wouldn't create
>>>as many problems as it solves.  What are the consequences when the
>>>planner's pre-evaluation yields a different result from what actually
>>>happens at runtime?  Hardly an unlikely scenario when dealing with
>>>stuff like now().
>>>
>>>
>
>
>
>>But isn't now() supposed to return the same value within the same
>>transaction?
>>
>>
>
>What's that have to do with it?  Plans have to be good for longer than
>one transaction --- see prepared statements and plpgsql plan caching.
>
>
>
>>I must be missing something here - isn't it enough to use the same logic
>>as when deciding whether the function output can be usedfor index scan?
>>
>>
>
>No, see above.  If we could, we'd not have bothered to make a
>distinction between IMMUTABLE and STABLE functions, we'd just allow the
>planner to fold them all to constants at plan time.  (STABLE functions
>aren't actually guaranteed to hold still even across one transaction,
>only across one command in a transaction.  But that's not really
>relevant to the planner's problem.)
>
>            regards, tom lane
>
>



Re: Performance of query (fwd)

From:
Edmund Dengler
Date:
Whoops! Thanks! Tried it, and it did return very fast (about a second and
a half). Things switched from a "merge join" to a "nested loop".

Regards,
Ed

On Wed, 11 Jun 2003, Tom Lane wrote:

> Edmund Dengler <edmundd@eSentire.com> writes:
> > Ok, tried it with a constant. No improvements. The estimator is still very
> > high.
>
> >   where timestamp > '2003-06-11 9:22 EDT'::timestamp
>
> >                                                          ->  Index Scan using timestamp_idx on event  (cost=0.00..557630.23 rows=237136 width=24) (actual time=0.22..16.58 rows=139 loops=1)
> >                                                                Index Cond: ("timestamp" > ('2003-06-11 09:22:00'::timestamp without time zone)::timestamp with time zone)
>
> It would probably help if you cast the constant to the same datatype
> as the column is (which is evidently timestamp with time zone).
> Coercions between timestamp and timestamptz are not considered
> constants, because they depend on SET TIMEZONE.
>
>             regards, tom lane
>


Re: Performance of query (fwd)

From:
Edmund Dengler
Date:
Greetings all!

Interesting. This brings up a quick question. If a "prepare" statement is
run, and then values are passed in, does the plan stay the same regardless
of the values? Or is a second pass made to re-optimize the plan?

Regards!
Ed


On Wed, 11 Jun 2003, Dmitry Tkach wrote:

> Ok, I see now...
> I did not know that those cached plans  survived transaction end...
>
> Thanks for clarification.
>
> Dima
>
> Tom Lane wrote:
>
> >Dmitry Tkach <dmitry@openratings.com> writes:
> >
> >
> >>Tom Lane wrote:
> >>
> >>
> >>>I've thought about that ... but am not sure whether it wouldn't create
> >>>as many problems as it solves.  What are the consequences when the
> >>>planner's pre-evaluation yields a different result from what actually
> >>>happens at runtime?  Hardly an unlikely scenario when dealing with
> >>>stuff like now().
> >>>
> >>>
> >
> >
> >
> >>But isn't now() supposed to return the same value within the same
> >>transaction?
> >>
> >>
> >
> >What's that have to do with it?  Plans have to be good for longer than
> >one transaction --- see prepared statements and plpgsql plan caching.
> >
> >
> >
> >>I must be missing something here - isn't it enough to use the same logic
> >>as when deciding whether the function output can be usedfor index scan?
> >>
> >>
> >
> >No, see above.  If we could, we'd not have bothered to make a
> >distinction between IMMUTABLE and STABLE functions, we'd just allow the
> >planner to fold them all to constants at plan time.  (STABLE functions
> >aren't actually guaranteed to hold still even across one transaction,
> >only across one command in a transaction.  But that's not really
> >relevant to the planner's problem.)
> >
> >            regards, tom lane
> >
> >
>
>
>


Re: Performance of query (fwd)

From:
Tom Lane
Date:
Edmund Dengler <edmundd@eSentire.com> writes:
> Interesting. This brings up a quick question. If a "prepare" statement is
> run, and then values are passed in, does the plan stay the same regardless
> of the values? Or is a second pass made to re-optimize the plan?

AFAICS, the entire *point* of a prepared statement is that the plan is
computed just once for multiple uses.
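
For instance, a minimal sketch using the SQL-level PREPARE that appeared
in 7.3:

    prepare recent_events (timestamptz) as
      select * from event where timestamp > $1;
    execute recent_events('2003-06-11 9:22 EDT');
    execute recent_events('2003-06-12 9:22 EDT');  -- reuses the same plan

The plan is chosen without knowing what $1 will be, so it cannot exploit
the column statistics the way a literal constant can.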

            regards, tom lane

Re: Performance of query (fwd)

From:
Edmund Dengler
Date:
Yes, but it does make an impact on decisions. For example, take the query I
was having a problem with. Understanding that the optimizer will make
certain assumptions about the distribution of values, use them in its
estimates, and then never revisit the plan for a specific set of values
makes a big difference. You now have to carefully plan and _test_ both ways
(using a prepared statement, vs. re-preparing each statement in a loop) to
determine where your best performance is.

In a perfect world, an optimizer would return a set of plans, or
alternatively provide some bounds on the values for which it would be
better to replan once the variables are known.

Another alternative would be to plan, and if a portion that is executed
turns out to be vastly off the original estimates, to replan the rest of
the statement. For example, the problem with my statement was not the inner
select (which, while estimated high, really did run fast), but that the
actual number of returned rows was much smaller than the planner had
estimated. If there could have been some kind of trigger to say "hey, this
is really, really far off the estimate, let's take a moment to replan the
rest of the steps", then it would probably have run much better, at the
cost of a second pass through the planner.

Regards,
Ed

On Wed, 11 Jun 2003, Tom Lane wrote:

> Edmund Dengler <edmundd@eSentire.com> writes:
> > Interesting. This brings up a quick question. If a "prepare" statement is
> > run, and then values are passed in, does the plan stay the same regardless
> > of the values? Or is a second pass made to re-optimize the plan?
>
> AFAICS, the entire *point* of a prepared statement is that the plan is
> computed just once for multiple uses.
>
>             regards, tom lane
>