Обсуждение: PostgreSQL upgraded to 8.2 but forcing index scan on query produces faster
I previously posted 'forcing index scan on query produces 16x faster' and it seemed that the consensus was that 8.0.x series had an issue. I have upgraded to the highest practical version for our distro. But we seem to have the same issue. If I force the 'enable_seqscan' off our actual time is 9ms where if 'enable_seqscan' is on the performance is 2200ms ( the good news is the Seq Scan query on 8.2 is 1/2 the time of the 8.0 query ). The paste is below - I reloaded the table from scratch after the 8.2 upgrade. Then I ran a 'REINDEX DATABASE' and a 'VACUUM ANALYZE' (then ran some queries and reran the vac analyze). postream=> SELECT version(); version --------------------------------------------------------------------------------------------------------- PostgreSQL 8.2.11 on i386-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33) (1 row) postream=> SET enable_seqscan = false; SET postream=> EXPLAIN ANALYZE postream-> SELECT si.group1_id as name, sum(si.qty) as count, sum(si.amt) as amt postream-> FROM salesitems si, sales s, sysstrings postream-> WHERE si.id = s.id postream-> AND si.group1_id != '' postream-> AND si.group1_id IS NOT NULL postream-> AND NOT si.void postream-> AND NOT s.void postream-> AND NOT s.suspended postream-> AND s.tranzdate >= (cast('2010-02-15' as date) + cast(sysstrings.data as time)) postream-> AND s.tranzdate < ((cast('2010-02-15' as date) + 1) + cast(sysstrings.data as time)) postream-> AND sysstrings.id='net/Console/Employee/Day End Time' postream-> GROUP BY name; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=426973.65..426973.86 rows=14 width=35) (actual time=9.424..9.438 rows=12 loops=1) -> Nested Loop (cost=0.01..426245.31 rows=97113 width=35) (actual time=0.653..6.954 rows=894 loops=1) -> Nested Loop (cost=0.01..2416.59 rows=22477 width=4) (actual time=0.595..2.150 rows=225 loops=1) -> Index Scan using sysstrings_pkey on sysstrings (cost=0.00..8.27 rows=1 width=182) (actual time=0.110..0.112 rows=1 loops=1) Index Cond: (id = 'net/Console/Employee/Day End Time'::text) -> Index Scan using sales_tranzdate_index on sales s (cost=0.01..1846.40 rows=22477 width=12) (actual time=0.454..1.687 rows=225 loops=1) Index Cond: ((s.tranzdate >= ('2010-02-15'::date + (sysstrings.data)::time without time zone)) AND (s.tranzdate < ('2010-02-16'::date + (sysstrings.data)::time without time zone))) Filter: ((NOT void) AND (NOT suspended)) -> Index Scan using salesitems_pkey on salesitems si (cost=0.00..18.54 rows=25 width=39) (actual time=0.007..0.013 rows=4 loops=225) Index Cond: (si.id = s.id) Filter: (((group1_id)::text <> ''::text) AND (group1_id IS NOT NULL) AND (NOT void)) Total runtime: 9.585 ms (12 rows) postream=> SET enable_seqscan = true; SET postream=> EXPLAIN ANALYZE postream-> SELECT si.group1_id as name, sum(si.qty) as count, sum(si.amt) as amt postream-> FROM salesitems si, sales s, sysstrings postream-> WHERE si.id = s.id postream-> AND si.group1_id != '' postream-> AND si.group1_id IS NOT NULL postream-> AND NOT si.void postream-> AND NOT s.void postream-> AND NOT s.suspended postream-> AND s.tranzdate >= (cast('2010-02-15' as date) + cast(sysstrings.data as time)) postream-> AND s.tranzdate < ((cast('2010-02-15' as date) + 1) + cast(sysstrings.data as time)) postream-> AND sysstrings.id='net/Console/Employee/Day End Time' postream-> GROUP BY name; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=38315.09..38315.30 rows=14 width=35) (actual time=2206.531..2206.545 rows=12 loops=1) -> Hash Join (cost=2697.55..37586.74 rows=97113 width=35) (actual time=2128.070..2204.048 rows=894 loops=1) Hash Cond: (si.id = s.id) -> Seq Scan on salesitems si (cost=0.00..30578.15 rows=890646 width=39) (actual time=0.047..1487.688 rows=901281 loops=1) Filter: (((group1_id)::text <> ''::text) AND (group1_id IS NOT NULL) AND (NOT void)) -> Hash (cost=2416.59..2416.59 rows=22477 width=4) (actual time=1.823..1.823 rows=225 loops=1) -> Nested Loop (cost=0.01..2416.59 rows=22477 width=4) (actual time=0.477..1.592 rows=225 loops=1) -> Index Scan using sysstrings_pkey on sysstrings (cost=0.00..8.27 rows=1 width=182) (actual time=0.039..0.040 rows=1 loops=1) Index Cond: (id = 'net/Console/Employee/Day End Time'::text) -> Index Scan using sales_tranzdate_index on sales s (cost=0.01..1846.40 rows=22477 width=12) (actual time=0.410..1.187 rows=225 loops=1) Index Cond: ((s.tranzdate >= ('2010-02-15'::date + (sysstrings.data)::time without time zone)) AND (s.tranzdate < ('2010-02-16'::date + (sysstrings.data)::time without time zone))) Filter: ((NOT void) AND (NOT suspended)) Total runtime: 2206.706 ms (13 rows) postream=> \d salesitems; Table "public.salesitems" Column | Type | Modifiers --------------+--------------------------+------------------------ id | integer | not null lineno | smallint | not null plu | character varying(35) | qty | numeric(8,3) | not null amt | numeric(10,2) | last_updated | timestamp with time zone | default now() group1_id | character varying(64) | group2_id | text | group3_id | text | void | boolean | not null default false hash | boolean | not null default false component | boolean | not null default false subitem | boolean | not null default false Indexes: "salesitems_pkey" PRIMARY KEY, btree (id, lineno) "idx_si_group_id" btree (group1_id) "salesitems_last_updated_index" btree (last_updated) -- Christian Brink
Christian Brink <cbrink@r-stream.com> writes: > -> Nested Loop (cost=0.01..2416.59 rows=22477 width=4) > (actual time=0.595..2.150 rows=225 loops=1) > -> Index Scan using sysstrings_pkey on sysstrings > (cost=0.00..8.27 rows=1 width=182) (actual time=0.110..0.112 rows=1 loops=1) > Index Cond: (id = 'net/Console/Employee/Day End > Time'::text) > -> Index Scan using sales_tranzdate_index on sales s > (cost=0.01..1846.40 rows=22477 width=12) (actual time=0.454..1.687 > rows=225 loops=1) > Index Cond: ((s.tranzdate >= ('2010-02-15'::date + > (sysstrings.data)::time without time zone)) AND (s.tranzdate < > ('2010-02-16'::date + (sysstrings.data)::time without time zone))) > Filter: ((NOT void) AND (NOT suspended)) The fundamental reason why you're getting a bad plan choice is the factor-of-100 estimation error here. I'm not sure you can do a whole lot about that without rethinking the query --- in particular I would suggest trying to get rid of the non-constant range bounds. You're apparently already plugging in an external variable for the date, so maybe you could handle the time of day similarly instead of joining to sysstrings for it. regards, tom lane
Re: PostgreSQL upgraded to 8.2 but forcing index scan on query produces faster
От
"Eger, Patrick"
Дата:
Not to beat a dead horse excessively, but I think the below is a pretty good argument for index hints? I know the general optimizer wants to be highest priority (I very much agree with this), but I think there are fully legitimate cases like the below. Asking the user to rewrite the query in an unnatural way (or to change optimizer params that may work for 99% of queries) is, IMO not a good thing. Given that the postgres optimizer can never be perfect (as it will never have the perfect knowledge necessary for a perfect decision), I would request that index hints be reconsidered (for 9.0?). I know many users (myself included) are doing this in a very rudimentary way by disabling particular access types on a per session basis "set enable_seqscan=off; set enable_hashjoin=off; QUERY set enable_seqscan=on; set enable_hashjoin=on;"... I'd hack up a patch if I had the time at least =) Best regards, Patrick -----Original Message----- From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Tom Lane Sent: Monday, March 22, 2010 12:22 PM To: Christian Brink Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] PostgreSQL upgraded to 8.2 but forcing index scan on query produces faster Christian Brink <cbrink@r-stream.com> writes: > -> Nested Loop (cost=0.01..2416.59 rows=22477 width=4) > (actual time=0.595..2.150 rows=225 loops=1) > -> Index Scan using sysstrings_pkey on sysstrings > (cost=0.00..8.27 rows=1 width=182) (actual time=0.110..0.112 rows=1 loops=1) > Index Cond: (id = 'net/Console/Employee/Day End > Time'::text) > -> Index Scan using sales_tranzdate_index on sales s > (cost=0.01..1846.40 rows=22477 width=12) (actual time=0.454..1.687 > rows=225 loops=1) > Index Cond: ((s.tranzdate >= ('2010-02-15'::date + > (sysstrings.data)::time without time zone)) AND (s.tranzdate < > ('2010-02-16'::date + (sysstrings.data)::time without time zone))) > Filter: ((NOT void) AND (NOT suspended)) The fundamental reason why you're getting a bad plan choice is the factor-of-100 estimation error here. I'm not sure you can do a whole lot about that without rethinking the query --- in particular I would suggest trying to get rid of the non-constant range bounds. You're apparently already plugging in an external variable for the date, so maybe you could handle the time of day similarly instead of joining to sysstrings for it. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: PostgreSQL upgraded to 8.2 but forcing index scan on query produces faster
От
Christian Brink
Дата:
On 03/22/2010 03:21 PM, Tom Lane wrote: > The fundamental reason why you're getting a bad plan choice is the > factor-of-100 estimation error here. I'm not sure you can do a whole > lot about that without rethinking the query --- in particular I would > suggest trying to get rid of the non-constant range bounds. You're > apparently already plugging in an external variable for the date, > so maybe you could handle the time of day similarly instead of joining > to sysstrings for it. > > Tom & Peter, I thought you might like to know the outcome of this. I was able to get the 8.0 and the 8.2 planner to correctly run the query. There were 2 issues. As Tom pointed out the the 'systrings' lookup seems to be the main culprit. Which makes sense. How can the planner know how to run the query when it doesn't know approximately what it will bracket the until the query has started? The other part of the solution is bit concerning. I had to do a 'dump and load' (and vacuum analyze) to get the planner to work correctly even after I rewrote the query. FYI I had run 'VACUUM ANALYZE' (and sometimes 'REINDEX TABLE x') between each test. -- Christian Brink