Обсуждение: Planner ignoring to use INDEX SCAN

Поиск

Список

Период

Сортировка

Planner ignoring to use INDEX SCAN

От

Ashish Karalkar

Дата:

14 декабря 2007 г., 09:28:31

Hello All,

I have two table:

master_table:
pk_id bigint primary key,
name text

child_table :
chk_id bigint,
pk_id bigint
mob varchar(15)

foreign key pk_id to master_table
index on pk_id of child

query which was taking seconds on the join of these two table suddenly started taking 20/25 min

child table has @ 100M rows

The only deifference was I dropped the foreign key pointing to parent pk_id after this the planner is ignoring to use index scan on child table and doing seq scan.

Is it because I dropped foreign key?

Thanks in advance

With regards
Ashish...

Now you can chat without downloading messenger. Click here to know how.

Re: Planner ignoring to use INDEX SCAN

От

Richard Huxton

Дата:

14 декабря 2007 г., 09:34:25

Ashish Karalkar wrote:
> query which was taking seconds on the join of these two table
> suddenly started taking 20/25 min

Show the EXPLAIN ANALYSE of your problem query and someone will be able
to tell you why.

--
   Richard Huxton
   Archonet Ltd

Re: Planner ignoring to use INDEX SCAN

От

Richard Huxton

Дата:

14 декабря 2007 г., 11:15:45

Ashish Karalkar wrote:
>
> Richard Huxton <dev@archonet.com> wrote: Ashish Karalkar wrote:
>> query which was taking seconds on the join of these two table
>> suddenly started taking 20/25 min
>
> Show the EXPLAIN ANALYSE of your problem query and someone will be able
> to tell you why.
>
> Here is the output from explain analyse:

Actually, this is the output from EXPLAIN not EXPLAIN ANALYSE. It
doesn't show what actually happened, just what the planner thought was
going to happen.

Are the row-estimates roughly accurate?

> table structures are more or less same with delivery being parent and sms_new being child having index on deliveryid
inboth tables. 
>
>  HashAggregate  (cost=6153350.21..6153352.38 rows=174 width=32)
>    ->  Hash Join  (cost=218058.30..6153259.97 rows=6016 width=32)
>          Hash Cond: ("outer".deliveryid = "inner".deliveryid)
>          ->  Seq Scan on sms_new  (cost=0.00..5240444.80 rows=138939341 width=8)
>                Filter: ((otid)::text !~~ 'ERROR%'::text)
>          ->  Hash  (cost=218057.87..218057.87 rows=174 width=32)

Well, it knows that it's going to be expensive (cost=5240444.80). Since
it thinks you'll only get 174 rows from the other side and 6016
matching, I can't see how an index could be calculated as more expensive.

Try issuing ENABLE seq_scan=off and re-running the EXPLAIN, let's see
what cost that comes up with.

Oh, and I take it sms_new is recently vacuumed and analysed?

--
   Richard Huxton
   Archonet Ltd

Re: Planner ignoring to use INDEX SCAN

От

Ashish Karalkar

Дата:

14 декабря 2007 г., 11:41:43

Thanks Richard for your replay,

here is the output..

Richard Huxton <dev@archonet.com> wrote:

Ashish Karalkar wrote:
>
> Richard Huxton wrote: Ashish Karalkar wrote:
>> query which was taking seconds on the join of these two table
>> suddenly started taking 20/25 min
>
> Show the EXPLAIN ANALYSE of your problem query and someone will be able
> to tell you why.
>
> Here is the output from explain analyse:

Actually, this is the output from EXPLAIN not EXPLAIN ANALYSE. It
doesn't show what actually happened, just what the planner thought was
going to happen.

Its from EXPLAIN ANALYSE

Are the row-estimates roughly accurate?

Yes Row count of sms_new is approx. same

> table structures are more or less same with delivery being parent and sms_new being child having index on deliveryid in both tables.
>
> HashAggregate (cost=6153350.21..6153352.38 rows=174 width=32)
> -> Hash Join (cost=218058.30..6153259.97 rows=6016 width=32)
> Hash Cond: ("outer".deliveryid = "inner".deliveryid)
> -> Seq Scan on sms_new (cost=0.00..5240444.80 rows=138939341 width=8)
> Filter: ((otid)::text !~~ 'ERROR%'::text)
> -> Hash (cost=218057.87..218057.87 rows=174 width=32)

Well, it knows that it's going to be expensive (cost=5240444.80). Since
it thinks you'll only get 174 rows from the other side and 6016
matching, I can't see how an index could be calculated as more expensive.

Try issuing ENABLE seq_scan=off and re-running the EXPLAIN, let's see
what cost that comes up with.

here is explain out put after setting enable_seqscan=off

HashAggregate (cost=27729224.21..27729226.21 rows=160 width=32)
   -> Nested Loop (cost=2534.67..27729143.31 rows=5393 width=32)
         -> Bitmap Heap Scan on delivery (cost=2094.41..216143.78 rows=160 width=32)
               Recheck Cond: ((createddate >= '2007-12-10 00:00:00'::timestamp without time zone) AND (createddate <= '2007-12-11 00:00:00'::timestamp without time zone))
               Filter: ((taskid = 14267) AND (((remoteip)::text = '192.168.3.26'::text) OR ((remoteip)::text = '202.162.231.230'::text) OR ((remoteip)::text = '202.162.231.2'::text) OR ((remoteip)::text = '192.168.4.3'::text) OR ((remoteip)::text = '192.168.3.3'::text) OR ((remoteip)::text = '202.162.231.105'::text) OR ((remoteip)::text = '202.162.231.5'::text) OR ((remoteip)::text = '202.162.231.1'::text) OR ((remoteip)::text = '192.168.4.6'::text) OR ((remoteip)::text = '192.168.3.6'::text) OR ((remoteip)::text = '202.162.231.107'::text) OR ((remoteip)::text = '202.162.231.7'::text)))
               -> Bitmap Index Scan on createddate_idx (cost=0.00..2094.41 rows=197068 width=0)
                     Index Cond: ((createddate >= '2007-12-10 00:00:00'::timestamp without time zone) AND (createddate <= '2007-12-11 00:00:00'::timestamp without time zone))
         -> Bitmap Heap Scan on sms_new (cost=440.26..171369.61 rows=46931 width=8)
               Recheck Cond: (sms_new.deliveryid = "outer".deliveryid)
               Filter: ((otid)::text !~~ 'ERROR%'::text)
               -> Bitmap Index Scan on sms_new_deliveryid_idx (cost=0.00..440.26 rows=46931 width=0)
                     Index Cond: (sms_new.deliveryid = "outer".deliveryid)

Oh, and I take it sms_new is recently vacuumed and analysed?

yes it is vacuumed and analysed

--
Richard Huxton
Archonet Ltd

Share files, take polls, and discuss your passions - all under one roof. Click here.

Re: Planner ignoring to use INDEX SCAN

От

Richard Huxton

Дата:

14 декабря 2007 г., 12:01:13

Ashish Karalkar wrote:
> Thanks Richard for your replay,
>
> here is the output..
>
> Richard Huxton <dev@archonet.com> wrote: Ashish Karalkar wrote:
>> Richard Huxton  wrote: Ashish Karalkar wrote:
>>> query which was taking seconds on the join of these two table
>>> suddenly started taking 20/25 min
>> Show the EXPLAIN ANALYSE of your problem query and someone will be able
>> to tell you why.
>>
>> Here is the output from explain analyse:
>
> Actually, this is the output from EXPLAIN not EXPLAIN ANALYSE. It
> doesn't show what actually happened, just what the planner thought was
> going to happen.
>
> Its from EXPLAIN ANALYSE

No it's not, because that shows an extra set of figures. If you see
below there is a cost for each stage and an actual time too.

EXPLAIN ANALYSE SELECT d.id, l.name FROM items.documents d JOIN
lookups.document_class l ON d.class=l.id;
                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=1.16..14.00 rows=162 width=19) (actual
time=0.100..0.392 rows=162 loops=1)
    Hash Cond: ((d.class)::text = (l.id)::text)
    ->  Seq Scan on documents d  (cost=0.00..10.62 rows=162 width=9)
(actual time=0.024..0.121 rows=162 loops=1)
    ->  Hash  (cost=1.07..1.07 rows=7 width=20) (actual
time=0.029..0.029 rows=8 loops=1)
          ->  Seq Scan on document_class l  (cost=0.00..1.07 rows=7
width=20) (actual time=0.008..0.015 rows=8 loops=1)
  Total runtime: 0.506 ms
(6 rows)

> Are the row-estimates roughly accurate?
>
> Yes Row count of sms_new is approx. same

OK, that's good.

>> table structures are more or less same with delivery being parent and sms_new being child having index on deliveryid
inboth tables. 
>>
>>  HashAggregate  (cost=6153350.21..6153352.38 rows=174 width=32)
>>    ->  Hash Join  (cost=218058.30..6153259.97 rows=6016 width=32)
>>          Hash Cond: ("outer".deliveryid = "inner".deliveryid)
>>          ->  Seq Scan on sms_new  (cost=0.00..5240444.80 rows=138939341 width=8)
>>                Filter: ((otid)::text !~~ 'ERROR%'::text)
>>          ->  Hash  (cost=218057.87..218057.87 rows=174 width=32)
>
> Well, it knows that it's going to be expensive (cost=5240444.80). Since
> it thinks you'll only get 174 rows from the other side and 6016
> matching, I can't see how an index could be calculated as more expensive.
>
> Try issuing ENABLE seq_scan=off and re-running the EXPLAIN, let's see
> what cost that comes up with.
>
> here is explain out put after setting enable_seqscan=off
>
>  HashAggregate  (cost=27729224.21..27729226.21 rows=160 width=32)
>    ->  Nested Loop  (cost=2534.67..27729143.31 rows=5393 width=32)
>          ->  Bitmap Heap Scan on delivery  (cost=2094.41..216143.78 rows=160 width=32)

>          ->  Bitmap Heap Scan on sms_new  (cost=440.26..171369.61 rows=46931 width=8)

Well, the estimated cost for this one is up to 27 million from the
previous 6 million. It's doing two bitmap scans and then the nested loop
which is what's pushing the cost up.

Can you post the query too?

--
   Richard Huxton
   Archonet Ltd

Re: Planner ignoring to use INDEX SCAN

От

Gregory Stark

Дата:

14 декабря 2007 г., 12:30:53

"Ashish Karalkar" <ashish_postgre@yahoo.co.in> writes:

> Thanks Richard for your replay,
>
> Richard Huxton <dev@archonet.com> wrote:
>>
> Actually, this is the output from EXPLAIN not EXPLAIN ANALYSE. It
> doesn't show what actually happened, just what the planner thought was
> going to happen.
>
> Its from EXPLAIN ANALYSE

No, it's not. Not unless you've removed all the added info explain analyse
adds to the explain output.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

Re: Planner ignoring to use INDEX SCAN

От

Ashish Karalkar

Дата:

14 декабря 2007 г., 15:26:34

Richard Huxton <dev@archonet.com> wrote:

Ashish Karalkar wrote:
> query which was taking seconds on the join of these two table
> suddenly started taking 20/25 min

Show the EXPLAIN ANALYSE of your problem query and someone will be able
to tell you why.

Here is the output from explain analyse:
table structures are more or less same with delivery being parent and sms_new being child having index on deliveryid in both tables.

HashAggregate (cost=6153350.21..6153352.38 rows=174 width=32)
   -> Hash Join (cost=218058.30..6153259.97 rows=6016 width=32)
         Hash Cond: ("outer".deliveryid = "inner".deliveryid)
         -> Seq Scan on sms_new (cost=0.00..5240444.80 rows=138939341 width=8)
               Filter: ((otid)::text !~~ 'ERROR%'::text)
         -> Hash (cost=218057.87..218057.87 rows=174 width=32)
               -> Bitmap Heap Scan on delivery (cost=2218.02..218057.87 rows=174 width=32)
                     Recheck Cond: ((createddate >= '2007-12-10 00:00:00'::timestamp without time zone) AND (createddate <= '2007-12-11 00:00:00'::timestamp without time zone))
                     Filter: ((taskid = 14267) AND (((remoteip)::text = '192.168.3.26'::text) OR ((remoteip)::text = '202.162.231.230'::text) OR ((remoteip)::text = '202.162.231.2'::text) OR ((remoteip)::text = '192.168.4.3'::text) OR ((remoteip)::text = '192.168.3.3'::text) OR ((remoteip)::text = '202.162.231.105'::text) OR ((remoteip)::text = '202.162.231.5'::text) OR ((remoteip)::text = '202.162.231.1'::text) OR ((remoteip)::text = '192.168.4.6'::text) OR ((remoteip)::text = '192.168.3.6'::text) OR ((remoteip)::text = '202.162.231.107'::text) OR ((remoteip)::text = '202.162.231.7'::text)))
                     -> Bitmap Index Scan on createddate_idx (cost=0.00..2218.02 rows=207004 width=0)
                           Index Cond: ((createddate >= '2007-12-10 00:00:00'::timestamp without time zone) AND (createddate <= '2007-12-11 00:00:00'::timestamp without time zone))

Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Get the freedom to save as many mails as you wish. Click here to know how.

Re: Planner ignoring to use INDEX SCAN

От

Will

Дата:

17 декабря 2007 г., 15:07:20

Hi - been having a few problems like this myself. It's probably a
locale thing.

Here's at one of your problems:

         ->  Seq Scan on sms_new  (cost=0.00..5240444.80
rows=138939341 width=8)
               Filter: ((otid)::text !~~ 'ERROR%'::text)

I assume you have an index on sms_new?

Check the locale your database cluster has been set up with:

     SHOW lc_ctype

if it doesn't come back with "C" as your locale, then you will have to
make sure your indexes are created as follows:

CREATE INDEX IDX_sms_new
  ON YOUR_TABLE
  USING btree
  (sms_new varchar_pattern_ops);

Look at chapter 11.8 - Operator Classes for an explanation.

Unfortunately you'll need a second index if you want to use a non-
pattern matching operator, e.g. =.

If your locale is C, then I don't know what the problem is.

Hope that helps.

Will Temperley

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Planner ignoring to use INDEX SCAN