Обсуждение: "not in" clause too slow?

Поиск
Список
Период
Сортировка

"not in" clause too slow?

От
Ottavio Campana
Дата:
mytable has 1857 rows, copy_mytable is a copy of mytable and I want to
know which new rows have been entered. I used the where id not in, and
the query works. My problem is that if I run the same command on another
table with 378415 rows, it is terribly slow. I ran explain analyze on
the first table, just to see how is works, and I have two questions:

1) from explain analyze output, does the planner scan mytable and for
each element runs a sec sqn on copy_mytable?

2) how can I speed it up? by using indexes? or by changing the query?

db=# EXPLAIN ANALYZE select * from mytable where id not in (select id
from copy_mytable);
                                                      QUERY PLAN

----------------------------------------------------------------------------------------------------------------------
 Seq Scan on mytable  (cost=53.21..148.34 rows=925 width=96) (actual
time=9.813..9.813 rows=0 loops=1)
   Filter: (NOT (hashed subplan))
   SubPlan
     ->  Seq Scan on copy_mytable  (cost=0.00..48.57 rows=1857 width=4)
(actual time=0.031..3.132 rows=1857 loops=1)
 Total runtime: 10.291 ms


Вложения

Re: "not in" clause too slow?

От
Ottavio Campana
Дата:
Alban Hertroys ha scritto:
> Ottavio Campana wrote:
>> 2) how can I speed it up? by using indexes? or by changing the query?
>
> Do you have indices on mytable.id and copy_mytable.id?
> Does using NOT EXISTS get you any better results?

mytable.id is primary key.

I create copy_mytable with
create table copy_mytable as (select * from mytable);

the planer behavior does not change no matter if I create and index on
copy_mytable.id or not.

>> db=# EXPLAIN ANALYZE select * from mytable where id not in (select id
>> from copy_mytable);
>>                                                       QUERY PLAN
>>
>>
----------------------------------------------------------------------------------------------------------------------
>>  Seq Scan on mytable  (cost=53.21..148.34 rows=925 width=96) (actual
>> time=9.813..9.813 rows=0 loops=1)
>>    Filter: (NOT (hashed subplan))
>>    SubPlan
>>      ->  Seq Scan on copy_mytable  (cost=0.00..48.57 rows=1857 width=4)
>> (actual time=0.031..3.132 rows=1857 loops=1)
>>  Total runtime: 10.291 ms
>>
>
>



Вложения

Re: "not in" clause too slow?

От
"Rodrigo De León"
Дата:
On 9/21/07, Ottavio Campana <ottavio@campana.vi.it> wrote:
> My problem is that if I run the same command on another
> table with 378415 rows, it is terribly slow.

How much is "terribly slow"?

Did you VACUUM ANALYZE?

Anyways, try this:

SELECT * FROM MYTABLE T1
LEFT JOIN COPY_MYTABLE T2
ON T1.ID = T2.ID
WHERE T2.ID IS NULL

Re: "not in" clause too slow?

От
panther-d@freemail.hu
Дата:
Ottavio Campana <ottavio@campana.vi.it> írta:

> mytable.id is primary key.
>
> I create copy_mytable with
> create table copy_mytable as (select * from mytable);
>
> the planer behavior does not change no matter if I create and index
> on copy_mytable.id or not.


Try this, with an index on copy_mytable.id:

SELECT orig.* FROM mytable AS orig WHERE NOT EXISTS (
  SELECT 1 FROM copy_mytable AS copy WHERE copy.id = orig.id
);


Regards,
Denes Daniel

Olvasd az [origo]-t a mobilodon: mini magazinok a Mobizin-en
___________________________________________________
www.t-mobile.hu/mobizin


Re: "not in" clause too slow?

От
Alban Hertroys
Дата:
Ottavio Campana wrote:
> 2) how can I speed it up? by using indexes? or by changing the query?

Do you have indices on mytable.id and copy_mytable.id?
Does using NOT EXISTS get you any better results?

> db=# EXPLAIN ANALYZE select * from mytable where id not in (select id
> from copy_mytable);
>                                                       QUERY PLAN
>
>
----------------------------------------------------------------------------------------------------------------------
>  Seq Scan on mytable  (cost=53.21..148.34 rows=925 width=96) (actual
> time=9.813..9.813 rows=0 loops=1)
>    Filter: (NOT (hashed subplan))
>    SubPlan
>      ->  Seq Scan on copy_mytable  (cost=0.00..48.57 rows=1857 width=4)
> (actual time=0.031..3.132 rows=1857 loops=1)
>  Total runtime: 10.291 ms
>


--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

Re: "not in" clause too slow?

От
Ottavio Campana
Дата:
Alban Hertroys ha scritto:
> Ottavio Campana wrote:
>> 2) how can I speed it up? by using indexes? or by changing the query?
>
> Do you have indices on mytable.id and copy_mytable.id?
> Does using NOT EXISTS get you any better results?

Eventually I had to select not all the table fields but only the primary
key and successively loop on the table again. Thus I've been able to
exploit the indexes.


Вложения

Re: "not in" clause too slow?

От
Martijn van Oosterhout
Дата:
On Fri, Sep 21, 2007 at 12:09:50PM +0200, Ottavio Campana wrote:
> 2) how can I speed it up? by using indexes? or by changing the query?

Note that NOT IN cannot be optimised in the same way as NOT EXISTS due
to the different ways they handle NULL. In particular if the subquery
of the NOT IN produces a NULL *anywhere* it will always return FALSE,
hence it often needs to scan the entire subquery even when an index
might be better. You might know this cannot happen, but postgres can't
always tell. NOT EXISTS doesn't have this problem.

Blame the SQL standard if you like.
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Вложения