Обсуждение: "not in" clause too slow?
mytable has 1857 rows, copy_mytable is a copy of mytable and I want to know which new rows have been entered. I used the where id not in, and the query works. My problem is that if I run the same command on another table with 378415 rows, it is terribly slow. I ran explain analyze on the first table, just to see how is works, and I have two questions: 1) from explain analyze output, does the planner scan mytable and for each element runs a sec sqn on copy_mytable? 2) how can I speed it up? by using indexes? or by changing the query? db=# EXPLAIN ANALYZE select * from mytable where id not in (select id from copy_mytable); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- Seq Scan on mytable (cost=53.21..148.34 rows=925 width=96) (actual time=9.813..9.813 rows=0 loops=1) Filter: (NOT (hashed subplan)) SubPlan -> Seq Scan on copy_mytable (cost=0.00..48.57 rows=1857 width=4) (actual time=0.031..3.132 rows=1857 loops=1) Total runtime: 10.291 ms
Вложения
Alban Hertroys ha scritto: > Ottavio Campana wrote: >> 2) how can I speed it up? by using indexes? or by changing the query? > > Do you have indices on mytable.id and copy_mytable.id? > Does using NOT EXISTS get you any better results? mytable.id is primary key. I create copy_mytable with create table copy_mytable as (select * from mytable); the planer behavior does not change no matter if I create and index on copy_mytable.id or not. >> db=# EXPLAIN ANALYZE select * from mytable where id not in (select id >> from copy_mytable); >> QUERY PLAN >> >> ---------------------------------------------------------------------------------------------------------------------- >> Seq Scan on mytable (cost=53.21..148.34 rows=925 width=96) (actual >> time=9.813..9.813 rows=0 loops=1) >> Filter: (NOT (hashed subplan)) >> SubPlan >> -> Seq Scan on copy_mytable (cost=0.00..48.57 rows=1857 width=4) >> (actual time=0.031..3.132 rows=1857 loops=1) >> Total runtime: 10.291 ms >> > >
Вложения
On 9/21/07, Ottavio Campana <ottavio@campana.vi.it> wrote: > My problem is that if I run the same command on another > table with 378415 rows, it is terribly slow. How much is "terribly slow"? Did you VACUUM ANALYZE? Anyways, try this: SELECT * FROM MYTABLE T1 LEFT JOIN COPY_MYTABLE T2 ON T1.ID = T2.ID WHERE T2.ID IS NULL
Ottavio Campana <ottavio@campana.vi.it> írta: > mytable.id is primary key. > > I create copy_mytable with > create table copy_mytable as (select * from mytable); > > the planer behavior does not change no matter if I create and index > on copy_mytable.id or not. Try this, with an index on copy_mytable.id: SELECT orig.* FROM mytable AS orig WHERE NOT EXISTS ( SELECT 1 FROM copy_mytable AS copy WHERE copy.id = orig.id ); Regards, Denes Daniel Olvasd az [origo]-t a mobilodon: mini magazinok a Mobizin-en ___________________________________________________ www.t-mobile.hu/mobizin
Ottavio Campana wrote: > 2) how can I speed it up? by using indexes? or by changing the query? Do you have indices on mytable.id and copy_mytable.id? Does using NOT EXISTS get you any better results? > db=# EXPLAIN ANALYZE select * from mytable where id not in (select id > from copy_mytable); > QUERY PLAN > > ---------------------------------------------------------------------------------------------------------------------- > Seq Scan on mytable (cost=53.21..148.34 rows=925 width=96) (actual > time=9.813..9.813 rows=0 loops=1) > Filter: (NOT (hashed subplan)) > SubPlan > -> Seq Scan on copy_mytable (cost=0.00..48.57 rows=1857 width=4) > (actual time=0.031..3.132 rows=1857 loops=1) > Total runtime: 10.291 ms > -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
Alban Hertroys ha scritto: > Ottavio Campana wrote: >> 2) how can I speed it up? by using indexes? or by changing the query? > > Do you have indices on mytable.id and copy_mytable.id? > Does using NOT EXISTS get you any better results? Eventually I had to select not all the table fields but only the primary key and successively loop on the table again. Thus I've been able to exploit the indexes.
Вложения
On Fri, Sep 21, 2007 at 12:09:50PM +0200, Ottavio Campana wrote: > 2) how can I speed it up? by using indexes? or by changing the query? Note that NOT IN cannot be optimised in the same way as NOT EXISTS due to the different ways they handle NULL. In particular if the subquery of the NOT IN produces a NULL *anywhere* it will always return FALSE, hence it often needs to scan the entire subquery even when an index might be better. You might know this cannot happen, but postgres can't always tell. NOT EXISTS doesn't have this problem. Blame the SQL standard if you like. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.