Обсуждение: improve 'where not exists' query..

Поиск
Список
Период
Сортировка

improve 'where not exists' query..

От
Noel Whelan
Дата:
I'm wondering if there's an ideal way to improve the efficiency of this query:

SELECT i.id FROM items i 
WHERE (NOT (EXISTS (SELECT c.id
FROM contacts c WHERE (c.id = i.id))));

It takes a while to execute, clearly. Thank you,
 
 

Re: improve 'where not exists' query..

От
Bruno Wolff III
Дата:
On Tue, Oct 25, 2005 at 15:46:52 -0500,
  Noel Whelan <noel.whelan@gmail.com> wrote:
> I'm wondering if there's an ideal way to improve the efficiency of this
> query:
>
> SELECT i.id <http://i.id> FROM items i
> WHERE (NOT (EXISTS (SELECT c.id <http://c.id>
> FROM contacts c WHERE (c.id <http://c.id> = i.id <http://i.id>))));
>
> It takes a while to execute, clearly. Thank you,

On recent versions of postgres, NOT IN is potentially faster.

Have you looked at EXPLAIN ANALYZE output to see if anything looks amiss?

Re: improve 'where not exists' query..

От
Noel Whelan
Дата:
I've done an explain analyze - nothing looks wrong to me. I'm thinking it's not exactly an issue with the query itself; it's just an inefficient thing I want to do (essentially, for each id in the one table, identify whether or not one exists in the other table).
 
Current installation is 7.3.4; but I'll look into NOT IN in case that would be an improvement. Thanks,
 
- Noel

 
On 10/26/05, Bruno Wolff III <bruno@wolff.to> wrote:
On Tue, Oct 25, 2005 at 15:46:52 -0500,
Noel Whelan <noel.whelan@gmail.com > wrote:
> I'm wondering if there's an ideal way to improve the efficiency of this
> query:
>
> SELECT i.id <http://i.id> FROM items i
> WHERE (NOT (EXISTS (SELECT c.id <http://c.id>
> FROM contacts c WHERE (c.id <http://c.id> = i.id <http://i.id>))));
>
> It takes a while to execute, clearly. Thank you,

On recent versions of postgres, NOT IN is potentially faster.

Have you looked at EXPLAIN ANALYZE output to see if anything looks amiss?

Re: improve 'where not exists' query..

От
Bruno Wolff III
Дата:
On Wed, Oct 26, 2005 at 12:58:26 -0500,
  Noel Whelan <noel.whelan@gmail.com> wrote:
> I've done an explain analyze - nothing looks wrong to me. I'm thinking it's
> not exactly an issue with the query itself; it's just an inefficient thing I
> want to do (essentially, for each id in the one table, identify whether or
> not one exists in the other table).
>  Current installation is 7.3.4; but I'll look into NOT IN in case that would
> be an improvement. Thanks,
>  - Noel

I am pretty sure the speed up for NOT IN was after 7.3 so that isn't likely to
help. You probably should try the outer join suggestion in that case.

You should probably also seriously consider upgrading. There have been a number
of performance enhancing changes since 7.3.

>
>  On 10/26/05, Bruno Wolff III <bruno@wolff.to> wrote:
> >
> > On Tue, Oct 25, 2005 at 15:46:52 -0500,
> > Noel Whelan <noel.whelan@gmail.com> wrote:
> > > I'm wondering if there's an ideal way to improve the efficiency of this
> > > query:
> > >
> > > SELECT i.id <http://i.id> <http://i.id> FROM items i
> > > WHERE (NOT (EXISTS (SELECT c.id <http://c.id> <http://c.id>
> > > FROM contacts c WHERE (c.id <http://c.id> <http://c.id> = i.id<http://i.id><
> > http://i.id>))));
> > >
> > > It takes a while to execute, clearly. Thank you,
> >
> > On recent versions of postgres, NOT IN is potentially faster.
> >
> > Have you looked at EXPLAIN ANALYZE output to see if anything looks amiss?
> >

Re: improve 'where not exists' query..

От
Richard Huxton
Дата:
Noel Whelan wrote:
> I executed the following:
>
> EXPLAIN ANALYZE SELECT cwit.cempid
> FROM "cwItems" cwit
> WHERE (NOT (EXISTS (SELECT con.cempid
> FROM contacts con
> WHERE (con.cempid = cwit.cempid))));
>
> It comes back with:
>
> Seq Scan on "cwItems" cwit (cost=0.00..8929702.11 rows=5132 width=8) (actual
> time=132218.29..148623.27 rows=31 loops=1)
> Filter: (NOT (subplan))
> SubPlan
> -> Seq Scan on contacts con (cost=0.00..870.77 rows=1 width=8) (actual time=
> 11.82..11.82 rows=1 loops=12528)
> Filter: (cempid = $0)
> Total runtime: 148623.54 msec
>
> I'm not certain it's an issue with the query itself going wrong. I consider
> I'm basically telling it: for each cempid in 'cwItems', check whether or not
> it exists as a cempid in 'contacts', which could be inherently inefficient
> in itself.

Well, as you say it's never going to be possible without checking all
items. However, you've got two seq-scans there and I'd expect a seq-scan
and looping over an index (or a hash-based plan perhaps).

Now, looking at the values it's only seeing 31 rows in cwItems and 1 row
(!) in contacts, so that would explain the scans. However, in that case
I'd expect it to be much faster than it is.

Hmm - I'd take the following steps:

1. VACUUM FULL VERBOSE ANALYSE <table> on both those tables and then see
what happens. Make a note of how many removable/non-removable rows it
finds. Rerun the explain analyse and see what happens.

2. Check that you have an index on contact.cempid and that the types of
cempid match in both tables. Then issue "SET enable_seqscan=true" and
run the explain analyse again - are things faster?

Let us know what happens, oh and don't forget to cc: the list, you were
lucky I read this.

>  I just wondered if there'd be a way to improve on it or not.
>  Installation is postgres-7.3.4.

Upgrade to the latest 7.3.x version as soon as is convenient - lots of
bug fixes to be had.

--
   Richard Huxton
   Archonet Ltd

Re: improve 'where not exists' query..

От
Noel Whelan
Дата:
I've created an index on contacts.cempid (I'd not even checked whether one existed); and the query is certainly improved. I'm wondering..I get the impression that the hash index is ideal in this case, technically, because I only intend to query with '='; but the btree index is a bit faster. I'll need to look further into that..

Thank you for the input on this.
- Noel

On 10/27/05, Richard Huxton <dev@archonet.com> wrote:
Noel Whelan wrote:
> I executed the following:
>
> EXPLAIN ANALYZE SELECT cwit.cempid
> FROM "cwItems" cwit
> WHERE (NOT (EXISTS (SELECT con.cempid
> FROM contacts con
> WHERE ( con.cempid = cwit.cempid))));
>
> It comes back with:
>
> Seq Scan on "cwItems" cwit (cost=0.00..8929702.11 rows=5132 width=8) (actual
> time=132218.29..148623.27 rows=31 loops=1)
> Filter: (NOT (subplan))
> SubPlan
> -> Seq Scan on contacts con (cost=0.00..870.77 rows=1 width=8) (actual time=
> 11.82..11.82 rows=1 loops=12528)
> Filter: (cempid = $0)
> Total runtime: 148623.54 msec
>
> I'm not certain it's an issue with the query itself going wrong. I consider
> I'm basically telling it: for each cempid in 'cwItems', check whether or not
> it exists as a cempid in 'contacts', which could be inherently inefficient
> in itself.

Well, as you say it's never going to be possible without checking all
items. However, you've got two seq-scans there and I'd expect a seq-scan
and looping over an index (or a hash-based plan perhaps).

Now, looking at the values it's only seeing 31 rows in cwItems and 1 row
(!) in contacts, so that would explain the scans. However, in that case
I'd expect it to be much faster than it is.

Hmm - I'd take the following steps:

1. VACUUM FULL VERBOSE ANALYSE <table> on both those tables and then see
what happens. Make a note of how many removable/non-removable rows it
finds. Rerun the explain analyse and see what happens.

2. Check that you have an index on contact.cempid and that the types of
cempid match in both tables. Then issue "SET enable_seqscan=true" and
run the explain analyse again - are things faster?

Let us know what happens, oh and don't forget to cc: the list, you were
lucky I read this.

>  I just wondered if there'd be a way to improve on it or not.
>  Installation is postgres-7.3.4.

Upgrade to the latest 7.3.x version as soon as is convenient - lots of
bug fixes to be had.

--
   Richard Huxton
   Archonet Ltd