Обсуждение: Optimising Union Query.

Поиск
Список
Период
Сортировка

Optimising Union Query.

От
Rob Kirkbride
Дата:
Hi,

I've got a query that takes quite some time to complete. I'm not an SQL
expert so I'm not sure how to improve things.
I've done a explain analyze and as I expected the database has to check
every row in each of the three tables below but I'm wondering if I can
do it much quicker by a use of an index or something. Each of the three
tables could have several thousand entries in. Basically the tables
contain data recorded against time then every hour a script deletes
entries that more than so many hours old.

select l.name,l.id from pa i,locations l where i.location=l.id union
select l.name,l.id from andu i,locations l where i.location=l.id union
select l.name,l.id from idu i,locations l where i.location=l.id;

Thanks for any help,

Rob



Re: Optimising Union Query.

От
Dawid Kuroczko
Дата:
On 4/22/05, Rob Kirkbride <rob.kirkbride@thales-is.com> wrote:
> I've got a query that takes quite some time to complete. I'm not an SQL
> expert so I'm not sure how to improve things.
> I've done a explain analyze and as I expected the database has to check
> every row in each of the three tables below but I'm wondering if I can
> do it much quicker by a use of an index or something. Each of the three
> tables could have several thousand entries in. Basically the tables
> contain data recorded against time then every hour a script deletes
> entries that more than so many hours old.
>
> select l.name,l.id from pa i,locations l where i.location=l.id union
> select l.name,l.id from andu i,locations l where i.location=l.id union
> select l.name,l.id from idu i,locations l where i.location=l.id;

Would it be OK if there were duplicates returned?  I.e if select from pa
table and andu table returned same row, would it be ok if there would
be two rows in "final" output because of one?

If so, change "union" to "UNION ALL".

If you put only "UNION", server gets resutls from _all_ selects, removes
duplicates and returns your query.  If you put "UNION ALL" it simply
does three selects and returns all the rows returned.  Not having to
look for duplicates makes it WAY faster.

  Regards,
     Dawid

Re: Optimising Union Query.

От
ptjm@interlog.com (Patrick TJ McPhee)
Дата:
In article <4268F322.1040106@thales-is.com>,
Rob Kirkbride <rob.kirkbride@thales-is.com> wrote:

% I've done a explain analyze and as I expected the database has to check
% every row in each of the three tables below but I'm wondering if I can

This is because you're returning a row for every row in the three
tables.

% select l.name,l.id from pa i,locations l where i.location=l.id union
% select l.name,l.id from andu i,locations l where i.location=l.id union
% select l.name,l.id from idu i,locations l where i.location=l.id;

You might get some improvement from

 select name,id from locations
  where id in (select distinct location from pa union
               select distinct location from andu union
               select distinct location from idu);

this query might be helped by an index on location in each of those
three tables, but probably not.
--

Patrick TJ McPhee
North York  Canada
ptjm@interlog.com

Re: Optimising Union Query.

От
"Jim C. Nasby"
Дата:
On Sat, Apr 23, 2005 at 10:39:14PM +0000, Patrick TJ McPhee wrote:
> In article <4268F322.1040106@thales-is.com>,
> Rob Kirkbride <rob.kirkbride@thales-is.com> wrote:
>
> % I've done a explain analyze and as I expected the database has to check
> % every row in each of the three tables below but I'm wondering if I can
>
> This is because you're returning a row for every row in the three
> tables.
>
> % select l.name,l.id from pa i,locations l where i.location=l.id union
> % select l.name,l.id from andu i,locations l where i.location=l.id union
> % select l.name,l.id from idu i,locations l where i.location=l.id;
>
> You might get some improvement from
>
>  select name,id from locations
>   where id in (select distinct location from pa union
>                select distinct location from andu union
>                select distinct location from idu);

Note that SELECT DISTINCT is redundant with a plain UNION. By
definition, UNION does a DISTINCT. In fact, this is going to hurt you;
you'll end up doing 4 distinct operations (one for each SELECT DISTINCT
and one for the overall UNION). Unless some of those tables have a lot
of duplicated location values, you should either use UNION ALLs or drop
the DISTINCTs. Note that going with DISTINCTs is different than what
your original query does.

You should also consider this:

SELECT name, id FROM locations l
    WHERE EXISTS (SELECT * FROM pa p WHERE p.location=l.id)
        OR EXISTS (SELECT * FROM andu a WHERE a.location=l.id)
        OR EXISTS (SELECT * FROM idu i WHERE i.location=l.id)

This query would definately be helped by having indexes on
(pa|andu|idu).location.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

question about about future 8.1 and IN, INOUT, and OUT parameters

От
Tony Caduto
Дата:
I saw that the recently released pl/java has the ablity to use IN, INOUT, and
OUT parameters.

Is the syntax going to be exactly like it is in Oracle?

Re: Optimising Union Query.

От
Rob Kirkbride
Дата:
Jim C. Nasby wrote on 25/04/2005 01:28:

>On Sat, Apr 23, 2005 at 10:39:14PM +0000, Patrick TJ McPhee wrote:
>
>
>>In article <4268F322.1040106@thales-is.com>,
>>Rob Kirkbride <rob.kirkbride@thales-is.com> wrote:
>>
>>% I've done a explain analyze and as I expected the database has to check
>>% every row in each of the three tables below but I'm wondering if I can
>>
>>This is because you're returning a row for every row in the three
>>tables.
>>
>>% select l.name,l.id from pa i,locations l where i.location=l.id union
>>% select l.name,l.id from andu i,locations l where i.location=l.id union
>>% select l.name,l.id from idu i,locations l where i.location=l.id;
>>
>>You might get some improvement from
>>
>> select name,id from locations
>>  where id in (select distinct location from pa union
>>               select distinct location from andu union
>>               select distinct location from idu);
>>
>>
>
>Note that SELECT DISTINCT is redundant with a plain UNION. By
>definition, UNION does a DISTINCT. In fact, this is going to hurt you;
>you'll end up doing 4 distinct operations (one for each SELECT DISTINCT
>and one for the overall UNION). Unless some of those tables have a lot
>of duplicated location values, you should either use UNION ALLs or drop
>the DISTINCTs. Note that going with DISTINCTs is different than what
>your original query does.
>
>You should also consider this:
>
>SELECT name, id FROM locations l
>    WHERE EXISTS (SELECT * FROM pa p WHERE p.location=l.id)
>        OR EXISTS (SELECT * FROM andu a WHERE a.location=l.id)
>        OR EXISTS (SELECT * FROM idu i WHERE i.location=l.id)
>
>This query would definately be helped by having indexes on
>(pa|andu|idu).location.
>
>

Thanks for that. I tried a few things, including using DISTINCTS and
UNION ALLs but none made a big difference.
However  your query above sped things up by a factor of more than 2.

Thanks very much!

Rob


Re: question about about future 8.1 and IN, INOUT, and OUT parameters

От
Michael Fuhr
Дата:
On Sun, Apr 24, 2005 at 11:05:57PM -0500, Tony Caduto wrote:
>
> I saw that the recently released pl/java has the ablity to use IN, INOUT, and
> OUT parameters.
>
> Is the syntax going to be exactly like it is in Oracle?

I'm not familiar with the Oracle syntax and I don't know how close
the PostgreSQL implementation is to being final, but here's an
example that works with the most recent code from HEAD (8.1devel):

CREATE FUNCTION foo(IN x integer, INOUT y integer, OUT z integer) AS $$
BEGIN
    y := y + 5;
    z := x + 5;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;

SELECT foo(10, 20);
   foo
---------
 (25,15)
(1 row)

SELECT (foo(10, 20)).*;
 y  | z
----+----
 25 | 15
(1 row)

SELECT (foo).* FROM (SELECT foo(10, 20)) AS s;
 y  | z
----+----
 25 | 15
(1 row)

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: question about about future 8.1 and IN, INOUT, and OUT parameters

От
Tom Lane
Дата:
Michael Fuhr <mike@fuhr.org> writes:
> On Sun, Apr 24, 2005 at 11:05:57PM -0500, Tony Caduto wrote:
>> Is the syntax going to be exactly like it is in Oracle?

> I'm not familiar with the Oracle syntax and I don't know how close
> the PostgreSQL implementation is to being final, but here's an
> example that works with the most recent code from HEAD (8.1devel):

> CREATE FUNCTION foo(IN x integer, INOUT y integer, OUT z integer) AS $$

FWIW, we will also take the more Oracle-ish spelling of the parameter
list:

CREATE FUNCTION foo(x IN integer, y IN OUT integer, z OUT integer) AS ...

... although given all the other in-detail discrepancies between plpgsql
and Oracle's pl/sql, I'm not sure how much this will really make
anyone's life easier.  In any case, Michael's example is the preferred
syntax because it is what the SQL spec calls for.

            regards, tom lane