Обсуждение: subquery/alias question

Поиск
Список
Период
Сортировка

subquery/alias question

От
Madison Kelly
Дата:
Hi all,

   I've read 7.2.1.3 (as short as it is) in the PgSQL docs, but don't
see what I am doing wrong... Maybe you can help?

   I've got a query;

SELECT
    d.dom_id,
    d.dom_name,
    (SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id)
    AS
    usr_count
FROM
    domains d
ORDER BY d.dom_name ASC;

   Where 'usr_count' returns the number of entries in 'users' that point
to a given entry in 'domains'. Pretty straight forward so far. The
trouble is:

SELECT
    d.dom_id,
    d.dom_name,
    (SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id)
    AS
    usr_count
FROM
    domains d
WHERE
    usr_count > 0
ORDER BY d.dom_name ASC;

   Causes the error:

ERROR:  column "usr_count" does not exist

   It works if I use:

SELECT
    d.dom_id,
    d.dom_name,
    (SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id)
    AS
    usr_count
FROM
    domains d
WHERE
    (SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id) > 0
ORDER BY d.dom_name ASC;

   This seems terribly inefficient (and ugly), and I can't see why the
results from 'usr_count' can't be counted... I can use 'usr_count' to
sort the results...

   Thanks all!

Madi

Re: subquery/alias question

От
Michael Glaesemann
Дата:
On Sep 25, 2007, at 16:59 , Madison Kelly wrote:

> SELECT
>     d.dom_id,
>     d.dom_name,
>     (SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id)
>     AS
>     usr_count
> FROM
>     domains d
> WHERE
>     (SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id) > 0
> ORDER BY d.dom_name ASC;

Why not just use a join? Something like this would work, I should think:

select dom_id,
        dom_name,
        usr_count
   from domains
   natural join (select usr_dom_id as dom_id,
                        count(usr_dom_id) as usr_count
                   from users) u
   where usr_count > 0
   order by dom_name;

Michael Glaesemann
grzm seespotcode net



Re: subquery/alias question

От
Alvaro Herrera
Дата:
Michael Glaesemann wrote:
>
> On Sep 25, 2007, at 16:59 , Madison Kelly wrote:
>
>> SELECT
>>     d.dom_id,
>>     d.dom_name,
>>     (SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id)
>>     AS
>>     usr_count
>> FROM
>>     domains d
>> WHERE
>>     (SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id) > 0
>> ORDER BY d.dom_name ASC;
>
> Why not just use a join? Something like this would work, I should think:
>
> select dom_id,
>        dom_name,
>        usr_count
>   from domains
>   natural join (select usr_dom_id as dom_id,
>                        count(usr_dom_id) as usr_count
>                   from users) u
>   where usr_count > 0
>   order by dom_name;

Maybe the usr_count should be tested in a HAVING clause instead of
WHERE?  And put the count(*) in the result list instead of a subselect.
That feels more natural to me anyway.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: subquery/alias question

От
Michael Glaesemann
Дата:
On Sep 25, 2007, at 17:30 , Alvaro Herrera wrote:

> Michael Glaesemann wrote:
>>
>> select dom_id,
>>        dom_name,
>>        usr_count
>>   from domains
>>   natural join (select usr_dom_id as dom_id,
>>                        count(usr_dom_id) as usr_count
>>                   from users) u
>>   where usr_count > 0
>>   order by dom_name;
>
> Maybe the usr_count should be tested in a HAVING clause instead of
> WHERE?  And put the count(*) in the result list instead of a
> subselect.
> That feels more natural to me anyway.

I believe you'd have to write it like

select dom_id, dom_name, count(usr_dom_id) as usr_count
   from domains
   join users on (usr_dom_id = dom_id)
   having count(usr_dom_id) > 0
   order by dom_name;

I don't know how the performance would compare. I think the backend
is smart enough to know it doesn't need to perform two seq scans to
calculate count(usr_dom_id), but I wasn't sure.

Madison, how do the two queries compare with explain analyze?

Michael Glaesemann
grzm seespotcode net



Re: subquery/alias question

От
Tom Lane
Дата:
Michael Glaesemann <grzm@seespotcode.net> writes:
> I believe you'd have to write it like

> select dom_id, dom_name, count(usr_dom_id) as usr_count
>    from domains
>    join users on (usr_dom_id = dom_id)
>    having count(usr_dom_id) > 0
>    order by dom_name;

> I don't know how the performance would compare. I think the backend
> is smart enough to know it doesn't need to perform two seq scans to
> calculate count(usr_dom_id), but I wasn't sure.

It has been smart enough for a few years now --- don't recall when
exactly, but nodeAgg.c quoth

     * Perform lookups of aggregate function info, and initialize the
     * unchanging fields of the per-agg data.  We also detect duplicate
     * aggregates (for example, "SELECT sum(x) ... HAVING sum(x) > 0"). When
     * duplicates are detected, we only make an AggStatePerAgg struct for the
     * first one.  The clones are simply pointed at the same result entry by
     * giving them duplicate aggno values.

... which in English means we just do the calculation once ...

            regards, tom lane

Re: subquery/alias question

От
Michael Glaesemann
Дата:
On Sep 25, 2007, at 21:44 , Tom Lane wrote:

> ... which in English means we just do the calculation once ...

As always, thanks, Tom, for the explanation (and Alvaro, who probably
already knew this :))

Michael Glaesemann
grzm seespotcode net



Re: subquery/alias question

От
Madison Kelly
Дата:
Michael Glaesemann wrote:
>
> On Sep 25, 2007, at 17:30 , Alvaro Herrera wrote:
>
>> Michael Glaesemann wrote:
>>>
>>> select dom_id,
>>>        dom_name,
>>>        usr_count
>>>   from domains
>>>   natural join (select usr_dom_id as dom_id,
>>>                        count(usr_dom_id) as usr_count
>>>                   from users) u
>>>   where usr_count > 0
>>>   order by dom_name;
>>
>> Maybe the usr_count should be tested in a HAVING clause instead of
>> WHERE?  And put the count(*) in the result list instead of a subselect.
>> That feels more natural to me anyway.
>
> I believe you'd have to write it like
>
> select dom_id, dom_name, count(usr_dom_id) as usr_count
>   from domains
>   join users on (usr_dom_id = dom_id)
>   having count(usr_dom_id) > 0
>   order by dom_name;
>
> I don't know how the performance would compare. I think the backend is
> smart enough to know it doesn't need to perform two seq scans to
> calculate count(usr_dom_id), but I wasn't sure.
>
> Madison, how do the two queries compare with explain analyze?

Thanks for your reply!

   Unfortunately, in both cases I get the error:

nmc=> SELECT dom_id, dom_name, COUNT(usr_dom_id) AS usr_count FROM
domains JOIN users ON (usr_dom_id=dom_id) HAVING COUNT (usr_dom_id) > 0
ORDER BY dom_name;
ERROR:  syntax error at or near "COUNT" at character 25
LINE 1: SELECT dom_id, dom_name COUNT(usr_dom_id) AS usr_count FROM ...

   I've been struggling with some deadlines, so for now I'm using just:

SELECT d.dom_id, d.dom_name FROM domains d WHERE (SELECT COUNT(*) FROM
users u WHERE u.usr_dom_id=d.dom_id) > 0 ORDER BY d.dom_name ASC;

   Which gives me just the domains with at least one user under them,
but not the count. This is not ideal, and I will have to come back to it
next week. In the meantime, any idea what the GROUP BY error is? If not,
I'll read through the docs on 'GROUP'ing once I get this deadline out of
the way.

   Thank you all for your help! I am sure I will have more question(s)
next week as soon as I can get back to this.

Madi

Re: subquery/alias question

От
Alvaro Herrera
Дата:
Madison Kelly wrote:

> Thanks for your reply!
>
>   Unfortunately, in both cases I get the error:
>
> nmc=> SELECT dom_id, dom_name, COUNT(usr_dom_id) AS usr_count FROM domains
> JOIN users ON (usr_dom_id=dom_id) HAVING COUNT (usr_dom_id) > 0 ORDER BY
> dom_name;
> ERROR:  syntax error at or near "COUNT" at character 25
> LINE 1: SELECT dom_id, dom_name COUNT(usr_dom_id) AS usr_count FROM ...

Try to avoid missing the comma before the COUNT (and do not cheat when
cut'n pasting ...)

Also it seems you will need a GROUP BY clause:
GROUP BY dom_id, dom_name
(placed just before the HAVING clause).

--
Alvaro Herrera                 http://www.amazon.com/gp/registry/CTMLCN8V17R4
"Some men are heterosexual, and some are bisexual, and some
men don't think about sex at all... they become lawyers" (Woody Allen)

Re: subquery/alias question

От
Gregory Stark
Дата:
"Madison Kelly" <linux@alteeve.com> writes:

> SELECT d.dom_id, d.dom_name FROM domains d WHERE (SELECT COUNT(*) FROM users u
> WHERE u.usr_dom_id=d.dom_id) > 0 ORDER BY d.dom_name ASC;
>
>   Which gives me just the domains with at least one user under them, but not
> the count. This is not ideal, and I will have to come back to it next week. In
> the meantime, any idea what the GROUP BY error is? If not, I'll read through
> the docs on 'GROUP'ing once I get this deadline out of the way.

I think you just want simply:

SELECT dom_id, dom_name, count(*)
  FROM users
  JOIN domains ON (usr_dom_id=dom_id)
 GROUP BY dom_id, dom_nmae
 ORDER BY dom_name

You don't actually need the HAVING (though it wouldn't do any harm either)
since only domains which match a user will come out of the join anyways.

You can also write it using a subquery instead of a join

SELECT *
  FROM (
        SELECT dom_id, dom_name,
               (SELECT count(*) FROM users WHERE user_dom_id = dom_id) as nusers
          FROM domains
       ) as subq
 WHERE nusers > 0
 ORDER BY dom_name

But that will perform worse in many cases.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Solved! Was: (subquery/alias question)

От
Madison Kelly
Дата:
Alvaro Herrera wrote:
> Madison Kelly wrote:
>
>> Thanks for your reply!
>>
>>   Unfortunately, in both cases I get the error:
>>
>> nmc=> SELECT dom_id, dom_name, COUNT(usr_dom_id) AS usr_count FROM domains
>> JOIN users ON (usr_dom_id=dom_id) HAVING COUNT (usr_dom_id) > 0 ORDER BY
>> dom_name;
>> ERROR:  syntax error at or near "COUNT" at character 25
>> LINE 1: SELECT dom_id, dom_name COUNT(usr_dom_id) AS usr_count FROM ...
>
> Try to avoid missing the comma before the COUNT (and do not cheat when
> cut'n pasting ...)
>
> Also it seems you will need a GROUP BY clause:
> GROUP BY dom_id, dom_name
> (placed just before the HAVING clause).

Bingo!

Now to answer the performance questions (using my actual queries,
unedited so they are a little longer):

-=-=-=-=-=-
nmc=> EXPLAIN ANALYZE SELECT dom_id, dom_name, dom_note,
COUNT(usr_dom_id) AS usr_count FROM domains JOIN users ON
(usr_dom_id=dom_id) GROUP BY dom_id, dom_name, dom_note HAVING COUNT
(usr_dom_id) > 0 ORDER BY dom_name;
                                                        QUERY PLAN


-------------------------------------------------------------------------------------------------------------------------
  Sort  (cost=10.70..10.78 rows=31 width=72) (actual time=2.107..2.133
rows=17 loops=1)
    Sort Key: domains.dom_name
    ->  HashAggregate  (cost=9.39..9.93 rows=31 width=72) (actual
time=1.899..1.956 rows=17 loops=1)
          Filter: (count(usr_dom_id) > 0)
          ->  Hash Join  (cost=7.20..9.00 rows=31 width=72) (actual
time=0.942..1.411 rows=96 loops=1)
                Hash Cond: ("outer".dom_id = "inner".usr_dom_id)
                ->  Seq Scan on domains  (cost=0.00..1.31 rows=31
width=68) (actual time=0.227..0.321 rows=31 loops=1)
                ->  Hash  (cost=6.96..6.96 rows=96 width=4) (actual
time=0.673..0.673 rows=96 loops=1)
                      ->  Seq Scan on users  (cost=0.00..6.96 rows=96
width=4) (actual time=0.010..0.371 rows=96 loops=1)
  Total runtime: 2.454 ms
(10 rows)
-=-=-=-=-=-

   Versus:

-=-=-=-=-=-
nmc=> EXPLAIN ANALYZE SELECT d.dom_id, d.dom_name, d.dom_note, (SELECT
COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id) AS usr_count FROM
domains d WHERE (SELECT COUNT(*) FROM users u WHERE
u.usr_dom_id=d.dom_id) > 0 ORDER BY d.dom_name ASC;
                                                       QUERY PLAN

----------------------------------------------------------------------------------------------------------------------
  Sort  (cost=297.37..297.39 rows=10 width=68) (actual
time=10.171..10.196 rows=17 loops=1)
    Sort Key: dom_name
    ->  Seq Scan on domains d  (cost=0.00..297.20 rows=10 width=68)
(actual time=0.508..10.013 rows=17 loops=1)
          Filter: ((subplan) > 0)
          SubPlan
            ->  Aggregate  (cost=7.21..7.21 rows=1 width=0) (actual
time=0.203..0.204 rows=1 loops=31)
                  ->  Seq Scan on users u  (cost=0.00..7.20 rows=1
width=0) (actual time=0.127..0.189 rows=3 loops=31)
                        Filter: (usr_dom_id = $0)
            ->  Aggregate  (cost=7.21..7.21 rows=1 width=0) (actual
time=0.184..0.186 rows=1 loops=17)
                  ->  Seq Scan on users u  (cost=0.00..7.20 rows=1
width=0) (actual time=0.058..0.164 rows=6 loops=17)
                        Filter: (usr_dom_id = $0)
  Total runtime: 10.593 ms
(12 rows)
-=-=-=-=-=-

   So using the JOIN you all helped me with, the query returns in 2.454
ms compared to my early query of 10.593 ms!

   I have not yet looked into any indexing either. I am waiting until
the program is done and then will go back and review queries to look for
bottlenecks.

   Thanks to all of you!!

Madi

Re: subquery/alias question

От
Michael Glaesemann
Дата:
On Sep 26, 2007, at 7:41 , Madison Kelly wrote:

>   Unfortunately, in both cases I get the error:

Um, the two cases could not be giving the same error as they don't
both contain the syntax that the  error is complaining about: the
first case uses count in a subquery so it couldn't throw this exact
error.

> nmc=> SELECT dom_id, dom_name, COUNT(usr_dom_id) AS usr_count FROM
> domains JOIN users ON (usr_dom_id=dom_id) HAVING COUNT (usr_dom_id)
> > 0 ORDER BY dom_name;
> ERROR:  syntax error at or near "COUNT" at character 25
> LINE 1: SELECT dom_id, dom_name COUNT(usr_dom_id) AS usr_count
> FROM ...

The error message doesn't match the query you've provided. Note that
in the line marked LINE 1, there's no comma after dom_name, which I
assume is what the server is complaining about. However, the query
you show *does* have this comma. Something isn't right. Is this an
exact copy and paste from psql?

>   I've been struggling with some deadlines, so for now I'm using just:
>
> SELECT d.dom_id, d.dom_name FROM domains d WHERE (SELECT COUNT(*)
> FROM users u WHERE u.usr_dom_id=d.dom_id) > 0 ORDER BY d.dom_name ASC;
>
>   Which gives me just the domains with at least one user under
> them, but not the count. This is not ideal, and I will have to come
> back to it next week. In the meantime, any idea what the GROUP BY
> error is?

Ah. You haven't actually shown us a GROUP BY error. A GROUP BY clause
is needed when you've got columns that aren't included in the
aggregate (COUNT in this case), e.g.,

select dom_id,
        dom_name,
        usr_count
   from domains
   natural join (select usr_dom_id as dom_id,
                        count(usr_dom_id) as usr_count
                   from users
                   group by dom_id) u
   where usr_count > 0
   order by dom_name;

select dom_id, dom_name, count(usr_dom_id) as usr_count
   from domains
   join users on (usr_dom_id = dom_id)
   group by dom_id, dom_name
   having count(usr_dom_id) > 0
   order by dom_name;

Michael Glaesemann
grzm seespotcode net



Re: subquery/alias question

От
Madison Kelly
Дата:
Gregory Stark wrote:
> "Madison Kelly" <linux@alteeve.com> writes:
>
>> SELECT d.dom_id, d.dom_name FROM domains d WHERE (SELECT COUNT(*) FROM users u
>> WHERE u.usr_dom_id=d.dom_id) > 0 ORDER BY d.dom_name ASC;
>>
>>   Which gives me just the domains with at least one user under them, but not
>> the count. This is not ideal, and I will have to come back to it next week. In
>> the meantime, any idea what the GROUP BY error is? If not, I'll read through
>> the docs on 'GROUP'ing once I get this deadline out of the way.
>
> I think you just want simply:
>
> SELECT dom_id, dom_name, count(*)
>   FROM users
>   JOIN domains ON (usr_dom_id=dom_id)
>  GROUP BY dom_id, dom_nmae
>  ORDER BY dom_name
>
> You don't actually need the HAVING (though it wouldn't do any harm either)
> since only domains which match a user will come out of the join anyways.
>
> You can also write it using a subquery instead of a join
>
> SELECT *
>   FROM (
>         SELECT dom_id, dom_name,
>                (SELECT count(*) FROM users WHERE user_dom_id = dom_id) as nusers
>           FROM domains
>        ) as subq
>  WHERE nusers > 0
>  ORDER BY dom_name
>
> But that will perform worse in many cases.
>

You are right, the 'HAVING' clause does seem to be redundant. I removed
it and ran several 'EXPLAIN ANALYZE's on it with and without the
'HAVING' clause and found no perceivable difference. I removed the
'HAVING' clause anyway, since I like to keep queries as minimal as possible.

Thank you!

Madi