Обсуждение: Window function trouble

Поиск
Список
Период
Сортировка

Window function trouble

От
Harald Fuchs
Дата:
I have a table like this:
 CREATE TABLE tbl (   host text NOT NULL,   adr ip4 NOT NULL,   usr text NOT NULL );

(ip4 is from the ip4r contrib module)
and I want the number of entries per address and per user:
 SELECT adr, usr, count(*) FROM tbl WHERE host = ?   AND adr <<= ? GROUP BY adr, usr ORDER BY adr, usr

That's pretty basic stuff and returns something like this:
 adr1 usr1_1 cnt1_1 adr1 usr1_2 cnt1_2 adr1 usr1_3 cnt1_3 adr2 usr2_1 cnt2_1 ...

But I want the address to be NULL if it's the same as the address of
the previous row. I came up with this:
 SELECT CASE lag(adr) OVER (ORDER BY adr)        WHEN adr THEN NULL        ELSE adr        END AS myaddr,        usr,
count(*)FROM tbl WHERE host = ?   AND adr <<= ? GROUP BY adr, usr ORDER BY adr, usr
 

This returns something like
 adr1 usr1_1 cnt1_1 NULL usr1_2 cnt1_2 NULL usr1_3 cnt1_3 adr2 usr2_1 cnt2_1 ...

what's exactly what I want.  But when I don't name the CASE expression
(i.e. I delete "AS myaddr"), I get the following:
 adr1 usr1_1 cnt1_1 adr2 usr2_1 cnt2_1 ...

The other users for one address are gone.  Does anyone know why?



Re: Window function trouble

От
Harald Fuchs
Дата:
In article <pud42mahnr.fsf@srv.protecting.net>,
Harald Fuchs <hari.fuchs@gmail.com> writes:

> The other users for one address are gone.  Does anyone know why?

Update: they are not gone, but they've moved to the end of the result
set. Apparently the CASE expression is named "adr" unless named
otherwise, and the result set is sorted by this expression instead of
the column named "adr".  Does anyone know what the SQL standard says
about that?



Re: Window function trouble

От
Tom Lane
Дата:
Harald Fuchs <hari.fuchs@gmail.com> writes:
> Apparently the CASE expression is named "adr" unless named
> otherwise, and the result set is sorted by this expression instead of
> the column named "adr".  Does anyone know what the SQL standard says
> about that?

The SQL standard says the default name for any output column other
than a simple column reference is implementation-dependent.  I think
our implementation involves looking at the default value for a CASE.
        regards, tom lane


Re: Window function trouble

От
Harald Fuchs
Дата:
In article <25983.1260468559@sss.pgh.pa.us>,
Tom Lane <tgl@sss.pgh.pa.us> writes:

> The SQL standard says the default name for any output column other
> than a simple column reference is implementation-dependent.  I think
> our implementation involves looking at the default value for a CASE.

Thanks for the clarification, Tom - so it's not a PostgreSQL bug, but
a misfeature of the SQL standard.

That being said, I still think that PostgreSQL could do better - how
about naming expression columns so that they are distinct from column
names?  The current implementation throws an error e.g. for
 SELECT adr,        CASE lag(adr) OVER (ORDER BY adr)        WHEN adr THEN NULL        ELSE adr        END,        usr,
count(*)FROM tbl WHERE host = 'h1' GROUP BY adr, usr ORDER BY adr, usr
 

namely `ORDER BY "adr" is ambiguous' which I find somewhat confusing.



Re: Window function trouble

От
Tom Lane
Дата:
Harald Fuchs <hari.fuchs@gmail.com> writes:
> That being said, I still think that PostgreSQL could do better - how
> about naming expression columns so that they are distinct from column
> names?

Even though the rules we use are pretty arbitrary, I'm hesitant to make
changes in them; it seems more likely to break existing applications
than to do anyone any good.
        regards, tom lane


Re: Window function trouble

От
Harald Fuchs
Дата:
In article <28855.1260486487@sss.pgh.pa.us>,
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Harald Fuchs <hari.fuchs@gmail.com> writes:
>> That being said, I still think that PostgreSQL could do better - how
>> about naming expression columns so that they are distinct from column
>> names?

> Even though the rules we use are pretty arbitrary, I'm hesitant to make
> changes in them; it seems more likely to break existing applications
> than to do anyone any good.

Well, it would be far less confusing.  Here's a self-contained example:
 CREATE TABLE t1 (   id serial NOT NULL,   adr text NOT NULL,   usr text NOT NULL,   PRIMARY KEY (id) );
 COPY t1 (adr, usr) FROM stdin DELIMITER '|'; a1|u1_1 a2|u2_1 a2|u2_2 a2|u2_2 a3|u3_1 a3|u3_2 a3|u3_2 a3|u3_3 a3|u3_3
a3|u3_3\.
 
 SELECT CASE lag(adr) OVER (ORDER BY adr, usr)        WHEN adr THEN NULL        ELSE adr        END,        usr,
count(*)FROM t1 GROUP BY adr, usr ORDER BY adr, usr;
 

The result set is:
  adr | usr  | count  -----+------+-------  a1  | u1_1 |     1  a2  | u2_1 |     1  a3  | u3_1 |     1      | u2_2 |
2      | u3_2 |     2      | u3_3 |     3
 

This means IMHO that "GROUP BY adr, usr" groups by the column named
"adr", whereas "ORDER BY adr, usr" orders by the unnamed CASE
expression which happens to "hide" the column name without warning.