Обсуждение: Window function trouble
I have a table like this: CREATE TABLE tbl ( host text NOT NULL, adr ip4 NOT NULL, usr text NOT NULL ); (ip4 is from the ip4r contrib module) and I want the number of entries per address and per user: SELECT adr, usr, count(*) FROM tbl WHERE host = ? AND adr <<= ? GROUP BY adr, usr ORDER BY adr, usr That's pretty basic stuff and returns something like this: adr1 usr1_1 cnt1_1 adr1 usr1_2 cnt1_2 adr1 usr1_3 cnt1_3 adr2 usr2_1 cnt2_1 ... But I want the address to be NULL if it's the same as the address of the previous row. I came up with this: SELECT CASE lag(adr) OVER (ORDER BY adr) WHEN adr THEN NULL ELSE adr END AS myaddr, usr, count(*)FROM tbl WHERE host = ? AND adr <<= ? GROUP BY adr, usr ORDER BY adr, usr This returns something like adr1 usr1_1 cnt1_1 NULL usr1_2 cnt1_2 NULL usr1_3 cnt1_3 adr2 usr2_1 cnt2_1 ... what's exactly what I want. But when I don't name the CASE expression (i.e. I delete "AS myaddr"), I get the following: adr1 usr1_1 cnt1_1 adr2 usr2_1 cnt2_1 ... The other users for one address are gone. Does anyone know why?
In article <pud42mahnr.fsf@srv.protecting.net>, Harald Fuchs <hari.fuchs@gmail.com> writes: > The other users for one address are gone. Does anyone know why? Update: they are not gone, but they've moved to the end of the result set. Apparently the CASE expression is named "adr" unless named otherwise, and the result set is sorted by this expression instead of the column named "adr". Does anyone know what the SQL standard says about that?
Harald Fuchs <hari.fuchs@gmail.com> writes: > Apparently the CASE expression is named "adr" unless named > otherwise, and the result set is sorted by this expression instead of > the column named "adr". Does anyone know what the SQL standard says > about that? The SQL standard says the default name for any output column other than a simple column reference is implementation-dependent. I think our implementation involves looking at the default value for a CASE. regards, tom lane
In article <25983.1260468559@sss.pgh.pa.us>, Tom Lane <tgl@sss.pgh.pa.us> writes: > The SQL standard says the default name for any output column other > than a simple column reference is implementation-dependent. I think > our implementation involves looking at the default value for a CASE. Thanks for the clarification, Tom - so it's not a PostgreSQL bug, but a misfeature of the SQL standard. That being said, I still think that PostgreSQL could do better - how about naming expression columns so that they are distinct from column names? The current implementation throws an error e.g. for SELECT adr, CASE lag(adr) OVER (ORDER BY adr) WHEN adr THEN NULL ELSE adr END, usr, count(*)FROM tbl WHERE host = 'h1' GROUP BY adr, usr ORDER BY adr, usr namely `ORDER BY "adr" is ambiguous' which I find somewhat confusing.
Harald Fuchs <hari.fuchs@gmail.com> writes: > That being said, I still think that PostgreSQL could do better - how > about naming expression columns so that they are distinct from column > names? Even though the rules we use are pretty arbitrary, I'm hesitant to make changes in them; it seems more likely to break existing applications than to do anyone any good. regards, tom lane
In article <28855.1260486487@sss.pgh.pa.us>, Tom Lane <tgl@sss.pgh.pa.us> writes: > Harald Fuchs <hari.fuchs@gmail.com> writes: >> That being said, I still think that PostgreSQL could do better - how >> about naming expression columns so that they are distinct from column >> names? > Even though the rules we use are pretty arbitrary, I'm hesitant to make > changes in them; it seems more likely to break existing applications > than to do anyone any good. Well, it would be far less confusing. Here's a self-contained example: CREATE TABLE t1 ( id serial NOT NULL, adr text NOT NULL, usr text NOT NULL, PRIMARY KEY (id) ); COPY t1 (adr, usr) FROM stdin DELIMITER '|'; a1|u1_1 a2|u2_1 a2|u2_2 a2|u2_2 a3|u3_1 a3|u3_2 a3|u3_2 a3|u3_3 a3|u3_3 a3|u3_3\. SELECT CASE lag(adr) OVER (ORDER BY adr, usr) WHEN adr THEN NULL ELSE adr END, usr, count(*)FROM t1 GROUP BY adr, usr ORDER BY adr, usr; The result set is: adr | usr | count -----+------+------- a1 | u1_1 | 1 a2 | u2_1 | 1 a3 | u3_1 | 1 | u2_2 | 2 | u3_2 | 2 | u3_3 | 3 This means IMHO that "GROUP BY adr, usr" groups by the column named "adr", whereas "ORDER BY adr, usr" orders by the unnamed CASE expression which happens to "hide" the column name without warning.