Обсуждение: Need magical advice for counting NOTHING
Hi, The source select counts log-events per user. All is well when a user has at least one event per log_type in the log within a given timespan. If one log_type is missing COUNT() has nothing to count and there is expectedly no result line that says 0. BUT I need this 0-line because of a crosstab. :( I need to know how to prevent in my crosstab categories on the right to slip to the left, when the left category is emptyy. Server 8.3.5 3 tables log (log_id, log_type_fk, user_fk, ts timestamp, ...) users (user_id, user_name, ...) log_type (log_type_id, log_type) There are 3 events as log_type. I naively tried SELECT user_name, log_type_fk, COUNT(log_type_fk) FROM log JOIN users ON (user_id = user_fk) WHERE (ts IS BETWEEN sometime AND another) GROUP BY user_name, log_type_fk ORDER BY user_name, log_type_fk This results e.g. to user1, type1, 2 user1, type2, 3 user1, type3, 7 user2, type1, 11 user2, type3, 17 but I needed also user2, type2, 0 How would I get there ? Regards Andreas
Andreas wrote: > Hi, > The source select counts log-events per user. > All is well when a user has at least one event per log_type in the log > within a given timespan. > If one log_type is missing COUNT() has nothing to count and there is > expectedly no result line that says 0. > BUT I need this 0-line because of a crosstab. :( > I need to know how to prevent in my crosstab categories on the right to > slip to the left, when the left category is emptyy. > > Server 8.3.5 > > 3 tables > log (log_id, log_type_fk, user_fk, ts timestamp, ...) > users (user_id, user_name, ...) > log_type (log_type_id, log_type) > There are 3 events as log_type. > > I naively tried > SELECT user_name, log_type_fk, COUNT(log_type_fk) > FROM log > JOIN users ON (user_id = user_fk) > WHERE (ts IS BETWEEN sometime AND another) > GROUP BY user_name, log_type_fk > ORDER BY user_name, log_type_fk > > This results e.g. to > > user1, type1, 2 > user1, type2, 3 > user1, type3, 7 > user2, type1, 11 > user2, type3, 17 > > but I needed also > user2, type2, 0 > > How would I get there ? > > Regards > Andreas > SELECT user_name, log_type_fk, COUNT(log_type_fk) FROM log RIGHT JOIN users ON (user_id = user_fk) WHERE ts BETWEEN sometime AND another OR ts IS null GROUP BY user_name, log_type_fk ORDER BY user_name, log_type_fk -- Shane Ambler pgSQL (at) Sheeky (dot) Biz
In response to Andreas : > Hi, > The source select counts log-events per user. > All is well when a user has at least one event per log_type in the log > within a given timespan. > If one log_type is missing COUNT() has nothing to count and there is > expectedly no result line that says 0. > BUT I need this 0-line because of a crosstab. :( > I need to know how to prevent in my crosstab categories on the right to > slip to the left, when the left category is emptyy. > > Server 8.3.5 > > 3 tables > log (log_id, log_type_fk, user_fk, ts timestamp, ...) > users (user_id, user_name, ...) > log_type (log_type_id, log_type) > There are 3 events as log_type. > > I naively tried > SELECT user_name, log_type_fk, COUNT(log_type_fk) > FROM log > JOIN users ON (user_id = user_fk) > WHERE (ts IS BETWEEN sometime AND another) > GROUP BY user_name, log_type_fk > ORDER BY user_name, log_type_fk > > This results e.g. to > > user1, type1, 2 > user1, type2, 3 > user1, type3, 7 > user2, type1, 11 > user2, type3, 17 > > but I needed also > user2, type2, 0 > > How would I get there ? without the WHERE-condition: test=*# select * from log;log_id | log_type_fk | user_fk | ts --------+-------------+---------+--------------------- 1 | 1 | 1 | 2009-07-23 00:00:00 2 | 2 | 2 | 2009-07-23 00:00:00 (2 rows) test=*# select * from users;user_id | user_name ---------+----------- 1 | user1 2 | user2 (2 rows) test=*# select * from log_type;log_type_id | log_type -------------+---------- 1 | type1 2 | type2 (2 rows) test=*# select foo.user_name, foo.log_type, sum(case when log_type_fk is not null then 1 else 0 end) from (select user_id, user_name, log_type_id, log_type from users cross join log_type) foo full join log on ((foo.user_id, foo.log_type_id)=(log.user_fk, log.log_type_fk)) group by 1,2 order by 1,2;user_name | log_type | sum -----------+----------+-----user1 | type1 | 1user1 | type2 | 0user2 | type1 | 0user2 | type2 | 1 (4 rows) HTH, Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net
In response to A. Kretschmer : > test=*# select foo.user_name, foo.log_type, sum(case when log_type_fk is > not null then 1 else 0 end) from (select user_id, user_name, > log_type_id, log_type from users cross join log_type) foo full join log > on ((foo.user_id, foo.log_type_id)=(log.user_fk, log.log_type_fk)) group > by 1,2 order by 1,2; > user_name | log_type | sum > -----------+----------+----- > user1 | type1 | 1 > user1 | type2 | 0 > user2 | type1 | 0 > user2 | type2 | 1 > (4 rows) Update: select foo.user_name, foo.log_type, count(log_type_fk) ... -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net
On Thu, Jul 23, 2009 at 1:01 AM, Andreas<maps.on@gmx.net> wrote: > SELECT user_name, log_type_fk, COUNT(log_type_fk) > FROM log > JOIN users ON (user_id = user_fk) > WHERE (ts IS BETWEEN sometime AND another) > GROUP BY user_name, log_type_fk > ORDER BY user_name, log_type_fk create table users (user_id integer, user_name varchar); create table log_type (log_type_id integer, log_type integer); create table log (log_id integer, log_type_fk integer, user_fk integer); insert into log_type (log_type_id, log_type) values (1, 1); insert into log_type (log_type_id, log_type) values (2, 2); insert into users (user_id, user_name) values (1, 'a'); insert into users (user_id, user_name) values (2, 'b'); insert into log (log_id, log_type_fk, user_fk) values (1, 1, 1); insert into log (log_id, log_type_fk, user_fk) values (2, 2, 1); insert into log (log_id, log_type_fk, user_fk) values (3, 2, 1); insert into log (log_id, log_type_fk, user_fk) values (4, 1, 2); SELECT user_name, log_type.log_type, sum((log_type_fk IS NOT NULL)::integer) AS count FROM users JOIN log_type ON (true) LEFT JOIN log ON (user_id = user_fk AND log.log_type_fk = log_type.log_type) GROUP BY user_name, log_type.log_type ORDER BY user_name, log_type.log_type; user_name | log_type | count -----------+----------+-------a | 1 | 1a | 2 | 2a | 3 | 0b | 1 | 1b | 2 | 0b | 3 | 0 -- Glenn Maynard
Hello, Le 23/07/09 10:23, Glenn Maynard a écrit : > On Thu, Jul 23, 2009 at 1:01 AM, Andreas<maps.on@gmx.net> wrote: >> SELECT user_name, log_type_fk, COUNT(log_type_fk) >> FROM log >> JOIN users ON (user_id = user_fk) >> WHERE (ts IS BETWEEN sometime AND another) >> GROUP BY user_name, log_type_fk >> ORDER BY user_name, log_type_fk > [...] > SELECT user_name, log_type.log_type, sum((log_type_fk IS NOT > NULL)::integer) AS count > FROM users JOIN log_type ON (true) LEFT JOIN log ON (user_id = user_fk > AND log.log_type_fk = log_type.log_type) > GROUP BY user_name, log_type.log_type > ORDER BY user_name, log_type.log_type; > [...] In the same direction as Andreas Krestchmer and Glenn Maynard (thanks to the latter for DDL assumed statements), the following query should also suit: SELECT user_name, log_type, COUNT(log_type_fk) FROM (users CROSS JOIN log_type) LEFT JOIN log ON (user_id = user_fk AND log_type_id = log_type_fk) WHERE (ts IS BETWEEN sometime AND another) GROUP BY user_name, log_type ORDER BY user_name, log_type It is syntactically nearer the original query and includes no class operator. Here are the two main ideas: - Building all the possible couples of user name and log type by cross-joining users and log_type tables; - Counting rows in log table matching each couple (user, log_type) from the previous cross-join (LEFT JOIN ensures that each row of the table on the left is mined). While it is formally assumed that user_id and log_type_id are respectively keys for users and log_type tables, it is semantically admitted here that user_name identifies user_id in users table and log_type identifies log_type_id in log_type table. Regards. -- nha / Lyon / France.
nha schrieb: > Hello, > > Le 23/07/09 10:23, Glenn Maynard a écrit : > >> On Thu, Jul 23, 2009 at 1:01 AM, Andreas<maps.on@gmx.net> wrote: >> >>> SELECT user_name, log_type_fk, COUNT(log_type_fk) >>> FROM log >>> JOIN users ON (user_id = user_fk) >>> WHERE (ts IS BETWEEN sometime AND another) >>> GROUP BY user_name, log_type_fk >>> ORDER BY user_name, log_type_fk >>> >> [...] >> SELECT user_name, log_type.log_type, sum((log_type_fk IS NOT >> NULL)::integer) AS count >> FROM users JOIN log_type ON (true) LEFT JOIN log ON (user_id = user_fk >> AND log.log_type_fk = log_type.log_type) >> GROUP BY user_name, log_type.log_type >> ORDER BY user_name, log_type.log_type; >> [...] >> > > In the same direction as Andreas Krestchmer and Glenn Maynard (thanks to > the latter for DDL assumed statements), the following query should also > suit: > > SELECT user_name, log_type, COUNT(log_type_fk) > FROM (users CROSS JOIN log_type) > LEFT JOIN log ON (user_id = user_fk AND log_type_id = log_type_fk) > WHERE (ts IS BETWEEN sometime AND another) > GROUP BY user_name, log_type > ORDER BY user_name, log_type > I tried it and found it looses the COUNT() = 0 lines because of the date filtering at this position. ts is a columns of log. The tricky lines are those log_types that aren't in log so (ts IS NULL) here and NULL can't be compared to timestamps so the interesting lines get filtered out of the LEFT JOIN's result and can't be counted. The folowing aproach works AFAIK. SELECT user_name, log_type_id, COUNT(log_type_fk) FROM (users CROSS JOIN log_type) LEFT JOIN ( SELECT user_fk, log_type_fk FROM log WHERE (ts BETWEEN sometime AND another) ) AS x ON (user_id = user_fk AND log_type_id = log_type_fk) GROUP BY user_name, log_type_id ORDER BY user_name, log_type_id This subselect gives a really considerable speed up, too. > While it is formally assumed that user_id and log_type_id are > respectively keys for users and log_type tables, it is semantically > admitted here that user_name identifies user_id in users table and > log_type identifies log_type_id in log_type table. Actually I didn't consider this key issue, yet. :} But, as the result gets stuffed in the crosstab function, the formally strict key log_type_id would do as column 2 as well,or maybe better being a number instead of a varchar. I'll even change the first column to user_id and wrap another JOIN users around so I don't get messed up by users with thesame name. Thank you and Andreas Krestchmer and Glenn Maynard for giving me a new view on things. :)