Обсуждение: Need magical advice for counting NOTHING

Поиск
Список
Период
Сортировка

Need magical advice for counting NOTHING

От
Andreas
Дата:
Hi,
The source select counts log-events per user.
All is well when a user has at least one event per log_type in the log 
within a given timespan.
If one log_type is missing COUNT() has nothing to count and there is 
expectedly no result line that says 0.
BUT I need this 0-line because of a crosstab.  :(
I need to know how to prevent in my crosstab categories on the right to 
slip to the left, when the left category is emptyy.

Server 8.3.5

3 tables
log  (log_id, log_type_fk, user_fk, ts timestamp, ...)
users  (user_id, user_name, ...)
log_type  (log_type_id, log_type)
There are 3 events as log_type.

I naively tried
SELECT user_name, log_type_fk, COUNT(log_type_fk)
FROM log
JOIN users ON (user_id = user_fk)
WHERE (ts IS BETWEEN  sometime  AND   another)
GROUP BY user_name, log_type_fk
ORDER BY user_name, log_type_fk

This results e.g. to

user1, type1, 2
user1, type2, 3
user1, type3, 7
user2, type1, 11
user2, type3, 17

but I needed also
user2, type2, 0

How would I get there ?

Regards
Andreas


Re: Need magical advice for counting NOTHING

От
Shane Ambler
Дата:
Andreas wrote:
> Hi,
> The source select counts log-events per user.
> All is well when a user has at least one event per log_type in the log 
> within a given timespan.
> If one log_type is missing COUNT() has nothing to count and there is 
> expectedly no result line that says 0.
> BUT I need this 0-line because of a crosstab.  :(
> I need to know how to prevent in my crosstab categories on the right to 
> slip to the left, when the left category is emptyy.
> 
> Server 8.3.5
> 
> 3 tables
> log  (log_id, log_type_fk, user_fk, ts timestamp, ...)
> users  (user_id, user_name, ...)
> log_type  (log_type_id, log_type)
> There are 3 events as log_type.
> 
> I naively tried
> SELECT user_name, log_type_fk, COUNT(log_type_fk)
> FROM log
> JOIN users ON (user_id = user_fk)
> WHERE (ts IS BETWEEN  sometime  AND   another)
> GROUP BY user_name, log_type_fk
> ORDER BY user_name, log_type_fk
> 
> This results e.g. to
> 
> user1, type1, 2
> user1, type2, 3
> user1, type3, 7
> user2, type1, 11
> user2, type3, 17
> 
> but I needed also
> user2, type2, 0
> 
> How would I get there ?
> 
> Regards
> Andreas
> 
SELECT user_name, log_type_fk, COUNT(log_type_fk)

FROM log
RIGHT JOIN users ON (user_id = user_fk)

WHERE ts  BETWEEN  sometime  AND another
OR ts IS null

GROUP BY user_name, log_type_fk
ORDER BY user_name, log_type_fk


-- 

Shane Ambler
pgSQL (at) Sheeky (dot) Biz



Re: Need magical advice for counting NOTHING

От
"A. Kretschmer"
Дата:
In response to Andreas :
> Hi,
> The source select counts log-events per user.
> All is well when a user has at least one event per log_type in the log 
> within a given timespan.
> If one log_type is missing COUNT() has nothing to count and there is 
> expectedly no result line that says 0.
> BUT I need this 0-line because of a crosstab.  :(
> I need to know how to prevent in my crosstab categories on the right to 
> slip to the left, when the left category is emptyy.
> 
> Server 8.3.5
> 
> 3 tables
> log  (log_id, log_type_fk, user_fk, ts timestamp, ...)
> users  (user_id, user_name, ...)
> log_type  (log_type_id, log_type)
> There are 3 events as log_type.
> 
> I naively tried
> SELECT user_name, log_type_fk, COUNT(log_type_fk)
> FROM log
> JOIN users ON (user_id = user_fk)
> WHERE (ts IS BETWEEN  sometime  AND   another)
> GROUP BY user_name, log_type_fk
> ORDER BY user_name, log_type_fk
> 
> This results e.g. to
> 
> user1, type1, 2
> user1, type2, 3
> user1, type3, 7
> user2, type1, 11
> user2, type3, 17
> 
> but I needed also
> user2, type2, 0
> 
> How would I get there ?

without the WHERE-condition:


test=*# select * from log;log_id | log_type_fk | user_fk |         ts
--------+-------------+---------+---------------------     1 |           1 |       1 | 2009-07-23 00:00:00     2 |
    2 |       2 | 2009-07-23 00:00:00
 
(2 rows)

test=*# select * from users;user_id | user_name
---------+-----------      1 | user1      2 | user2
(2 rows)

test=*# select * from log_type;log_type_id | log_type
-------------+----------          1 | type1          2 | type2
(2 rows)

test=*# select foo.user_name, foo.log_type, sum(case when log_type_fk is
not null then 1 else 0 end) from (select user_id, user_name,
log_type_id, log_type from users cross join log_type) foo full join log
on ((foo.user_id, foo.log_type_id)=(log.user_fk, log.log_type_fk)) group
by 1,2 order by 1,2;user_name | log_type | sum
-----------+----------+-----user1     | type1    |   1user1     | type2    |   0user2     | type1    |   0user2     |
type2   |   1
 
(4 rows)


HTH, Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net


Re: Need magical advice for counting NOTHING

От
"A. Kretschmer"
Дата:
In response to A. Kretschmer :
> test=*# select foo.user_name, foo.log_type, sum(case when log_type_fk is
> not null then 1 else 0 end) from (select user_id, user_name,
> log_type_id, log_type from users cross join log_type) foo full join log
> on ((foo.user_id, foo.log_type_id)=(log.user_fk, log.log_type_fk)) group
> by 1,2 order by 1,2;
>  user_name | log_type | sum
> -----------+----------+-----
>  user1     | type1    |   1
>  user1     | type2    |   0
>  user2     | type1    |   0
>  user2     | type2    |   1
> (4 rows)

Update:

select foo.user_name, foo.log_type, count(log_type_fk) ...


-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net


Re: Need magical advice for counting NOTHING

От
Glenn Maynard
Дата:
On Thu, Jul 23, 2009 at 1:01 AM, Andreas<maps.on@gmx.net> wrote:
> SELECT user_name, log_type_fk, COUNT(log_type_fk)
> FROM log
> JOIN users ON (user_id = user_fk)
> WHERE (ts IS BETWEEN  sometime  AND   another)
> GROUP BY user_name, log_type_fk
> ORDER BY user_name, log_type_fk

create table users (user_id integer, user_name varchar);
create table log_type  (log_type_id integer, log_type integer);
create table log (log_id integer, log_type_fk integer, user_fk integer);
insert into log_type (log_type_id, log_type) values (1, 1);
insert into log_type (log_type_id, log_type) values (2, 2);
insert into users (user_id, user_name) values (1, 'a');
insert into users (user_id, user_name) values (2, 'b');
insert into log (log_id, log_type_fk, user_fk) values (1, 1, 1);
insert into log (log_id, log_type_fk, user_fk) values (2, 2, 1);
insert into log (log_id, log_type_fk, user_fk) values (3, 2, 1);
insert into log (log_id, log_type_fk, user_fk) values (4, 1, 2);

SELECT user_name, log_type.log_type, sum((log_type_fk IS NOT
NULL)::integer) AS count
FROM users JOIN log_type ON (true) LEFT JOIN log ON (user_id = user_fk
AND log.log_type_fk = log_type.log_type)
GROUP BY user_name, log_type.log_type
ORDER BY user_name, log_type.log_type;

user_name | log_type | count
-----------+----------+-------a         |        1 |     1a         |        2 |     2a         |        3 |     0b
   |        1 |     1b         |        2 |     0b         |        3 |     0 

--
Glenn Maynard


Re: Need magical advice for counting NOTHING

От
nha
Дата:
Hello,

Le 23/07/09 10:23, Glenn Maynard a écrit :
> On Thu, Jul 23, 2009 at 1:01 AM, Andreas<maps.on@gmx.net> wrote:
>> SELECT user_name, log_type_fk, COUNT(log_type_fk)
>> FROM log
>> JOIN users ON (user_id = user_fk)
>> WHERE (ts IS BETWEEN  sometime  AND   another)
>> GROUP BY user_name, log_type_fk
>> ORDER BY user_name, log_type_fk
> [...] 
> SELECT user_name, log_type.log_type, sum((log_type_fk IS NOT
> NULL)::integer) AS count
> FROM users JOIN log_type ON (true) LEFT JOIN log ON (user_id = user_fk
> AND log.log_type_fk = log_type.log_type)
> GROUP BY user_name, log_type.log_type
> ORDER BY user_name, log_type.log_type;
> [...]

In the same direction as Andreas Krestchmer and Glenn Maynard (thanks to
the latter for DDL assumed statements), the following query should also
suit:

SELECT user_name, log_type, COUNT(log_type_fk)
FROM (users CROSS JOIN log_type)
LEFT JOIN log ON (user_id = user_fk AND log_type_id = log_type_fk)
WHERE (ts IS BETWEEN sometime AND another)
GROUP BY user_name, log_type
ORDER BY user_name, log_type

It is syntactically nearer the original query and includes no class
operator. Here are the two main ideas:
- Building all the possible couples of user name and log type by
cross-joining users and log_type tables;
- Counting rows in log table matching each couple (user, log_type) from
the previous cross-join (LEFT JOIN ensures that each row of the table on
the left is mined).

While it is formally assumed that user_id and log_type_id are
respectively keys for users and log_type tables, it is semantically
admitted here that user_name identifies user_id in users table and
log_type identifies log_type_id in log_type table.

Regards.
--
nha / Lyon / France.


Re: Need magical advice for counting NOTHING

От
Andreas
Дата:
nha schrieb:
> Hello,
>
> Le 23/07/09 10:23, Glenn Maynard a écrit :
>   
>> On Thu, Jul 23, 2009 at 1:01 AM, Andreas<maps.on@gmx.net> wrote:
>>     
>>> SELECT user_name, log_type_fk, COUNT(log_type_fk)
>>> FROM log
>>> JOIN users ON (user_id = user_fk)
>>> WHERE (ts IS BETWEEN  sometime  AND   another)
>>> GROUP BY user_name, log_type_fk
>>> ORDER BY user_name, log_type_fk
>>>       
>> [...] 
>> SELECT user_name, log_type.log_type, sum((log_type_fk IS NOT
>> NULL)::integer) AS count
>> FROM users JOIN log_type ON (true) LEFT JOIN log ON (user_id = user_fk
>> AND log.log_type_fk = log_type.log_type)
>> GROUP BY user_name, log_type.log_type
>> ORDER BY user_name, log_type.log_type;
>> [...]
>>     
>
> In the same direction as Andreas Krestchmer and Glenn Maynard (thanks to
> the latter for DDL assumed statements), the following query should also
> suit:
>
> SELECT user_name, log_type, COUNT(log_type_fk)
> FROM (users CROSS JOIN log_type)
> LEFT JOIN log ON (user_id = user_fk AND log_type_id = log_type_fk)
> WHERE (ts IS BETWEEN sometime AND another)
> GROUP BY user_name, log_type
> ORDER BY user_name, log_type
>   
I tried it and found it looses the COUNT() = 0 lines because of the date 
filtering at this position.
ts is a columns of log. The tricky lines are those log_types that aren't 
in log so (ts IS NULL) here and NULL can't be compared to timestamps so 
the interesting lines get filtered out of the LEFT JOIN's result and 
can't be counted.

The folowing aproach works AFAIK.

SELECT user_name, log_type_id, COUNT(log_type_fk)
FROM (users CROSS JOIN log_type)
LEFT JOIN
(    SELECT user_fk, log_type_fk    FROM log    WHERE (ts BETWEEN sometime AND another)
) AS x
ON (user_id = user_fk AND log_type_id = log_type_fk)
GROUP BY user_name, log_type_id
ORDER BY user_name, log_type_id

This subselect gives a really considerable speed up, too.

> While it is formally assumed that user_id and log_type_id are
> respectively keys for users and log_type tables, it is semantically
> admitted here that user_name identifies user_id in users table and
> log_type identifies log_type_id in log_type table.
Actually I didn't consider this key issue, yet.   :}
But, as the result gets stuffed in the crosstab function, the formally strict key log_type_id would do as column 2 as
well,or maybe better being a number instead of a varchar. 
 
I'll even change the first column to user_id and wrap another JOIN users around so I don't get messed up by users with
thesame name.
 


Thank you and Andreas Krestchmer and Glenn Maynard for giving me a new view on things.
:)