Обсуждение: two count columns?

Поиск
Список
Период
Сортировка

two count columns?

От
Jan Danielsson
Дата:
Hello all,
  I have written a small firewall, and sshd, logging system which uses
postgresql for storing the logs.
  The table that stores the firewall logs has, among other columns,
from_ip, port and a timestamp.

I want to construct a query which returns the columns:

from_ip | port 22 entries | total entries | last_access

I have managed to put this together:

select from_ip, count(from_ip) as entries, max(ts)::timestamp(0) as
last_access from log where to_port=22 and direction='in' group by from_ip
  So the only thing I'm missing is the total number of log entries
matching the from_ip, but that's where I'm stuck. My instinct is to try
to use subqueries:

select from_ip, count(from_ip) as entries, count(select * from log where
...) as tot_entries, max(ts)::timestamp(0) as last_access from log where
to_port=22 and direction='in' group by from_ip

..but count() doesn't take a subquery afaik, and how would I match the
from_ip in the inner select with the outer one? So I assume that
subqueries aren't the way yo go. Is there even a query to return those
columns in that configuration?

Thankful for any hints or tips.

--
Kind Regards,
Jan Danielsson
Te audire non possum. Musa sapientum fixa est in aure.


Re: two count columns?

От
Tom Lane
Дата:
Jan Danielsson <jan.danielsson@gmail.com> writes:
>    So the only thing I'm missing is the total number of log entries
> matching the from_ip, but that's where I'm stuck. My instinct is to try
> to use subqueries:

> select from_ip, count(from_ip) as entries, count(select * from log where
> ...) as tot_entries, max(ts)::timestamp(0) as last_access from log where
> to_port=22 and direction='in' group by from_ip

> ...but count() doesn't take a subquery afaik, and how would I match the
> from_ip in the inner select with the outer one?

I think you want something like
 select ..., (select count(*) from log loginner where from_ip = log.from_ip) ... from log ...

You need to attach an alias to either the inner or the outer use of
"log" so that you can refer to the outer one from the inner SELECT.
Here I chose to alias the inner one, but it might be clearer to
alias the outer:
 select ..., (select count(*) from log where from_ip = logouter.from_ip) ... from log logouter ...

Note that anything like this is going to be pretty expensive if your log
table is large.  You might want to think about something involving
another layer of GROUP BY instead.
        regards, tom lane


Re: two count columns?

От
Greg Stark
Дата:
Jan Danielsson <jan.danielsson@gmail.com> writes:

> select from_ip, count(from_ip) as entries, count(select * from log where
> ...) as tot_entries, max(ts)::timestamp(0) as last_access from log where
> to_port=22 and direction='in' group by from_ip

select from_ip,       count(from_ip) as entries,       (select count(*)          from log as l        where l.from_ip =
log.from_ip     ) as tot_entries,      max(ts) as last_access from logwhere to_port=22  and direction='ingroup by
from_ip

expect it to be pretty slow though. For every from_ip it has to look up every
other entry with that from_ip.

> Thankful for any hints or tips.

There is a trick you could use to make it faster but it gets cumbersome and
pretty tricky to use when you're doing more than one thing at a time:

select from_ip      sum(case when to_port=22 and direction='in' then 1 else 0 end) as entries,      count(*) as
tot_entries,     max(case when to_port=22 and direction='in' then ts::timestamp(0) else null end) as last_access from
loggroupby from_iphaving entries > 0
 


Note that in either case you might want to look at ANALYZE results for the
query and try raising work_mem for this query using SET until you see the plan
using a hash aggregate. If it can use a hash aggregate for your query (more
likely for the first query than the second) without swapping it'll be faster
than sorting.

-- 
greg