Обсуждение: playing with timestamp entries

Поиск
Список
Период
Сортировка

playing with timestamp entries

От
Dale Walker
Дата:

I record our radius logs in a firly basic table, schema is as follows:
----------------------------------------------------------
CREATE TABLE "history_new" (
        "username" character varying(50) NOT NULL,
        "time_stamp" int4 NOT NULL,
        "acctstatustype" character varying(8) NOT NULL ,
        "acctdelay" int2 NOT NULL,
        "acctinputoctets" int4 ,
        "acctoutputoctets" int4 ,
        "acctsessionid" character varying(30),
        "acctsessiontime" int4 ,
        "acctterminatecause" character varying(50),
        "nasidentifier" character varying(22),
        "nasport" character varying(4),
        "framedipaddress" character varying(16),
        "callingstationid" character varying(16),
        "ascenddatarate" character varying(16),
        "calledstationid" character varying(16)
);
-----------------------------------------------------------

I then create an index on the username column by:

---------------------------------------
CREATE INDEX "i_h_uh" on HISTORY using hash (username);
---------------------------------------

I use the 'hash' type as queries regarding usage will always be of the
form "select ...... where username='xxx';"

I also calculate a 'summary' in the form of a table I call sumlog:

-----------------------------------------------------------
CREATE TABLE "sumlog" (
        "username" character varying(8) NOT NULL,
        "period" character varying(8) NOT NULL,
        "sumtime" int4 DEFAULT '0' ,
        "mbup" float8 DEFAULT '0.0' ,
        "mbdn" float8 DEFAULT '0.0' );
------------------------------------------------------------

This table is populated by the following query:
-----------------------------------------------------
insert into sumlog
    select  s.username,
                to_char(timestamp(h.time_stamp),'YYYY-MM') as date,
                sum(h.acctsessiontime),
                sum(float8(h.acctinputoctets)/1000000),
                sum(float8(h.acctoutputoctets)/1000000)
        from subscribers as s,history as h
    where s.username=h.username
    group by s.username,date;
-----------------------------------------------------------

This works fine, but as the database size is constantly growing the
summary table takes a while to calculate...

I was thinking of doing the following:
    1. only update 'sumlog' for the current period (eg. 2001-04)
    2. adding an index on the timestamp column to speed-up the query for
the insert into sumlog.


My questions are:
    1. is it possible to create an index entry on the function applied to
the time_stamp.
        eg. something of the form [ create index "i_ts" on history
(to_char(timestamp(h.time_stamp),'YYYY-MM')) ]
    2. what is the best way to access the data from the history table for a
known period..
        eg. knowing period='2001-04' is there a better function to use than
'to_char' against the timestamp, any sort of indexing I should use,
etc...

I've been going around in circles, and I'm sure I've missed some
basic/common-sense sort of step, but now I'm running out of time to
spend on it....

Has anyone here done anything similar??


--
Dale Walker < dale@icr.com.au >
Independent Computer Retailers (ICR) Pty Ltd
http://www.icr.com.au/

Re: playing with timestamp entries

От
Tom Lane
Дата:
Dale Walker <dale@icr.com.au> writes:
> I use the 'hash' type as queries regarding usage will always be of the
> form "select ...... where username='xxx';"

Use a btree anyway.  Postgres' btree implementation is much better than
its hash index implementation.

> insert into sumlog
>     select  s.username,
>                 to_char(timestamp(h.time_stamp),'YYYY-MM') as date,
>                 sum(h.acctsessiontime),
>                 sum(float8(h.acctinputoctets)/1000000),
>                 sum(float8(h.acctoutputoctets)/1000000)
>         from subscribers as s,history as h
>     where s.username=h.username
>     group by s.username,date;

> This works fine, but as the database size is constantly growing the
> summary table takes a while to calculate...

What plan does EXPLAIN show for this query?

            regards, tom lane

Re: playing with timestamp entries

От
Dale Walker
Дата:
Tom Lane wrote:
>
> Dale Walker <dale@icr.com.au> writes:
> > I use the 'hash' type as queries regarding usage will always be of the
> > form "select ...... where username='xxx';"
>
> Use a btree anyway.  Postgres' btree implementation is much better than
> its hash index implementation.
>

OK, I'll give that a whirl...



> > insert into sumlog
> >       select  s.username,
> >                 to_char(timestamp(h.time_stamp),'YYYY-MM') as date,
> >                 sum(h.acctsessiontime),
> >                 sum(float8(h.acctinputoctets)/1000000),
> >                 sum(float8(h.acctoutputoctets)/1000000)
> >         from subscribers as s,history as h
> >       where s.username=h.username
> >       group by s.username,date;
>
> > This works fine, but as the database size is constantly growing the
> > summary table takes a while to calculate...
>
> What plan does EXPLAIN show for this query?
>
>                         regards, tom lane

psql:zz.sql:7: NOTICE:  QUERY PLAN:

Aggregate  (cost=349984.03..365862.83 rows=127030 width=40)
  ->  Group  (cost=349984.03..356335.55 rows=1270304 width=40)
        ->  Sort  (cost=349984.03..349984.03 rows=1270304 width=40)
              ->  Hash Join  (cost=27.35..87635.90 rows=1270304
width=40)
                    ->  Seq Scan on history h  (cost=0.00..36786.04
rows=1270304 width=28)
                    ->  Hash  (cost=25.28..25.28 rows=828 width=12)
                          ->  Seq Scan on subscribers s
(cost=0.00..25.28 rows=828 width=12)

EXPLAIN

----------

The way I read this, I think my biggest problem is in the
sorting/grouping...


--
Dale Walker < dale@icr.com.au >
Independent Computer Retailers (ICR) Pty Ltd
http://www.icr.com.au/