Обсуждение: Massive table bloat

Поиск
Список
Период
Сортировка

Massive table bloat

От
"Markus Wollny"
Дата:
Hi!

I've set up some system to track slow page executions in one of our (as
yet not live) web apps. The tracking itself is handled completely within
the database using a function. Within a very short time (approx. 1 week)
and although we haven't got that much traffic on our testpages, the
table in question as grown beyond a size of 23 GB, even though a SELECT
count(*) on it will tell me that it only contains 235 rows. I'm sure I
must be missing something obvious here...

Here's the DDL for the table:

CREATE TABLE stats.slowpages
(
url text NOT NULL,
lastexecduration integer NOT NULL,
avgslowexecduration integer,
execcount integer,
lastexectime timestamp without time zone,
site_id integer NOT NULL,
slowestexecduration integer,
totaltimespent bigint,
CONSTRAINT "slowpages_pkey" PRIMARY KEY (url)
)WITHOUT OIDS;

-- Indexes
CREATE INDEX idx_slowpages_duration ON stats.slowpages USING btree
(lastexecduration);
CREATE INDEX idx_slowpages_avgduration ON stats.slowpages USING btree
(avgslowexecduration);
CREATE INDEX idx_slowpages_execcount ON stats.slowpages USING btree
(execcount);
CREATE INDEX idx_slowpages_lastexec ON stats.slowpages USING btree
(lastexectime);
CREATE INDEX idx_slowpages_site ON stats.slowpages USING btree
(site_id);
CREATE UNIQUE INDEX uidx_slowpages_url_site ON stats.slowpages USING
btree (url, site_id);
CREATE INDEX idx_slowpages_totaltimespent ON stats.slowpages USING btree
(totaltimespent);

And this here is the function we use to insert or update entries in this
table:

CREATE or REPLACE FUNCTION "stats"."iou_slowpages"(
IN "_site_id" integer,
IN "_url" text,
IN "_duration" integer)
RETURNS void AS
$BODY$
BEGIN
    LOOP

        UPDATE stats.slowpages
               SET  avgslowexecduration =
((avgslowexecduration*execcount)+_duration)/(execcount+1)
               ,    execcount = execcount+1
               ,    lastexectime = now()
               ,    lastexecduration = _duration
               ,    totaltimespent = totaltimespent + _duration
               ,    slowestexecduration = CASE WHEN _duration >
slowestexecduration
                    THEN _duration ELSE slowestexecduration END
               WHERE url = _url AND site_id = _site_id;
        IF found THEN
            RETURN;
        END IF;

        BEGIN
            INSERT INTO
stats.slowpages(url,lastexecduration,avgslowexecduration,slowestexecdura
tion,totaltimespent,execcount,lastexectime,site_id)
            VALUES (_url, _duration, _duration,_duration,_duration, 1,
now(), _site_id);
            RETURN;
        EXCEPTION WHEN unique_violation THEN

        END;
    END LOOP;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;

_site_id is a small integer value, _url is a full URL string to a page
and _duration is a value in milliseconds. We're on PostgreSQL 8.3.7.

Any idea about what I may be missing here?

Kind regards

   Markus


Computec Media AG
Sitz der Gesellschaft und Registergericht: Furth (HRB 8818)
Vorstandsmitglieder: Albrecht Hengstenberg (Vorsitzender) und Rainer Rosenbusch
Vorsitzender des Aufsichtsrates: Jurg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276



Re: Massive table bloat

От
Thom Brown
Дата:
On 3 March 2010 15:33, Markus Wollny <Markus.Wollny@computec.de> wrote:
> Hi!
>
> I've set up some system to track slow page executions in one of our (as
> yet not live) web apps. The tracking itself is handled completely within
> the database using a function. Within a very short time (approx. 1 week)
> and although we haven't got that much traffic on our testpages, the
> table in question as grown beyond a size of 23 GB, even though a SELECT
> count(*) on it will tell me that it only contains 235 rows. I'm sure I
> must be missing something obvious here...
>
> Here's the DDL for the table:
>
> CREATE TABLE stats.slowpages
> (
> url text NOT NULL,
> lastexecduration integer NOT NULL,
> avgslowexecduration integer,
> execcount integer,
> lastexectime timestamp without time zone,
> site_id integer NOT NULL,
> slowestexecduration integer,
> totaltimespent bigint,
> CONSTRAINT "slowpages_pkey" PRIMARY KEY (url)
> )WITHOUT OIDS;
>
> -- Indexes
> CREATE INDEX idx_slowpages_duration ON stats.slowpages USING btree
> (lastexecduration);
> CREATE INDEX idx_slowpages_avgduration ON stats.slowpages USING btree
> (avgslowexecduration);
> CREATE INDEX idx_slowpages_execcount ON stats.slowpages USING btree
> (execcount);
> CREATE INDEX idx_slowpages_lastexec ON stats.slowpages USING btree
> (lastexectime);
> CREATE INDEX idx_slowpages_site ON stats.slowpages USING btree
> (site_id);
> CREATE UNIQUE INDEX uidx_slowpages_url_site ON stats.slowpages USING
> btree (url, site_id);
> CREATE INDEX idx_slowpages_totaltimespent ON stats.slowpages USING btree
> (totaltimespent);
>
> And this here is the function we use to insert or update entries in this
> table:
>
> CREATE or REPLACE FUNCTION "stats"."iou_slowpages"(
> IN "_site_id" integer,
> IN "_url" text,
> IN "_duration" integer)
> RETURNS void AS
> $BODY$
> BEGIN
>    LOOP
>
>        UPDATE stats.slowpages
>               SET  avgslowexecduration =
> ((avgslowexecduration*execcount)+_duration)/(execcount+1)
>               ,    execcount = execcount+1
>               ,    lastexectime = now()
>               ,    lastexecduration = _duration
>               ,    totaltimespent = totaltimespent + _duration
>               ,    slowestexecduration = CASE WHEN _duration >
> slowestexecduration
>                    THEN _duration ELSE slowestexecduration END
>               WHERE url = _url AND site_id = _site_id;
>        IF found THEN
>            RETURN;
>        END IF;
>
>        BEGIN
>            INSERT INTO
> stats.slowpages(url,lastexecduration,avgslowexecduration,slowestexecdura
> tion,totaltimespent,execcount,lastexectime,site_id)
>            VALUES (_url, _duration, _duration,_duration,_duration, 1,
> now(), _site_id);
>            RETURN;
>        EXCEPTION WHEN unique_violation THEN
>
>        END;
>    END LOOP;
> END;
> $BODY$
> LANGUAGE 'plpgsql' VOLATILE;
>
> _site_id is a small integer value, _url is a full URL string to a page
> and _duration is a value in milliseconds. We're on PostgreSQL 8.3.7.
>
> Any idea about what I may be missing here?
>
> Kind regards
>
>   Markus
>

If you update rows, it actually creates a new version of it.  The old
one doesn't get removed until the VACUUM process cleans it up, so
maybe you need to run that against the database?

Regards

Thom

Re: Massive table bloat

От
Grzegorz Jaśkiewicz
Дата:
do a vacuum analyze verbose on it, and see if it complains about FSM (free space map) setting. Which it probably will be.

Re: Massive table bloat

От
"Markus Wollny"
Дата:
> -----Ursprüngliche Nachricht-----
> Von: Thom Brown [mailto:thombrown@gmail.com]
> Gesendet: Mittwoch, 3. März 2010 16:56
> An: Markus Wollny
> Cc: pgsql-general@postgresql.org
> Betreff: Re: [GENERAL] Massive table bloat

> If you update rows, it actually creates a new version of it.
> The old one doesn't get removed until the VACUUM process
> cleans it up, so maybe you need to run that against the database?

I already do on a nightly basis (which is probably not often enough in this case) and have got autovacuum running. I'll
checkinto FSM settings as suggested by Grzegorz Jaśkiewicz, there's probably half a solution to the problem there, the
otherhalf being probably the autovacuum daemon not visiting this table nearly often enough. 

Kind regards

  Markus


Computec Media AG
Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818)
Vorstandsmitglieder: Albrecht Hengstenberg (Vorsitzender) und Rainer Rosenbusch
Vorsitzender des Aufsichtsrates: Jürg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276