Обсуждение: Re: [SQL] How to avoid "Out of memory" using aggregate functions?

Поиск
Список
Период
Сортировка

Re: [SQL] How to avoid "Out of memory" using aggregate functions?

От
Frank Joerdens
Дата:
Presumably I am suffering from the same problem running 6.5.1:

The table has the following structure:

create table stunde_test (
source_numeric          int8 not null,
destination_numeric     int8 not null,
source                  inet not null,
destination             inet not null,
bytes                   int8 not null,
date                    datetime
);
It stores IP traffic data and currently contains a little over
2 million rows.

The query is

SELECT
sum(stunde_test.bytes), 
date_trunc('day', stunde_test.date), 
stunde_test.destination 
FROM
stunde_test, 
domains 
WHERE
date_part('hour', stunde_test.date) > 6 
AND 
date_part('hour', stunde_test.date) < 23 
AND
domains.internet = stunde_test.destination 
and date_trunc('day', stunde_test.date) = 'yesterday' 
GROUP BY
date_trunc('day', stunde_test.date), 
stunde_test.destination;

which stopped working after the table grew over the 1 million mark
(approximately) - I get the error

FATAL 1:  Memory exhausted in AllocSetAlloc()

This sounds to me like the problem described below . . . my question is:
Would the changes you already made to the current sources fix it for me?
Otherwise I'd have to shring the table by removing older data or use
the brutal method of just putting more memory into the box . . .

thanks, frank


On Thu, Dec 02, 1999 at 06:32:59PM -0500, Tom Lane wrote:
> Mark Dalphin <mdalphin@amgen.com> writes:
> > ... since all the elements requested
> > are aggregates, why do I run out of memory?
> 
> The basic problem here is that Postgres' expression evaluator leaks
> memory when dealing with expressions that use pass-by-reference data
> types (which means practically any type other than bool, int2, int4).
> The memory is reclaimed at the end of the query ... but that doesn't
> help you if your allowable swap space fills with int8 values before
> you get to the end :-(.
> 
> Fixing this is on the TODO list.  I have made a start on it in current
> sources: aggregate functions applied to simple field references do not
> leak memory anymore.  That won't help your query as written:
> 
> SELECT min(seqlength::int8) ...
> 
> since cast-to-int8 is a runtime type conversion expression and the
> result of that expression evaluation will be leaked.  But perhaps you
> can store seqlength as an int8 field to begin with, and skip the
> conversion.  If so, and if you're handy with C, you might want to look
> at src/backend/executor/nodeAgg.c in a current snapshot and see whether
> it's practical to back-patch the memory-releasing logic into 6.5 ...
> or just run the current snapshot if you're feeling adventurous ...
> 
> Of course the whole problem only comes up because avg() and sum() use
> accumulators of the same width as the source data type, so they are
> uncomfortably prone to overflow on large tables.  They really should
> be modified to use more reasonable choices of accumulation data type;
> probably float8 for all datatypes except int8 and numeric, and numeric
> for those two would work OK.  This wasn't practical in prior releases
> because a float8 accumulator would leak memory, but I think it would
> be a good change to make for 7.0.
> 
> Anyway, to get back to your immediate problem, what you probably want to
> do while using 6.5 is tweak the query to minimize the problem as much as
> you can.  First off, I see no reason to cast the inputs of count(),
> min(), or max() to int8; leaving those inputs as the int4 field value
> should work just as well and not leak memory.  You don't have much
> choice for either avg() or sum(), but perhaps not trying to evaluate
> both in one query will bring the memory usage down to where you can
> live with it.  If not, the only answer I can see is to do sums() over
> subsets of the table and save the results to be added and averaged
> later (hack hack ...)
> 
>             regards, tom lane
> 
> ************


Re: [SQL] How to avoid "Out of memory" using aggregate functions?

От
Tom Lane
Дата:
Frank Joerdens <frank@x9media.com> writes:
> [ complex query ]
> which stopped working after the table grew over the 1 million mark
> (approximately) - I get the error

> FATAL 1:  Memory exhausted in AllocSetAlloc()

> This sounds to me like the problem described below . . . my question is:
> Would the changes you already made to the current sources fix it for me?

Afraid not.  I think what's killing you is all those date_part() and
date_trunc() operations --- the resultant datetime or float8 value from
each one occupies memory that won't get reclaimed till end of statement
:-(.

Silly as it sounds, you might be able to put off the problem by
rearranging the order of the WHERE clauses, remembering that AND stops
evaluating its subclauses as soon as it finds a FALSE.  Presumably the
day check eliminates many more rows than the time-of-day checks, so

WHERE
domains.internet = stunde_test.destination 
AND
date_trunc('day', stunde_test.date) = 'yesterday' 
AND
date_part('hour', stunde_test.date) > 6 
AND 
date_part('hour', stunde_test.date) < 23 

would probably about halve the number of date_part+date_trunc
calculations done.

Of course the real fix is to recycle temporary memory for all
expressions intra-query, but I do not know if that will get done for
7.0.  It will get done eventually.
        regards, tom lane


Re: [SQL] How to avoid "Out of memory" using aggregate functions?

От
Frank Joerdens
Дата:
Well, not silly at all . . . regrouping
the order of the WHERE clause does the trick!

Cheers,

Frank


On Mon, Dec 06, 1999 at 11:23:27AM -0500, Tom Lane wrote:
> Frank Joerdens <frank@x9media.com> writes:
> > [ complex query ]
> > which stopped working after the table grew over the 1 million mark
> > (approximately) - I get the error
> 
> > FATAL 1:  Memory exhausted in AllocSetAlloc()
> 
> > This sounds to me like the problem described below . . . my question is:
> > Would the changes you already made to the current sources fix it for me?
> 
> Afraid not.  I think what's killing you is all those date_part() and
> date_trunc() operations --- the resultant datetime or float8 value from
> each one occupies memory that won't get reclaimed till end of statement
> :-(.
> 
> Silly as it sounds, you might be able to put off the problem by
> rearranging the order of the WHERE clauses, remembering that AND stops
> evaluating its subclauses as soon as it finds a FALSE.  Presumably the
> day check eliminates many more rows than the time-of-day checks, so
> 
> WHERE
> domains.internet = stunde_test.destination 
> AND
> date_trunc('day', stunde_test.date) = 'yesterday' 
> AND
> date_part('hour', stunde_test.date) > 6 
> AND 
> date_part('hour', stunde_test.date) < 23 
> 
> would probably about halve the number of date_part+date_trunc
> calculations done.
> 
> Of course the real fix is to recycle temporary memory for all
> expressions intra-query, but I do not know if that will get done for
> 7.0.  It will get done eventually.
> 
>             regards, tom lane
> 
> ************