Обсуждение: out of memory with INSERT INTO... SELECT...

Поиск
Список
Период
Сортировка

out of memory with INSERT INTO... SELECT...

От
Shane Ambler
Дата:
Not real sure if this is an issue but from what I figure there is too
much RAM being chewed up from this.

The end result is an out of memory error
(I haven't delved deeper as yet)


So I am replicating what someone else is failing to get working in
sqlite to see what pg can do.

The end scenario is the generation of unique codes of 7 alphanumeric
characters in length. Not sure how many he really needs but is trying
for 30 million. (initial testing on the assumption that select distinct
is faster then insert into unique index column)

So anyway I created the table


CREATE TABLE codes (pincode text);


and then came up with an insert query instead of his client looping
through etc..

INSERT INTO codes

SELECT pincode FROM
(
   SELECT generate_series(1,1000000) AS idx
, substring('abcdefghijklmnopqrstuvwxyz0123456789' from
cast((random()*36)as integer) for 1)
||substring('abcdefghijklmnopqrstuvwxyz0123456789' from
cast((random()*36)as integer) for 1)
||substring('abcdefghijklmnopqrstuvwxyz0123456789' from
cast((random()*36)as integer) for 1)
||substring('abcdefghijklmnopqrstuvwxyz0123456789' from
cast((random()*36)as integer) for 1)
||substring('abcdefghijklmnopqrstuvwxyz0123456789' from
cast((random()*36)as integer) for 1)
||substring('abcdefghijklmnopqrstuvwxyz0123456789' from
cast((random()*36)as integer) for 1)
||substring('abcdefghijklmnopqrstuvwxyz0123456789' from
cast((random()*36)as integer) for 1)
    AS pincode
) AS pcodetbl;


generate_series gives 1M rows
Now that finishes fine - uses about 700MB of RAM (VSZ) but works.

If I change the generate_series to 10M rows it gets an out of memory
error at about 3.5GB (VSZ) and a bit under 300MB(RSS)
(this is from the client connection process not the writer etc)

from ps aux just before ending -
USER       PID %CPU %MEM      VSZ    RSS  TT  STAT STARTED      TIME COMMAND
pgsql    14519  32.5 -13.8  3366412 290064  ??  Rs    3:39AM   2:22.20
postgres: pgsql postgres ::1(64645) INSERT



OK so it hits the 32 bit limit and it quits gracefully and doesn't cause
any real problem but I seem to think that the memory being allocated is
somewhat over-sized, unless I'm just missing something?

I figure that the subselect is built up in RAM then fed into the INSERT.
 From what I add up, with 7 characters per row plus 4 for the sequence
and a few extra overheads, I would think less than 20 bytes per row * 1M
rows makes about 20MB, with 10M rows it goes to 200MB

That falls a long way short of what is being allocated to pg.


For ref :-

postgres=# select version();

version
--------------------------------------------------------------------
  PostgreSQL 8.2.5 on powerpc-apple-darwin8.10.0, compiled by GCC
powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build
5367)
(1 row)

Mac OSX 10.4.11
G4 dual 1.25G
2GB RAM




--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz

Re: out of memory with INSERT INTO... SELECT...

От
Tom Lane
Дата:
Shane Ambler <pgsql@Sheeky.Biz> writes:
>    SELECT generate_series(1,1000000) AS idx
> , substring('abcdefghijklmnopqrstuvwxyz0123456789' from
> cast((random()*36)as integer) for 1)
> ||substring('abcdefghijklmnopqrstuvwxyz0123456789' from
> cast((random()*36)as integer) for 1)
> ||substring('abcdefghijklmnopqrstuvwxyz0123456789' from
> cast((random()*36)as integer) for 1)
> ||substring('abcdefghijklmnopqrstuvwxyz0123456789' from
> cast((random()*36)as integer) for 1)
> ||substring('abcdefghijklmnopqrstuvwxyz0123456789' from
> cast((random()*36)as integer) for 1)
> ||substring('abcdefghijklmnopqrstuvwxyz0123456789' from
> cast((random()*36)as integer) for 1)
> ||substring('abcdefghijklmnopqrstuvwxyz0123456789' from
> cast((random()*36)as integer) for 1)
>     AS pincode

> If I change the generate_series to 10M rows it gets an out of memory
> error at about 3.5GB (VSZ) and a bit under 300MB(RSS)

Seems to be the same issue recently discussed here:
http://archives.postgresql.org/pgsql-performance/2008-01/msg00031.php

For the moment I'd suggest recasting it to avoid having the SRF in the
SELECT target list (which is pretty darn weird anyway, in this usage
--- I don't see any very good SQL-semantics argument why the substring
expression would get evaluated more than once here).  Something like

INSERT INTO codes
   SELECT
     substring('abcdefghijklmnopqrstuvwxyz0123456789' from
     cast((random()*36)as integer) for 1)
     ||substring('abcdefghijklmnopqrstuvwxyz0123456789' from
     cast((random()*36)as integer) for 1)
     ||substring('abcdefghijklmnopqrstuvwxyz0123456789' from
     cast((random()*36)as integer) for 1)
     ||substring('abcdefghijklmnopqrstuvwxyz0123456789' from
     cast((random()*36)as integer) for 1)
     ||substring('abcdefghijklmnopqrstuvwxyz0123456789' from
     cast((random()*36)as integer) for 1)
     ||substring('abcdefghijklmnopqrstuvwxyz0123456789' from
     cast((random()*36)as integer) for 1)
     ||substring('abcdefghijklmnopqrstuvwxyz0123456789' from
     cast((random()*36)as integer) for 1)
   FROM generate_series(1,1000000) AS idx;

            regards, tom lane