Обсуждение: dblink() cursor error/issue (TopMemoryContext)

Поиск
Список
Период
Сортировка

dblink() cursor error/issue (TopMemoryContext)

От
"Henry"
Дата:
Hello all,

I'm trying to code a function to copy rows from one machine to another
using dblink and cursors:

...
perform dblink_connect ('dbname=db1...host=othermachine.com');
perform dblink_open ('cur_other1', 'SELECT col1 FROM tab1');

loop
  fnd := 0;
  for rec in
      -- grab a 1000 rows at a time
      SELECT col1 FROM dblink_fetch ('cur_other1', 1000)
      AS tab1 (col1 text)
  loop
      begin
          INSERT INTO tab1 (col1) VALUES (rec.col1);
          ...
      exception when unique_violation then
         -- ignore dups
      end;
      fnd := 1
  end loop;
  if fnd = 0 then
     exit;
  end if;
end loop;

perform dblink_close ('cur_other1');
perform dblink_disconnect();


This runs fine for a while, then starts vomiting:

TopMemoryContext: 44175408 total in 5388 blocks; 94224 free (5394 chunks);
44081184 used
SPI Plan: 3072 total in 2 blocks; 2000 free (0 chunks); 1072 used
SPI Plan: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
SPI Plan: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used
SPI Plan: 3072 total in 2 blocks; 1328 free (0 chunks); 1744 used
SPI Plan: 7168 total in 3 blocks; 3896 free (0 chunks); 3272 used
SPI Plan: 7168 total in 3 blocks; 3896 free (0 chunks); 3272 used
SPI Plan: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
SPI Plan: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
SPI Plan: 7168 total in 3 blocks; 1504 free (0 chunks); 5664 used
PL/PgSQL function context: 8192 total in 1 blocks; 6928 free (5 chunks);
1264 used
SPI Plan: 3072 total in 2 blocks; 1808 free (0 chunks); 1264 used
SPI Plan: 1024 total in 1 blocks; 96 free (0 chunks); 928 used
SPI Plan: 3072 total in 2 blocks; 1664 free (0 chunks); 1408 used
SPI Plan: 3072 total in 2 blocks; 1312 free (0 chunks); 1760 used
PL/PgSQL function context: 24576 total in 2 blocks; 12112 free (10
chunks); 12464 used
SPI Plan: 15360 total in 4 blocks; 7640 free (0 chunks); 7720 used
Operator class cache: 8192 total in 1 blocks; 4872 free (0 chunks); 3320 used
Record information cache: 8192 total in 1 blocks; 1800 free (0 chunks);
6392 used
SPI Plan: 3072 total in 2 blocks; 1576 free (0 chunks); 1496 used
SPI Plan: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
SPI Plan: 3072 total in 2 blocks; 1760 free (0 chunks); 1312 used
SPI Plan: 3072 total in 2 blocks; 1856 free (0 chunks); 1216 used
SPI Plan: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
SPI Plan: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
SPI Plan: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
SPI Plan: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
PL/PgSQL function context: 24576 total in 2 blocks; 7784 free (16 chunks);
16792 used
CFuncHash: 8192 total in 1 blocks; 4936 free (0 chunks); 3256 used
Rendezvous variable hash: 8192 total in 1 blocks; 3848 free (0 chunks);
4344 used
PLpgSQL function cache: 24596 total in 2 blocks; 5904 free (0 chunks);
18692 used
TopTransactionContext: 8380416 total in 10 blocks; 3213936 free (0
chunks); 5166480 used
SPI Exec: 8192 total in 1 blocks; 7992 free (0 chunks); 200 used
ExecutorState: 8192 total in 1 blocks; 3080 free (0 chunks); 5112 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
SPI Proc: 8192 total in 1 blocks; 6520 free (5 chunks); 1672 used
CurTransactionContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
AfterTriggerEvents: 8192 total in 1 blocks; 8136 free (0 chunks); 56 used
ExecutorState: 24576 total in 2 blocks; 4472 free (4 chunks); 20104 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
CurTransactionContext: 8192 total in 1 blocks; 8176 free (5 chunks); 16 used
CurTransactionContext: 8192 total in 1 blocks; 8176 free (5 chunks); 16 used
CurTransactionContext: 8192 total in 1 blocks; 8176 free (5 chunks); 16 used
CurTransactionContext: 8192 total in 1 blocks; 8176 free (5 chunks); 16 used
CurTransactionContext: 8192 total in 1 blocks; 8176 free (5 chunks); 16 used
CurTransactionContext: 8192 total in 1 blocks; 8176 free (5 chunks); 16 used
CurTransactionContext: 8192 total in 1 blocks; 8176 free (5 chunks); 16 used
CurTransactionContext: 8192 total in 1 blocks; 8176 free (5 chunks); 16 used
CurTransactionContext: 8192 total in 1 blocks; 8176 free (5 chunks); 16 used
CurTransactionContext: 8192 total in 1 blocks; 8176 free (5 chunks); 16 used
CurTransactionContext: 8192 total in 1 blocks; 8176 free (5 chunks); 16 used
CurTransactionContext: 8192 total in 1 blocks; 8176 free (5 chunks); 16 used
CurTransactionContext: 8192 total in 1 blocks; 8176 free (5 chunks); 16 used
CurTransactionContext: 8192 total in 1 blocks; 8176 free (5 chunks); 16 used
CurTransactionContext: 8192 total in 1 blocks; 8176 free (5 chunks); 16 used
CurTransactionContext: 8192 total in 1 blocks; 8176 free (5 chunks); 16 used

...

repeat above until 150GB+ logfile, then ctrl-c

On the tty where I've called the function, after hitting ctrl-c, I get:

...
ERROR:  out of memory
DETAIL:  Failed on request of size 1291220.
...
PANIC:  ERRORDATA_STACK_SIZE exceeded
...

I'm trying to use cursors so that I don't run out of memory - yet I seem
to be running out of memory anyway.

Doing this the other way round works OK:

perform dblink_connect('dbname=db1...host=othermachine.com');
for rec in
        SELECT col1 FROM tab1  -- this uses cursors in function auto'ally
loop
    perform dblink_exec ('INSERT INTO tab1 ..'||rec.col1||'...');
    ...
end loop;
perform dblink_disconnect();
...


I must be doing something stupid here.

Any comments are welcome.

Regards
Henry


Re: dblink() cursor error/issue (TopMemoryContext)

От
Tom Lane
Дата:
"Henry" <henry@zen.co.za> writes:
> I'm trying to code a function to copy rows from one machine to another
> using dblink and cursors:

What PG version is this, exactly?

> perform dblink_connect ('dbname=db1...host=othermachine.com');
> perform dblink_open ('cur_other1', 'SELECT col1 FROM tab1');

> loop
>   fnd := 0;
>   for rec in
>       -- grab a 1000 rows at a time
>       SELECT col1 FROM dblink_fetch ('cur_other1', 1000)
>       AS tab1 (col1 text)
>   loop
>       begin
>           INSERT INTO tab1 (col1) VALUES (rec.col1);
>           ...
>       exception when unique_violation then
>          -- ignore dups
>       end;
>       fnd := 1
>   end loop;
>   if fnd = 0 then
>      exit;
>   end if;
> end loop;

> perform dblink_close ('cur_other1');
> perform dblink_disconnect();

I don't think your problem has anything to do with dblink per se.
The repeated begin/exception blocks are apparently managing to leak
some memory per iteration.  I can't tell whether this represents
a known (and perhaps already fixed) bug; it very likely depends on
details you haven't shown us.  Do you want to try to put together a
self-contained test case?  (Again, you likely don't need dblink to
exhibit the issue.)

If you just want something that works now, try probing for an existing
entry before inserting, instead of relying on catching an exception.

            regards, tom lane

Re: dblink() cursor error/issue (TopMemoryContext)

От
"Henry"
Дата:
On Mon, June 2, 2008 6:53 pm, Tom Lane wrote:
> "Henry" <henry@zen.co.za> writes:
>> I'm trying to code a function to copy rows from one machine to another
>> using dblink and cursors:
>
> What PG version is this, exactly?

Arg, dammit.  Sorry, it's version 8.2.6 (where the function is running),
talking to a remote machine running 8.3.1.

> I don't think your problem has anything to do with dblink per se.
> The repeated begin/exception blocks are apparently managing to leak
> some memory per iteration.  I can't tell whether this represents
> a known (and perhaps already fixed) bug; it very likely depends on
> details you haven't shown us.  Do you want to try to put together a
> self-contained test case?  (Again, you likely don't need dblink to
> exhibit the issue.)

OK, will give this a swing; but I'll first upgrade to 8.3.1 and see what
that does.

> If you just want something that works now, try probing for an existing
> entry before inserting, instead of relying on catching an exception.

ok, but that's going to be slow considering I'm inserting almost a hundred
million rows.

My previous post wasn't entirely accurate:  the dblink_exec() call in a
loop is also enclosed in a begin/exception/end block, yet it doesn't fail.
 The only difference besides the dblink_exec call itself, is that I'm
using 'others' to catch any error from dblink_exec (using 'others' on the
first insert results in the same issue).




Re: dblink() cursor error/issue (TopMemoryContext)

От
"Henry - Zen Search SA"
Дата:
On Mon, June 2, 2008 6:53 pm, Tom Lane wrote:
> I don't think your problem has anything to do with dblink per se.
> The repeated begin/exception blocks are apparently managing to leak
> some memory per iteration.  I can't tell whether this represents
> a known (and perhaps already fixed) bug; it very likely depends on
> details you haven't shown us.  Do you want to try to put together a
> self-contained test case?

Sorry for the delay in responding to this.

It looks like there's definitely a memory leak problem in 8.2.6 when
executing begin/exception in a loop.

After upgrading to 8.3.1, the same code ran to conclusion without error.

One other thing:  the docs mention that functions use cursors
automatically to prevent OOM errors on large selects (as in my case).
Well, the second part of my function does this:

FOR rec in SELECT * FROM bigtable
LOOP
   ...begin/insert/exception...
END LOOP;

and bang, OOM.  This is in 8.3.1.  I'll rewrite this to use cursors, but
was hoping to take advantage of the implicit cursors to keep the code nice
and simple... or am I misunderstanding "...FOR loops automatically use a
cursor internally to avoid memory problems." from section 37.8 in the
manual?

Regards
Henry


Re: dblink() cursor error/issue (TopMemoryContext)

От
Tom Lane
Дата:
"Henry - Zen Search SA" <henry@zen.co.za> writes:
> One other thing:  the docs mention that functions use cursors
> automatically to prevent OOM errors on large selects (as in my case).
> Well, the second part of my function does this:

> FOR rec in SELECT * FROM bigtable
> LOOP
>    ...begin/insert/exception...
> END LOOP;

> and bang, OOM.

How soon is "bang"?  The above causes one subtransaction to be
instantiated per loop execution, since we have to have a new XID
for each inserted row (else it's not possible to roll back just
that row on failure).  The memory overhead per subtransaction is
not zero, though I think it's fairly small if you don't have any
triggers pending as a result of the insert.  (Hm ... any foreign
keys on the table being inserted into?)

> This is in 8.3.1.  I'll rewrite this to use cursors, but
> was hoping to take advantage of the implicit cursors to keep the code nice
> and simple... or am I misunderstanding "...FOR loops automatically use a
> cursor internally to avoid memory problems." from section 37.8 in the
> manual?

The FOR loop is not your problem.

            regards, tom lane

Re: dblink() cursor error/issue (TopMemoryContext)

От
"Henry - Zen Search SA"
Дата:
On Fri, June 13, 2008 7:05 pm, Tom Lane wrote:
> How soon is "bang"?

I'll run it again and post back.

> The memory overhead per subtransaction is
> not zero, though I think it's fairly small if you don't have any
> triggers pending as a result of the insert.

Two triggers are fired for each insert (before and after trigs).

> (Hm ... any foreign keys on the table being inserted into?)

Not on the table being inserted into, but it has a couple of tables with
foreign keys referencing it.


Re: dblink() cursor error/issue (TopMemoryContext)

От
Tom Lane
Дата:
"Henry - Zen Search SA" <henry@zen.co.za> writes:
> On Fri, June 13, 2008 7:05 pm, Tom Lane wrote:
>> The memory overhead per subtransaction is
>> not zero, though I think it's fairly small if you don't have any
>> triggers pending as a result of the insert.

> Two triggers are fired for each insert (before and after trigs).

The after trigger is probably the main source of the problem ---
have you any way of avoiding that?

(I wonder whether we can't improve on that, at least for the case of a
non-deferred after trigger, which'd have already been fired before we
exit the subtransaction.  The trick is how to know that there's nothing
useful left in the subtransaction's per-transaction memory context ...)

            regards, tom lane

Re: dblink() cursor error/issue (TopMemoryContext)

От
Alvaro Herrera
Дата:
Tom Lane wrote:

> (I wonder whether we can't improve on that, at least for the case of a
> non-deferred after trigger, which'd have already been fired before we
> exit the subtransaction.  The trick is how to know that there's nothing
> useful left in the subtransaction's per-transaction memory context ...)

I think this would require us to be smarter about what pfree's are
leaving the context empty.  Whether this requires us to troll the
freelist and aggregate contiguous elements, or something even more
complex, I don't know.  The fact that this adds a bit more complexity to
pfree() worries me.

Perhaps we could have something like MemoryContextDefrag that we would
invoke at subtransaction end.  If the context is truly empty it can
leave it in the initial empty state, which I think is enough for the
subxact code to know that it can remove it.

Unless I'm misunderstanding the issue completely.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: dblink() cursor error/issue (TopMemoryContext)

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> (I wonder whether we can't improve on that, at least for the case of a
>> non-deferred after trigger, which'd have already been fired before we
>> exit the subtransaction.  The trick is how to know that there's nothing
>> useful left in the subtransaction's per-transaction memory context ...)

> I think this would require us to be smarter about what pfree's are
> leaving the context empty.  Whether this requires us to troll the
> freelist and aggregate contiguous elements, or something even more
> complex, I don't know.  The fact that this adds a bit more complexity to
> pfree() worries me.

Yeah, slowing down pfree is not a good idea.  The other problem is that
random "leaks" into CurTransactionContext would defeat the optimization.
(In this connection it's scary that xact.c initializes
CurrentMemoryContext to CurTransactionContext at subtransaction start.)

The idea I was toying with was to require the subsystems that
intentionally keep state in subtransaction contexts to Do Something at
subtransaction commit --- either proactively copy that state up to the
parent context (then we could remove the subtransaction context), or
report whether they have anything keeping the subxact context from being
freed.  There are only about three subsystems that have this issue
(sinval, triggers, notify, IIRC).  We might need to combine these ideas:
sometimes there might be more state there than we really want to copy,
but we don't want to keep the whole context around for, say, one trigger
record.

            regards, tom lane