Обсуждение: Memory-leak in BackgroundWriter(and Checkpointer)

Поиск
Список
Период
Сортировка

Memory-leak in BackgroundWriter(and Checkpointer)

От
Naoya Anzai
Дата:
Hi All.

I've found a memory-leak bug in PostgreSQL 9.1.9's background
writer process.

When following parameter is set, it occurs every time CHECKPOINT runs.

 wal_level = hot_standby

I've also confirmed REL9_1_STABLE and it is not fixed yet.

In additional, it also occurs in 9.3.beta1's checkpointer.

I created a patch (attached file) for 9.1.9,
so could anyone confirm?

Best regards.
---
Naoya Anzai
NEC Soft,Ltd.
http://www.necsoft.com/eng/

Вложения

Re: Memory-leak in BackgroundWriter(and Checkpointer)

От
Stephen Frost
Дата:
* Naoya Anzai (anzai-naoya@mxu.nes.nec.co.jp) wrote:
> I've found a memory-leak bug in PostgreSQL 9.1.9's background=20
> writer process.

This looks legit, but probably not the right approach to fixing it.
Looks like it'd be better to work out a way to use a static variable to
reuse the same memory, ala what GetRunningTransactionData() does, and
avoid having to do allocation while holding all the locks (or at least,
not very often).

    Thanks,

        Stephen

Re: Memory-leak in BackgroundWriter(and Checkpointer)

От
Heikki Linnakangas
Дата:
On 04.06.2013 15:27, Stephen Frost wrote:
> * Naoya Anzai (anzai-naoya@mxu.nes.nec.co.jp) wrote:
>> I've found a memory-leak bug in PostgreSQL 9.1.9's background
>> writer process.
>
> This looks legit, but probably not the right approach to fixing it.
> Looks like it'd be better to work out a way to use a static variable to
> reuse the same memory, ala what GetRunningTransactionData() does, and
> avoid having to do allocation while holding all the locks (or at least,
> not very often).

I can't get too excited about the overhead of a single palloc here. It's
a fairly heavy operation anyway, and only runs once per checkpoint. And
we haven't heard any actual complaints of latency hiccups with
wal_level=hot_standby.

- Heikki

Re: Memory-leak in BackgroundWriter(and Checkpointer)

От
Andres Freund
Дата:
On 2013-06-04 16:23:00 +0300, Heikki Linnakangas wrote:
> On 04.06.2013 15:27, Stephen Frost wrote:
> >* Naoya Anzai (anzai-naoya@mxu.nes.nec.co.jp) wrote:
> >>I've found a memory-leak bug in PostgreSQL 9.1.9's background
> >>writer process.
> >
> >This looks legit, but probably not the right approach to fixing it.
> >Looks like it'd be better to work out a way to use a static variable to
> >reuse the same memory, ala what GetRunningTransactionData() does, and
> >avoid having to do allocation while holding all the locks (or at least,
> >not very often).
>
> I can't get too excited about the overhead of a single palloc here. It's a
> fairly heavy operation anyway, and only runs once per checkpoint. And we
> haven't heard any actual complaints of latency hiccups with
> wal_level=hot_standby.

I think we will have to resort to running it more frequently in the not
too far away future, its way to easy to get into a situation where all
of the checkpoints/xl_running_xact's are suboverflowed delaying
consistency considerably.

Seems more consistent with the rest of the code too. But anyway, I am
fine with fixing it either way.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Memory-leak in BackgroundWriter(and Checkpointer)

От
Stephen Frost
Дата:
* Andres Freund (andres@2ndquadrant.com) wrote:
> On 2013-06-04 16:23:00 +0300, Heikki Linnakangas wrote:
> > I can't get too excited about the overhead of a single palloc here. It's a
> > fairly heavy operation anyway, and only runs once per checkpoint. And we
> > haven't heard any actual complaints of latency hiccups with
> > wal_level=hot_standby.

Perhaps, but we don't always hear about things that we should- this long
standing memory leak being one of them.

> Seems more consistent with the rest of the code too. But anyway, I am
> fine with fixing it either way.

And this is really the other point- having LogStandbySnapshot() need to
clean up after GetRunningTransactionLocks() but not
GetRunningTransactionData() would strike me as very odd.

    Thanks,

        Stephen

Re: Memory-leak in BackgroundWriter(and Checkpointer)

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> * Andres Freund (andres@2ndquadrant.com) wrote:
>> Seems more consistent with the rest of the code too. But anyway, I am
>> fine with fixing it either way.

> And this is really the other point- having LogStandbySnapshot() need to
> clean up after GetRunningTransactionLocks() but not
> GetRunningTransactionData() would strike me as very odd.

Meh.  I'm not impressed with permanently allocating an array large
enough to hold all the locks GetRunningTransactionLocks
might return --- that's potentially much larger than the other array,
and in fact I don't think we have a hard limit on its size at all.
Besides which, it's not like there is *no* cleanup for
GetRunningTransactionData --- it has a lock that has to be released ...

I think the proposed fix is fine code-wise; the real problem here is
crummy commenting.  GetRunningTransactionLocks isn't documented as
returning a palloc'd array, and why the heck do we have a long comment
about its implementation in LogStandbySnapshot?

            regards, tom lane

Re: Memory-leak in BackgroundWriter(and Checkpointer)

От
Stephen Frost
Дата:
* Andres Freund (andres@2ndquadrant.com) wrote:
> Seems more consistent with the rest of the code too. But anyway, I am
> fine with fixing it either way.

Patch attached which mirrors what GetRunningTransactionData() (the other
function called from LogStandbySnapshot) does, more-or-less- uses a
static variable to keep track of memory allocated for the data being
returned to the caller.

Looks like this will need to be back-patched to 9.0.  Barring objections
or better ideas, I'll do that in a few hours.

    Thanks,

        Stephen

Вложения

Re: Memory-leak in BackgroundWriter(and Checkpointer)

От
Stephen Frost
Дата:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Meh.  I'm not impressed with permanently allocating an array large
> enough to hold all the locks GetRunningTransactionLocks
> might return --- that's potentially much larger than the other array,
> and in fact I don't think we have a hard limit on its size at all.

Well, sure, which is why I didn't actually do that- but I did end up
having to make it resize when necessary, which isn't entirely ideal
either.

> Besides which, it's not like there is *no* cleanup for
> GetRunningTransactionData --- it has a lock that has to be released ...

That's true..  I guess my general feeling is that it'd be good to do
this all one way or the other- having it use a static variable into
which we stick the pointer to some reused space for one and then doing a
palloc for the other which needs to be pfree'd struck me as odd.

> I think the proposed fix is fine code-wise; the real problem here is
> crummy commenting.  GetRunningTransactionLocks isn't documented as
> returning a palloc'd array, and why the heck do we have a long comment
> about its implementation in LogStandbySnapshot?

Certainly good questions and better comments would have helped here.  I
can go back and rework the patch either way.

    Thanks,

        Stephen

Re: Memory-leak in BackgroundWriter(and Checkpointer)

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> I think the proposed fix is fine code-wise; the real problem here is
>> crummy commenting.  GetRunningTransactionLocks isn't documented as
>> returning a palloc'd array, and why the heck do we have a long comment
>> about its implementation in LogStandbySnapshot?

> Certainly good questions and better comments would have helped here.  I
> can go back and rework the patch either way.

I'm already working on back-patching the attached.

            regards, tom lane

diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 615278b8ca2adf3057e5f21057c3cedfc66480aa..c704412366d5bc4b6512e9ab673855c46f7d993f 100644
*** a/src/backend/storage/ipc/standby.c
--- b/src/backend/storage/ipc/standby.c
*************** LogStandbySnapshot(void)
*** 865,880 ****

      /*
       * Get details of any AccessExclusiveLocks being held at the moment.
-      *
-      * XXX GetRunningTransactionLocks() currently holds a lock on all
-      * partitions though it is possible to further optimise the locking. By
-      * reference counting locks and storing the value on the ProcArray entry
-      * for each backend we can easily tell if any locks need recording without
-      * trying to acquire the partition locks and scanning the lock table.
       */
      locks = GetRunningTransactionLocks(&nlocks);
      if (nlocks > 0)
          LogAccessExclusiveLocks(nlocks, locks);

      /*
       * Log details of all in-progress transactions. This should be the last
--- 865,875 ----

      /*
       * Get details of any AccessExclusiveLocks being held at the moment.
       */
      locks = GetRunningTransactionLocks(&nlocks);
      if (nlocks > 0)
          LogAccessExclusiveLocks(nlocks, locks);
+     pfree(locks);

      /*
       * Log details of all in-progress transactions. This should be the last
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 8cd871f4b40d7bfc8b0aaa7650241d5865c79ffe..273c72230274b46ba6a46bb8b295ef5375aaa36d 100644
*** a/src/backend/storage/lmgr/lock.c
--- b/src/backend/storage/lmgr/lock.c
*************** GetLockStatusData(void)
*** 3398,3415 ****
  }

  /*
!  * Returns a list of currently held AccessExclusiveLocks, for use
!  * by GetRunningTransactionData().
   */
  xl_standby_lock *
  GetRunningTransactionLocks(int *nlocks)
  {
      PROCLOCK   *proclock;
      HASH_SEQ_STATUS seqstat;
      int            i;
      int            index;
      int            els;
-     xl_standby_lock *accessExclusiveLocks;

      /*
       * Acquire lock on the entire shared lock data structure.
--- 3398,3423 ----
  }

  /*
!  * Returns a list of currently held AccessExclusiveLocks, for use by
!  * LogStandbySnapshot().  The result is a palloc'd array,
!  * with the number of elements returned into *nlocks.
!  *
!  * XXX This currently takes a lock on all partitions of the lock table,
!  * but it's possible to do better.  By reference counting locks and storing
!  * the value in the ProcArray entry for each backend we could tell if any
!  * locks need recording without having to acquire the partition locks and
!  * scan the lock table.  Whether that's worth the additional overhead
!  * is pretty dubious though.
   */
  xl_standby_lock *
  GetRunningTransactionLocks(int *nlocks)
  {
+     xl_standby_lock *accessExclusiveLocks;
      PROCLOCK   *proclock;
      HASH_SEQ_STATUS seqstat;
      int            i;
      int            index;
      int            els;

      /*
       * Acquire lock on the entire shared lock data structure.
*************** GetRunningTransactionLocks(int *nlocks)
*** 3467,3472 ****
--- 3475,3482 ----
          }
      }

+     Assert(index <= els);
+
      /*
       * And release locks.  We do this in reverse order for two reasons: (1)
       * Anyone else who needs more than one of the locks will be trying to lock

Re: Memory-leak in BackgroundWriter(and Checkpointer)

От
Stephen Frost
Дата:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> I'm already working on back-patching the attached.

Works for me,

    Thanks!

        Stephen

Re: Memory-leak in BackgroundWriter(and Checkpointer)

От
Naoya Anzai
Дата:
Tom, Stephen, Heikki and Andres,

You do FAST work!
Thanks for your prompt responses.

---
Naoya Anzai
NEC Soft,Ltd.
http://www.necsoft.com/eng/

Re: Memory-leak in BackgroundWriter(and Checkpointer)

От
Pawel Kozik
Дата:
Any idea when it will be available in official PostgreSQL release  9.1.x or
9.2.x ?



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Memory-leak-in-BackgroundWriter-and-Checkpointer-tp5757869p5758783.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.

Re: Memory-leak in BackgroundWriter(and Checkpointer)

От
Stephen Frost
Дата:
* Pawel Kozik (pawel.kozik@alcatel-lucent.com) wrote:
> Any idea when it will be available in official PostgreSQL release  9.1.x or
> 9.2.x ?

Yes, the next set of point releases should include Tom's patch to fix
this leak.

    Thanks,

        Stephen