Discussion: warm standby server stops doing checkpoints after a while
Hello list,

I have a problem running a warm standby server as described in the PostgreSQL 8.2 documentation. I set up two servers, both running PostgreSQL 8.2.3-1PGDG on Fedora Core 6 (x86_64); the master is driven by an AMD Opteron, the slave runs on an Intel Xeon. The master copies its WAL logs to the slave using rsync over SSH, and the slave replays them with a little Perl script that does not really do much more than the pseudo code in the documentation. To this point everything works just fine.

To reduce the space consumption of the log archive, my restore script checks pg_controldata whenever it enters a wait cycle and deletes all log files that are older than the last checkpoint segment. (Checkpointing happens about every 5 to 10 minutes.)

The problem is that the slave server stops checkpointing after some hours of work (about 24 to 48 hours of continued log replay). After that point the slave does not do checkpoints anymore, but it still replays every log it gets from the master. The master works on as expected, checkpointing every 5 to 10 minutes. I raised the log level to debug5, but there's nothing interesting in the logs: just log replay, and absolutely no sign of anything related to checkpoints. A restart does not help; only starting over with a new base backup brings the slave back to working as expected. Both servers are configured identically except for the archiving/recovery settings.

Does somebody have a clue what could cause this behavior of the slave server? Could upgrading to 8.2.4 help? I didn't find anything related in the release notes. Could this be a bug?

Greetings,
Frank Wittig
Frank Wittig <fw@weisshuhn.de> writes:
> The problem is that the slave server stops checkpointing after some
> hours of working (about 24 to 48 hours of continued log replay).

Hm ... look at RecoveryRestartPoint() in xlog.c. Could there be something wrong with this logic?

	/*
	 * Do nothing if the elapsed time since the last restartpoint is less than
	 * half of checkpoint_timeout.  (We use a value less than
	 * checkpoint_timeout so that variations in the timing of checkpoints on
	 * the master, or speed of transmission of WAL segments to a slave, won't
	 * make the slave skip a restartpoint once it's synced with the master.)
	 * Checking true elapsed time keeps us from doing restartpoints too often
	 * while rapidly scanning large amounts of WAL.
	 */
	elapsed_secs = time(NULL) - ControlFile->time;
	if (elapsed_secs < CheckPointTimeout / 2)
		return;

The idea is that the slave (once in sync with the master) ought to checkpoint every time it sees a checkpoint record in the master's output. I'm not seeing a flaw, but maybe there is one here, or somewhere nearby. Are you sure the master is checkpointing?

			regards, tom lane
Tom Lane schrieb:
> Are you sure the master is checkpointing?

Yes. I double checked using pg_controldata on the master. On both servers checkpoint_segments is set to 16 and checkpoint_timeout is at its 300 second default. There were two messages in the master's logs that checkpointing happened too often during bulk imports last night. It's possible that the problem occurs during periods of high database activity, when checkpointing on the master happens very often. It would be OK if the slave skipped checkpoints while it is falling behind, but I'd expect it to checkpoint again once it's back in sync.

I watched the servers for a while after it happened today and found that the slave had caught up and was waiting for new logs. But it didn't do any checkpointing. The last checkpoint was over 2 hours and about 500 log segments ago.

I upgraded the slave to 8.2.4-1PGDG today and will upgrade the master too as soon as I can get a maintenance window for it.

Greetings,
Frank Wittig
On Thu, 2007-05-31 at 10:23 -0400, Tom Lane wrote:
> Hm ... look at RecoveryRestartPoint() in xlog.c. Could there be
> something wrong with this logic?
> [...]
> The idea is that the slave (once in sync with the master) ought to
> checkpoint every time it sees a checkpoint record in the master's
> output. I'm not seeing a flaw but maybe there is one here, or somewhere
> nearby. Are you sure the master is checkpointing?

Hmmm. This can happen if a backend crashes while half-way through any set of changes that causes safe_restartpoint() to return false. Or it might be that one of the index AMs doesn't correctly clear its multi-WAL actions in some corner cases. Or it could be that the mdsync looping problem has been worse than we thought and checkpoints have been avoided completely for some time.

Frank,

This is repeatable, yes? Has anything crashed on your server? Are you using GIN or GiST indexes?

I'll look at putting some debug information in there that logs whether multi-WAL actions remain unresolved for any length of time. Continuing to think about this one....

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
Simon Riggs schrieb:

> This is repeatable, yes?

Yes, it occurs every time I begin with a new base backup. And it seems to happen during recreation of tsearch2 vectors for large numbers of data sets.

> Has anything crashed on your server?

No. No crashes occurred during those times.

> Are you using GIN or GIST indexes?

I'm using GIN indexes on tsearch2 vectors over a very large number of data sets. (About 3.8 million data sets, of which about 30 to 50 thousand are recreated and indexed when the described behavior occurs.)

> I'll look at putting some debug information in there that logs whether
> multi-WAL actions remain unresolved for any length of time.

Extra debug info would be great. I tested myself, adding some debug output to the function Tom Lane mentioned, and found that after the server stopped checkpointing, every time the function is called it exits at this point:

	/*
	 * Is it safe to checkpoint?  We must ask each of the resource managers
	 * whether they have any partial state information that might prevent a
	 * correct restart from this point.  If so, we skip this opportunity, but
	 * return at the next checkpoint record for another try.
	 */
	for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
	{
		if (RmgrTable[rmid].rm_safe_restartpoint != NULL)
			if (!(RmgrTable[rmid].rm_safe_restartpoint()))
				return;
	}

It exits every time with the same value for rmid.
Logs look like this (the quoted lines repeat):

<2007-06-01 13:10:28.936 CEST:%> DEBUG: 00000: executing restore command "/var/lib/pgsql/restore.pl /mnt/wal_archive/00000001000000C9000000C2 pg_xlog/RECOVERYXLOG"
<2007-06-01 13:10:28.936 CEST:%> LOCATION: RestoreArchivedFile, xlog.c:2474
<2007-06-01 13:11:29.055 CEST:%> LOG: 00000: restored log file "00000001000000C9000000C2" from archive
<2007-06-01 13:11:29.055 CEST:%> LOCATION: RestoreArchivedFile, xlog.c:2504
<2007-06-01 13:11:29.364 CEST:%> DEBUG: 00000: found Checkpoint in XLOG
<2007-06-01 13:11:29.364 CEST:%> CONTEXT: xlog redo checkpoint: redo C9/C20DE050; undo 0/0; tli 1; xid 0/36130541; oid 241990328; multi 8; offset 15; online
<2007-06-01 13:11:29.364 CEST:%> LOCATION: RecoveryRestartPoint, xlog.c:5739
<2007-06-01 13:11:29.365 CEST:%> DEBUG: 00000: Ressource manager (13) has partial state information
<2007-06-01 13:11:29.365 CEST:%> CONTEXT: xlog redo checkpoint: redo C9/C20DE050; undo 0/0; tli 1; xid 0/36130541; oid 241990328; multi 8; offset 15; online
<2007-06-01 13:11:29.365 CEST:%> LOCATION: RecoveryRestartPoint, xlog.c:5769

best regards,
Frank Wittig
On Fri, 2007-06-01 at 13:33 +0200, Frank Wittig wrote:
> It exits every time with the same value for rmid.
> [...]
> <2007-06-01 13:11:29.365 CEST:%> DEBUG: 00000: Ressource manager (13)
> has partial state information

To me, this points clearly to there being an improperly completed action in resource manager 13 (GIN). In summary, it appears that there may be an issue with the GIN code for WAL recovery, and this is affecting the warm standby.

I'll reply again soon with a patch to allow data of incomplete splits to be logged so we can see what's going on. After that we should continue to discuss this on -hackers.

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
>> <2007-06-01 13:11:29.365 CEST:%> DEBUG: 00000: Ressource manager (13)
>> has partial state information
> To me, this points clearly to there being an improperly completed action
> in resource manager 13 (GIN).

Hmm. I found that gin_xlog_cleanup doesn't reset the incomplete_splits list. Is this a possible reason for the bug? The attached patch fixes it.

-- 
Teodor Sigaev   E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

*** ./src/backend/access/gin/ginxlog.c.orig	Fri Jun  1 16:47:47 2007
--- ./src/backend/access/gin/ginxlog.c	Fri Jun  1 16:53:47 2007
***************
*** 594,599 ****
--- 594,600 ----
  	MemoryContextSwitchTo(topCtx);
  	MemoryContextDelete(opCtx);
+ 	incomplete_splits = NIL;
  }
  
  bool
Teodor Sigaev schrieb:
> Hmm. I found that gin_xlog_cleanup doesn't reset the incomplete_splits
> list. Is this a possible reason for the bug?

Sounds reasonable to me. That would explain why the system never recovered from that state. I'll test your patch and report results to this list.

Greetings,
Frank Wittig
On Fri, 2007-06-01 at 16:58 +0400, Teodor Sigaev wrote:
> Hmm. I found that gin_xlog_cleanup doesn't reset the incomplete_splits
> list. Is this a possible reason for the bug?

Hi Teodor,

Hmm, well, the list should be empty by that point anyway. That code is only executed at the end of xlog replay, not half-way through as we are seeing here. There are two possibilities:

1. There are some incomplete splits, pointing to a likely bug in GIN.

2. There are so many index splits that we aren't able to make a successful restartpoint using the current mechanism. Not a bug, but it would be an issue with how restartpoints interact with GIN (and possibly other index types too).

When we wrote this I thought (2) would be a problem, but it's not shown up to be so for btrees (yet, I guess). I have some ideas if it's (2).

The attached patch should show which of these it is. I'll dress it up a little better so we have a debug option on this. Please note I've not tested this patch myself, so Frank, if you don't mind me splatting something at you, we'll see what we see.

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
Hi Simon,

> The attached patch should show which of these it is. I'll dress it up a
> little better so we have a debug option on this. Please note I've not
> tested this patch myself, so Frank if you don't mind me splatting
> something at you we'll see what we see.

I'll test that. I have an idea on how to trigger the problem on purpose, so I don't have to wait until it occurs on its own. Please feel free to send me any test you want to run. I really need that WAL standby, so I will make every effort I can to help get it stable.

Greetings,
Frank Wittig
Hi Simon,

> The attached patch should show which of these it is.

As I thought, I can trigger the issue by letting the database recreate and reindex some ten thousand data sets. Here is what happened in the log (I cleaned it to the lines your patch provides):

<2007-06-01 16:28:51.708 CEST:%> LOG: GIN incomplete splits=1
<2007-06-01 16:28:51.708 CEST:%> LOG: GIN incomplete split root:11118 l:45303 r:111740 at redo CA/C8243C28
<2007-06-01 16:29:13.912 CEST:%> LOG: GIN incomplete splits=1
<2007-06-01 16:29:13.912 CEST:%> LOG: GIN incomplete split root:11118 l:45303 r:111740 at redo CA/C8243C28
<2007-06-01 16:29:51.594 CEST:%> LOG: GIN incomplete splits=1
<2007-06-01 16:29:51.594 CEST:%> LOG: GIN incomplete split root:11118 l:45303 r:111740 at redo CA/C8243C28
<2007-06-01 16:30:19.351 CEST:%> LOG: GIN incomplete splits=2
<2007-06-01 16:30:19.351 CEST:%> LOG: GIN incomplete split root:11118 l:45303 r:111740 at redo CA/C8243C28
<2007-06-01 16:30:19.351 CEST:%> LOG: GIN incomplete split root:4269 l:102123 r:111741 at redo CA/EEAD8B80
<2007-06-01 16:31:41.991 CEST:%> LOG: GIN incomplete splits=2
<2007-06-01 16:31:41.991 CEST:%> LOG: GIN incomplete split root:11118 l:45303 r:111740 at redo CA/C8243C28
<2007-06-01 16:31:41.991 CEST:%> LOG: GIN incomplete split root:4269 l:102123 r:111741 at redo CA/EEAD8B80
<2007-06-01 16:34:09.217 CEST:%> LOG: GIN incomplete splits=2
<2007-06-01 16:34:09.218 CEST:%> LOG: GIN incomplete split root:11118 l:45303 r:111740 at redo CA/C8243C28
<2007-06-01 16:34:09.218 CEST:%> LOG: GIN incomplete split root:4269 l:102123 r:111741 at redo CA/EEAD8B80
<2007-06-01 16:35:31.933 CEST:%> LOG: GIN incomplete splits=2
<2007-06-01 16:35:31.933 CEST:%> LOG: GIN incomplete split root:11118 l:45303 r:111740 at redo CA/C8243C28
<2007-06-01 16:35:31.933 CEST:%> LOG: GIN incomplete split root:4269 l:102123 r:111741 at redo CA/EEAD8B80
<2007-06-01 16:36:56.391 CEST:%> LOG: GIN incomplete splits=2
<2007-06-01 16:36:56.391 CEST:%> LOG: GIN incomplete split root:11118 l:45303 r:111740 at redo CA/C8243C28
<2007-06-01 16:36:56.392 CEST:%> LOG: GIN incomplete split root:4269 l:102123 r:111741 at redo CA/EEAD8B80
<2007-06-01 16:38:23.133 CEST:%> LOG: GIN incomplete splits=2
<2007-06-01 16:38:23.133 CEST:%> LOG: GIN incomplete split root:11118 l:45303 r:111740 at redo CA/C8243C28
<2007-06-01 16:38:23.133 CEST:%> LOG: GIN incomplete split root:4269 l:102123 r:111741 at redo CA/EEAD8B80

It didn't do us the favor of producing more incomplete splits. But these two are enough to keep the standby server from doing checkpoints.

Greetings,
Frank Wittig
> <2007-06-01 16:28:51.708 CEST:%> LOG: GIN incomplete split root:11118
> l:45303 r:111740 at redo CA/C8243C28
...
> <2007-06-01 16:38:23.133 CEST:%> LOG: GIN incomplete split root:11118
> l:45303 r:111740 at redo CA/C8243C28

Looks like a bug in GIN. I'll play with it. Can you provide more details about your test?

-- 
Teodor Sigaev   E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
I am following this thread with interest, so as a request: if the discussion is kept in this thread, it is easier for me to follow. Just recently I put a WAL standby into production using 8.2.3 and would like to know under what circumstances I will need to upgrade. Hence the interest.

Regards
Dhaval
On Fri, 2007-06-01 at 19:11 +0400, Teodor Sigaev wrote:
> Looks like a bug in GIN. I'll play with it. Can you provide more details
> about your test?

Agreed, sorry about that.

I note that forgetIncompleteSplit() doesn't report anything if it fails to find an incomplete split when it looks for it. ISTM that either we have:

1) sometimes a starting split, but no ending split, or

2) a starting split, but we try to remove the wrong split later for some reason, i.e. we make the change correctly but misjudge the xlog somehow.

(1) would show up on data retrieval tests, but (2) would show up only in recovery, so I would think it is (2).

I'd suggest we throw an error, as shown in the enclosed patch. Frank, can you give that a whirl to provide Teodor with something more to work with? Thanks.

Neither GiST nor B-tree throws an error in the corresponding locations either, so the potential for not being able to track this is high. I'd want to throw errors in those locations too.

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
> I'd suggest we throw an error, as shown in the enclosed patch. Frank,
> can you give that a whirl to provide Teodor with something more to work
> with? Thanks.

I've already made a test suite which reproduces the problem - it leaves incomplete splits. But I discovered one more problem: a deadlock on a buffer lock. So right now I am investigating that.

> Neither GiST nor B-tree throws an error in the corresponding locations
> either, so the potential for not being able to track this is high. I'd
> want to throw errors in those locations too.

Agreed, I'll add more checks.

-- 
Teodor Sigaev   E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
On Fri, 2007-06-01 at 09:22 -0700, Dhaval Shah wrote:
> Just recently I have put the WAL standby in production using 8.2.3 and
> would like to know the circumstances where I will need to upgrade.
> Hence the interest.

OK, in summary: we have a rare bug in WAL recovery for GIN indexes that has been noticed during warm standby recovery. Currently this affects the ability to perform restartable recovery, which would cause some administrative difficulty if the *standby* server fails. We're working on a fix.

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
On Fri, 2007-06-01 at 20:49 +0400, Teodor Sigaev wrote:
> I've already made a test suite which reproduces the problem - it leaves
> incomplete splits. But I discovered one more problem: a deadlock on a
> buffer lock. So right now I am investigating that.

OK, I'll leave the code to you from here....

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
Found the reason: if the parent page is fully backed up (as a full-page image) after the child's split, then forgetIncompleteSplit() isn't called at all. I hope the attached patch fixes that. Please test it.

PS: I'm going away for the weekend, so I'll not be online until Monday.

-- 
Teodor Sigaev   E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
On Fri, 2007-06-01 at 22:42 +0400, Teodor Sigaev wrote:
> Found the reason: if the parent page is fully backed up after the
> child's split, then forgetIncompleteSplit() isn't called at all.

i.e. full_page_writes = on

> I hope the attached patch fixes that. Please test it.
>
> PS: I'm going away for the weekend, so I'll not be online until Monday.

OK, I'll check the other AMs to see if we have similar problems. Have a good weekend.

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
Hi Teodor,

Teodor Sigaev schrieb:
> I hope the attached patch fixes that. Please test it.

The patch is running. I'll keep reporting. Have a nice weekend.

Greetings,
Frank Wittig
Teodor Sigaev schrieb:
> I hope the attached patch fixes that. Please test it.

It still happens. The log is full of incomplete split dumps (each LOG line was followed by the same CONTEXT line; I trimmed the repeats for brevity):

<2007-06-01 23:00:00.001 CEST:%> LOG: GIN incomplete splits=8
<2007-06-01 23:00:00.001 CEST:%> CONTEXT: xlog redo checkpoint: redo D0/28020F48; undo 0/0; tli 1; xid 0/36761651; oid 243915448; multi 8; offset 15; online
<2007-06-01 23:00:00.001 CEST:%> LOG: GIN incomplete split root:696 l:93996 r:111778 at redo CF/4B669DF0
<2007-06-01 23:00:00.001 CEST:%> LOG: GIN incomplete split root:562 l:4491 r:111780 at redo CF/9FEF39F0
<2007-06-01 23:00:00.001 CEST:%> LOG: GIN incomplete split root:200 l:63015 r:111781 at redo CF/B45AC9B0
<2007-06-01 23:00:00.001 CEST:%> LOG: GIN incomplete split root:318 l:322 r:111782 at redo CF/B9939B58
<2007-06-01 23:00:00.001 CEST:%> LOG: GIN incomplete split root:374 l:1912 r:111783 at redo CF/B9B6CC00
<2007-06-01 23:00:00.001 CEST:%> LOG: GIN incomplete split root:1552 l:1555 r:111784 at redo CF/F1C6D770
<2007-06-01 23:00:00.001 CEST:%> LOG: GIN incomplete split root:221 l:222 r:111785 at redo D0/ED2F6F0
<2007-06-01 23:00:00.001 CEST:%> LOG: GIN incomplete split root:1147 l:111771 r:111786 at redo D0/1DDE64C8
<2007-06-01 23:00:00.814 CEST:%> LOG: restored log file "00000001000000D000000029" from archive
<2007-06-01 23:00:03.325 CEST:%> LOG: restored log file "00000001000000D00000002A" from archive
<2007-06-01 23:00:06.145 CEST:%> LOG: restored log file "00000001000000D00000002B" from archive
<2007-06-01 23:00:10.996 CEST:%> LOG: restored log file "00000001000000D00000002C" from archive
<2007-06-01 23:00:14.283 CEST:%> LOG: restored log file "00000001000000D00000002D" from archive
<2007-06-01 23:00:17.086 CEST:%> LOG: restored log file "00000001000000D00000002E" from archive
<2007-06-01 23:01:19.587 CEST:%> LOG: restored log file "00000001000000D00000002F" from archive
<2007-06-01 23:01:22.700 CEST:%> LOG: restored log file "00000001000000D000000030" from archive
<2007-06-01 23:01:24.038 CEST:%> LOG: GIN incomplete splits=9
<2007-06-01 23:01:24.038 CEST:%> CONTEXT: xlog redo checkpoint: redo D0/30018F68; undo 0/0; tli 1; xid 0/36763650; oid 243915448; multi 8; offset 15; online
<2007-06-01 23:01:24.038 CEST:%> LOG: GIN incomplete split root:696 l:93996 r:111778 at redo CF/4B669DF0
<2007-06-01 23:01:24.038 CEST:%> LOG: GIN incomplete split root:562 l:4491 r:111780 at redo CF/9FEF39F0
<2007-06-01 23:01:24.038 CEST:%> LOG: GIN incomplete split root:200 l:63015 r:111781 at redo CF/B45AC9B0
<2007-06-01 23:01:24.038 CEST:%> LOG: GIN incomplete split root:318 l:322 r:111782 at redo CF/B9939B58
<2007-06-01 23:01:24.038 CEST:%> LOG: GIN incomplete split root:374 l:1912 r:111783 at redo CF/B9B6CC00
<2007-06-01 23:01:24.038 CEST:%> LOG: GIN incomplete split root:1552 l:1555 r:111784 at redo CF/F1C6D770
<2007-06-01 23:01:24.038 CEST:%> LOG: GIN incomplete split root:221 l:222 r:111785 at redo D0/ED2F6F0
<2007-06-01 23:01:24.038 CEST:%> LOG: GIN incomplete split root:1147 l:111771 r:111786 at redo D0/1DDE64C8
<2007-06-01 23:01:24.039 CEST:%> LOG: GIN incomplete split root:34 l:1704 r:111787 at redo D0/289FC958
On Fri, 2007-06-01 at 23:14 +0200, Frank Wittig wrote:
> It still happens. The log is full of incomplete split dumps:
>
> <2007-06-01 23:00:00.001 CEST:%> LOG: GIN incomplete splits=8
> [...]
> <2007-06-01 23:01:24.039 CEST:%> LOG: GIN incomplete split root:34
> l:1704 r:111787 at redo D0/289FC958

OK, here's what I see:

1. After a certain point, consecutive GIN index splits cause a problem. The new RHS block numbers are consecutive from 111780 onwards.

2. The incomplete splits stay around indefinitely after creation, and we aren't trying to remove the wrong split at any point. We're either never creating an xlog record, or we are ignoring it in recovery, or we are somehow making multiple entries and then not removing all of them.

3. The root seems to move, which isn't what I personally was expecting to see. It seems root refers to the highest parent involved in the split.

4. We're writing lots of redo in between failed page splits, so *almost* everything is working correctly.

5. This starts to happen when we have very large indexes. This may be coincidental, but the first relation file is fairly full (900+ MB).

I'll do a patch to identify whether these are entry or data parts of the GIN index, and whether we are ignoring records somehow when they hit gin_redo. Perhaps we can have identified the exact issue by the time Teodor returns.

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
> <2007-06-01 23:00:00.001 CEST:%> LOG:  GIN incomplete splits=8

Just to be sure: the patch fixes the *creation* of the WAL log, not its replay. So the primary DB should be patched too.

During the weekend I found a possible deadlock in GIN's locking protocol between concurrent UPDATE and VACUUM queries involving the same GIN index. Strangely, I didn't see it in 8.2, and even now I can't reproduce it there. It's easy to reproduce only on HEAD, with the recently added ReadBufferWithStrategy() call instead of ReadBuffer(). ReadBufferWithStrategy() was added to implement a limited-size "ring" of buffers for VACUUM. Nevertheless, it's a possible scenario in 8.2. The attached patch fixes that deadlock bug too.

Also, the previous version of my patch had a mistake which is observable on a CREATE INDEX ... USING GIN query.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                   WWW: http://www.sigaev.ru/
Attachments
> 1. After a certain point, consecutive GIN index splits cause a problem.
> The new RHS block numbers are consecutive from 111780+

That's the newly created page. The page being split might have any block number.

> 2. The incomplete splits stay around indefinitely after creation and we
> aren't trying to remove the wrong split at any point. We're either never
> creating an xlog record, or we are ignoring it in recovery, or we are
> somehow making multiple entries then not removing all of them.

Agreed.

> 3. The root seems to move, which isn't what I personally was expecting
> to see. It seems root refers to the highest parent involved in the
> split.

"root" in this context means the parent of the page being split. Actually, there are a lot of B-trees inside GIN; see http://www.sigaev.ru/gin/GinStructure.pdf

> 4. We're writing lots of redo in between failed page splits. So *almost*
> everything is working correctly.
>
> 5. This starts to happen when we have very large indexes. This may be
> coincidental but the first relation file is fairly full (900+ MB).

Yes. It seems to me that the conditions for the error are very rare: the B-tree over ItemPointers (the second level of GIN) has a big capacity, 1000+ items per page, so splits occur rather rarely.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                   WWW: http://www.sigaev.ru/
Ooops, the patch doesn't apply cleanly. New version attached.

> Attached patch fixes that deadlock bug too. And, previous version of my
> patch has a mistake which is observable on CREATE INDEX .. USING GIN query.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                   WWW: http://www.sigaev.ru/
Attachments
After observing massive reindexing of some hundred thousand data sets, it seems to me that the slave no longer skips checkpoints, apart from those legitimately skipped by the CheckPointTimeout logic. I'll keep an eye on it and report back any news on the issue.

Thank you for the good work!

Regards,
Frank Wittig

Teodor Sigaev schrieb:
> Ooops. Patch doesn't apply cleanly. New version.
>
>> Attached patch fixes that deadlock bug too. And, previous version of
>> my patch has a mistake which is observable on CREATE INDEX .. USING
>> GIN query.
Attachments
> After some observation of massive reindexing of some hundred thousand
> data sets it seems to me that the slave doesn't skip checkpoints
> anymore. (Apart from those skipped because of the CheckpointTimeout thing)
> I'll keep an eye on it and report back any news on the issue.

Nice, committed. Thanks for your report and testing.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                   WWW: http://www.sigaev.ru/