Re: Replication failure, slave requesting old segments

Поиск
Список
Период
Сортировка
От Phil Endecott
Тема Re: Replication failure, slave requesting old segments
Дата
Msg-id 1534100972118@dmwebmail.dmwebmail.chezphil.org
обсуждение исходный текст
Ответ на Re: Replication failure, slave requesting old segments  (Stephen Frost <sfrost@snowman.net>)
Ответы Re: Replication failure, slave requesting old segments  (Stephen Frost <sfrost@snowman.net>)
Список pgsql-general
Hi Stephen,

Stephen Frost wrote:
> * Phil Endecott (spam_from_pgsql_lists@chezphil.org) wrote:
>> archive_command = 'ssh backup test ! -f backup/postgresql/archivedir/%f &&
>>                    scp %p backup:backup/postgresql/archivedir/%f'
>
> This is really not a sufficient or particularly intelligent
> archive_command.  In general, I'd strongly recommend against trying to
> roll your own archiving/backup solution for PostgreSQL.  In particular,
> the above is quite expensive, doesn't address the case where a file is
> only partially copied to the backup server, and doesn't ensure that the
> file is completely written and fsync'd on the backup server meaning that
> a failure will likely result in WAL going missing.  There's much better
> solutions out there and which you really should be using instead of
> trying to build your own.  In particular, my preference is pgbackrest
> (though I'm one of the folks in its development, to be clear), but
> there's other options such as barman or WAL-E which I believe also
> address the concerns raised above.

Hmmm, well I based it on what I read in the documentation:

   https://www.postgresql.org/docs/9.6/static/continuous-archiving.html

The expense is not an issue in this case.  I would be surprised if
scp were to exit(0) after only partially copying a file but perhaps
you know something I don't about that.

I know that rsync creates a temporary file and moves it into place when
all the data has been transferred, which scp doesn't do; perhaps that
would be safer.

The master was only writing WAL files maybe 3 or 4 times per day at
the time, so any scenario that requires its crash to have occurred
exactly as it was doing an scp involves a substantial coincidence.

I am limited in what I can run on the backup server.


> Having it go back to D0 is at least a bit interesting.  I wonder if
> somehow it was partial..  Have you looked at the files on the archive
> server to make sure they're all 16MB and the same size?

Yes, all the files on the backup including D0 were 16 MB.  But:

>> 2018-08-11 00:12:15.536 UTC [7954] LOG:  restored log file "0000000100000007000000D0" from archive
>> 2018-08-11 00:12:15.797 UTC [7954] LOG:  redo starts at 7/D0F956C0
>> 2018-08-11 00:12:16.068 UTC [7954] LOG:  consistent recovery state reached at 7/D0FFF088

Are the last two log lines above telling us anything useful?  Is that
saying that, of the 16 MB (0x1000000 byte) WAL file it restored only as
far as byte 0xf956c0 or 0xfff088?  Is that what we would expect?  Is it
trying to use the streaming connection to get the missing bytes from
FFFF088 to FFFFFFFF?  Is that just an empty gap at the end of the file
due to the next record being too big to fit?

>> db=> create table waste_space as select * from large_table;
>> 
>> That caused four new WAL segments to be copied to the backup, and restored 
>> by the slave:

Note again that this was sufficient to make it start working again,
which I think is important.

> Really hard to tell from this.  On the one hand, it seems like maybe the
> primary was done with D0, but hadn't finished with D1 yet, but the
> replica hadn't made that transistion from D0 -> D1, hence why it was
> still asking for D0.

Yes.  What is actually supposed to happen?  What exactly do you mean by
"make the transition"?

> I'm certainly concerned that there might have
> been some amount of partial-WAL-file copying happening, but PG is pretty
> good at dealing with a variety of odd cases and so I tend to doubt that
> the replica actually ended up being corrupted, but I'll again strongly
> advocate for using a proven solution which avoids these risks by doing
> things like copying into temporary files on the backup server and then
> using an atomic 'mv' to make sure that anyone reading from the archive
> wouldn't ever see a partially-written file, and, just to double-check,
> also calculate a hash of every WAL file written into the archive and
> re-verify that hash when reading it back out.  That's what we do in
> pgbackrest, at least.

Noted.  But I don't see any evidence that that was actually the root
cause here, and it relies on an unlikely timing coincidence.  This is
the very first time that I've had a replica recover after a significant
downtime, and it failed.


Regards, Phil.






В списке pgsql-general по дате отправления:

Предыдущее
От: Dmitry Igrishin
Дата:
Сообщение: Re: PostgreSQL C Language Extension with C++ Code
Следующее
От: Tom Lane
Дата:
Сообщение: Re: PostgreSQL C Language Extension with C++ Code