Обсуждение: postgres replication without pg_basebackup? postgres 13.3
[2022-10-28 21:27:12 CEST-]LOG: starting PostgreSQL 13.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit
[2022-10-28 21:27:12 CEST-]LOG: listening on IPv4 address "0.0.0.0", port 8300
[2022-10-28 21:27:12 CEST-]LOG: could not create IPv6 socket for address "::": Address family not supported by protocol
[2022-10-28 21:27:12 CEST-]LOG: listening on Unix socket "/tmp/.s.PGSQL.8300"
[2022-10-28 21:27:12 CEST-]LOG: database system was interrupted; last known up at 2022-10-28 20:31:08 CEST
[2022-10-28 21:28:06 CEST-]LOG: restored log file "00000003.history" from archive
cp: cannot stat '/PostgresWalLogArchive/new/00000004.history': No such file or directory
[2022-10-28 21:28:06 CEST-]LOG: entering standby mode
[2022-10-28 21:28:06 CEST-]LOG: restored log file "00000003.history" from archive
cp: cannot stat '/PostgresWalLogArchive/new/000000030000BF720000002F': No such file or directory
[2022-10-28 21:28:06 CEST-]LOG: restored log file "000000020000BF720000002F" from archive
[2022-10-28 21:28:06 CEST-]FATAL: requested timeline 3 is not a child of this server's history
[2022-10-28 21:28:06 CEST-]DETAIL: Latest checkpoint is at BF72/2F0309C0 on timeline 2, but in the history of the requested timeline, the server forked off from that timeline at B1D6/60000000.
[2022-10-28 21:28:06 CEST-]LOG: startup process (PID 1298266) exited with exit code 1
[2022-10-28 21:28:06 CEST-]LOG: aborting startup due to startup process failure
[2022-10-28 21:28:07 CEST-]LOG: database system is shut down
Do you know if we can make it work without having to run pg_basebackup for 37TB which will take about 3 days to copy? Why it is requesting a different timeline if it is a copy of the primary?
Primary replica parameters:
wal_level = replica
archive_mode = on
archive_command = 'cp -i %p /PostgresWalLogArchive/new/%f'
standby replica parameters:
primary_conninfo = 'host=XXX port=8300 user=postgres password=***'
restore_command = 'cp /PostgresWalLogArchive/new/%f %p'
archive_cleanup_command = '/home/postgres/software/postgresql/bin/pg_archivecleanup /PostgresWalLogArchive/new/%f %r'
Thank you very much for your help in advance.
Pilar de Teodoro
On Mon, 2022-11-07 at 11:02 +0100, Pilar de Teodoro wrote: > We have a very large database of 37TB and we had to promote our standby to primary due to > some disk failures. Now the issues are solved, we would like to make standby the old primary > from a copy of the new primary which is already in place. Is it possible without using pg_basebackup? That's exactly what "pg_rewind" is for. It is a fast version of "pg_basebackup" for exactly that case. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
[postgres@gacsdb05 data-13.3]$ pg_rewind -c -R --target-pgdata=/PostgresDB/sas_hdd/data-13.3/ --source-server="host=XXXX port=XXXX user=postgres password=XXX"
pg_rewind: source and target cluster are on the same timeline
pg_rewind: no rewind required
but then we got the following error:
[2022-11-07 22:57:55 CET-]LOG: starting PostgreSQL 13.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit
[2022-11-07 22:57:55 CET-]LOG: listening on IPv4 address "0.0.0.0", port XXXX
[2022-11-07 22:57:55 CET-]LOG: could not create IPv6 socket for address "::": Address family not supported by protocol
[2022-11-07 22:57:55 CET-]LOG: listening on Unix socket "/tmp/.s.PGSQL.XXXX"
[2022-11-07 22:57:55 CET-]LOG: database system was shut down in recovery at 2022-11-07 22:57:47 CET
[2022-11-07 22:57:55 CET-]LOG: restored log file "00000003.history" from archive
cp: cannot stat '/PostgresWalLogArchive/new/00000004.history': No such file or directory
[2022-11-07 22:57:55 CET-]LOG: entering standby mode
[2022-11-07 22:57:55 CET-]LOG: restored log file "00000003.history" from archive
[2022-11-07 22:57:55 CET-]LOG: invalid primary checkpoint record
[2022-11-07 22:57:55 CET-]PANIC: could not locate a valid checkpoint record
[2022-11-07 22:57:55 CET-]LOG: startup process (PID 3011860) was terminated by signal 6: Aborted
[2022-11-07 22:57:55 CET-]LOG: aborting startup due to startup process failure
[2022-11-07 22:57:55 CET-]LOG: database system is shut down
We have read we can run pg_resetwal but the Wal folder is the folder where the Wals are archived in the primary. Would that be correct to reset them?
Any suggestion?
Thank you so much,
Pilar
On Mon, 2022-11-07 at 11:02 +0100, Pilar de Teodoro wrote:
> We have a very large database of 37TB and we had to promote our standby to primary due to
> some disk failures. Now the issues are solved, we would like to make standby the old primary
> from a copy of the new primary which is already in place. Is it possible without using pg_basebackup?
That's exactly what "pg_rewind" is for. It is a fast version of "pg_basebackup" for exactly
that case.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
On Mon, 2022-11-07 at 23:11 +0100, Pilar de Teodoro wrote: > Thank you very much for the idea. [of running pg_rewind] > We ran pg_rewind correctly: > [postgres@gacsdb05 data-13.3]$ pg_rewind -c -R --target-pgdata=/PostgresDB/sas_hdd/data-13.3/ --source-server="host=XXXXport=XXXX user=postgres password=XXX" > pg_rewind: source and target cluster are on the same timeline > pg_rewind: no rewind required I cannot verify that you ran it correctly. "Target" should be the old server with the extra transactions. "Source" should be the promoted standby server. Promotion switches to a new timeline, so it looks like you did something wrong. > but then we got the following error: > > [2022-11-07 22:57:55 CET-]LOG: starting PostgreSQL 13.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (RedHat 8.5.0-10), 64-bit > [2022-11-07 22:57:55 CET-]LOG: listening on IPv4 address "0.0.0.0", port XXXX > [2022-11-07 22:57:55 CET-]LOG: could not create IPv6 socket for address "::": Address family not supported by protocol > [2022-11-07 22:57:55 CET-]LOG: listening on Unix socket "/tmp/.s.PGSQL.XXXX" > [2022-11-07 22:57:55 CET-]LOG: database system was shut down in recovery at 2022-11-07 22:57:47 CET > [2022-11-07 22:57:55 CET-]LOG: restored log file "00000003.history" from archive > cp: cannot stat '/PostgresWalLogArchive/new/00000004.history': No such file or directory > [2022-11-07 22:57:55 CET-]LOG: entering standby mode > [2022-11-07 22:57:55 CET-]LOG: restored log file "00000003.history" from archive > [2022-11-07 22:57:55 CET-]LOG: invalid primary checkpoint record > [2022-11-07 22:57:55 CET-]PANIC: could not locate a valid checkpoint record > [2022-11-07 22:57:55 CET-]LOG: startup process (PID 3011860) was terminated by signal 6: Aborted > [2022-11-07 22:57:55 CET-]LOG: aborting startup due to startup process failure > [2022-11-07 22:57:55 CET-]LOG: database system is shut down > > We have read we can run pg_resetwal but the Wal folder is the folder where the Wals are archived in the primary. Wouldthat be correct to reset them? Don't run "pg_resetwal". At the very least, it will break your standby. At this point, your standby seems to be broken. I don't know what exactly you did, but it leeks like you should run a "pg_basebackup" after all. Yours, Laurenz Albe