Обсуждение: postgres replication without pg_basebackup? postgres 13.3

Поиск
Список
Период
Сортировка

postgres replication without pg_basebackup? postgres 13.3

От
Pilar de Teodoro
Дата:
Dear all,

We have a very large database of 37TB and we had to promote our standby to primary due to some disk failures. Now the issues are solved, we would like to make standby  the old primary  from a copy of the new primary which is already in place.  Is it possible without using pg_basebackup? We have already a copy created than can be started but if we open as standby (standby.signal created) the log says when started:

[2022-10-28 21:27:12 CEST-]LOG:  starting PostgreSQL 13.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit

[2022-10-28 21:27:12 CEST-]LOG:  listening on IPv4 address "0.0.0.0", port 8300

[2022-10-28 21:27:12 CEST-]LOG:  could not create IPv6 socket for address "::": Address family not supported by protocol

[2022-10-28 21:27:12 CEST-]LOG:  listening on Unix socket "/tmp/.s.PGSQL.8300"

[2022-10-28 21:27:12 CEST-]LOG:  database system was interrupted; last known up at 2022-10-28 20:31:08 CEST

[2022-10-28 21:28:06 CEST-]LOG:  restored log file "00000003.history" from archive

cp: cannot stat '/PostgresWalLogArchive/new/00000004.history': No such file or directory

[2022-10-28 21:28:06 CEST-]LOG:  entering standby mode

[2022-10-28 21:28:06 CEST-]LOG:  restored log file "00000003.history" from archive

cp: cannot stat '/PostgresWalLogArchive/new/000000030000BF720000002F': No such file or directory

[2022-10-28 21:28:06 CEST-]LOG:  restored log file "000000020000BF720000002F" from archive

[2022-10-28 21:28:06 CEST-]FATAL:  requested timeline 3 is not a child of this server's history

[2022-10-28 21:28:06 CEST-]DETAIL:  Latest checkpoint is at BF72/2F0309C0 on timeline 2, but in the history of the requested timeline, the server forked off from that timeline at B1D6/60000000.

[2022-10-28 21:28:06 CEST-]LOG:  startup process (PID 1298266) exited with exit code 1

[2022-10-28 21:28:06 CEST-]LOG:  aborting startup due to startup process failure

[2022-10-28 21:28:07 CEST-]LOG:  database system is shut down





Do you know if we can make it work without having to run pg_basebackup for 37TB which will take about 3 days to copy? Why it is requesting a different timeline if it is a copy of the primary?


Primary replica parameters:



wal_level = replica

archive_mode = on

archive_command = 'cp -i %p /PostgresWalLogArchive/new/%f'



standby replica parameters:


primary_conninfo = 'host=XXX port=8300 user=postgres password=***'

restore_command = 'cp /PostgresWalLogArchive/new/%f %p'

archive_cleanup_command = '/home/postgres/software/postgresql/bin/pg_archivecleanup /PostgresWalLogArchive/new/%f %r'



Thank you very much for your help in advance.


Pilar de Teodoro






Re: postgres replication without pg_basebackup? postgres 13.3

От
Laurenz Albe
Дата:
On Mon, 2022-11-07 at 11:02 +0100, Pilar de Teodoro wrote:
> We have a very large database of 37TB and we had to promote our standby to primary due to
> some disk failures. Now the issues are solved, we would like to make standby  the old primary
> from a copy of the new primary which is already in place.  Is it possible without using pg_basebackup?

That's exactly what "pg_rewind" is for.  It is a fast version of "pg_basebackup" for exactly
that case.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com



Re: postgres replication without pg_basebackup? postgres 13.3

От
Pilar de Teodoro
Дата:
Dear Laurenz,
Thank you very much for the idea. 
We ran pg_rewind correctly:

[postgres@gacsdb05 data-13.3]$ pg_rewind -c -R --target-pgdata=/PostgresDB/sas_hdd/data-13.3/  --source-server="host=XXXX port=XXXX user=postgres password=XXX" 

pg_rewind: source and target cluster are on the same timeline

pg_rewind: no rewind required


but then we got the following error:


[2022-11-07 22:57:55 CET-]LOG:  starting PostgreSQL 13.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit

[2022-11-07 22:57:55 CET-]LOG:  listening on IPv4 address "0.0.0.0", port XXXX

[2022-11-07 22:57:55 CET-]LOG:  could not create IPv6 socket for address "::": Address family not supported by protocol

[2022-11-07 22:57:55 CET-]LOG:  listening on Unix socket "/tmp/.s.PGSQL.XXXX"

[2022-11-07 22:57:55 CET-]LOG:  database system was shut down in recovery at 2022-11-07 22:57:47 CET

[2022-11-07 22:57:55 CET-]LOG:  restored log file "00000003.history" from archive

cp: cannot stat '/PostgresWalLogArchive/new/00000004.history': No such file or directory

[2022-11-07 22:57:55 CET-]LOG:  entering standby mode

[2022-11-07 22:57:55 CET-]LOG:  restored log file "00000003.history" from archive

[2022-11-07 22:57:55 CET-]LOG:  invalid primary checkpoint record

[2022-11-07 22:57:55 CET-]PANIC:  could not locate a valid checkpoint record

[2022-11-07 22:57:55 CET-]LOG:  startup process (PID 3011860) was terminated by signal 6: Aborted

[2022-11-07 22:57:55 CET-]LOG:  aborting startup due to startup process failure

[2022-11-07 22:57:55 CET-]LOG:  database system is shut down


We have read we can run pg_resetwal but the Wal folder is the folder where the Wals are archived in the primary. Would that be correct to reset them?


Any suggestion?

Thank you so much,


Pilar


El lun, 7 nov 2022 a las 12:21, Laurenz Albe (<laurenz.albe@cybertec.at>) escribió:
On Mon, 2022-11-07 at 11:02 +0100, Pilar de Teodoro wrote:
> We have a very large database of 37TB and we had to promote our standby to primary due to
> some disk failures. Now the issues are solved, we would like to make standby  the old primary
> from a copy of the new primary which is already in place.  Is it possible without using pg_basebackup?

That's exactly what "pg_rewind" is for.  It is a fast version of "pg_basebackup" for exactly
that case.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

Re: postgres replication without pg_basebackup? postgres 13.3

От
Laurenz Albe
Дата:
On Mon, 2022-11-07 at 23:11 +0100, Pilar de Teodoro wrote:
> Thank you very much for the idea. [of running pg_rewind]
> We ran pg_rewind correctly:
> [postgres@gacsdb05 data-13.3]$ pg_rewind -c -R --target-pgdata=/PostgresDB/sas_hdd/data-13.3/ 
--source-server="host=XXXXport=XXXX user=postgres password=XXX" 
 
> pg_rewind: source and target cluster are on the same timeline
> pg_rewind: no rewind required

I cannot verify that you ran it correctly.

"Target" should be the old server with the extra transactions.
"Source" should be the promoted standby server.  Promotion switches to a new timeline,
so it looks like you did something wrong.

> but then we got the following error:
> 
> [2022-11-07 22:57:55 CET-]LOG:  starting PostgreSQL 13.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514
(RedHat 8.5.0-10), 64-bit
 
> [2022-11-07 22:57:55 CET-]LOG:  listening on IPv4 address "0.0.0.0", port XXXX
> [2022-11-07 22:57:55 CET-]LOG:  could not create IPv6 socket for address "::": Address family not supported by
protocol
> [2022-11-07 22:57:55 CET-]LOG:  listening on Unix socket "/tmp/.s.PGSQL.XXXX"
> [2022-11-07 22:57:55 CET-]LOG:  database system was shut down in recovery at 2022-11-07 22:57:47 CET
> [2022-11-07 22:57:55 CET-]LOG:  restored log file "00000003.history" from archive
> cp: cannot stat '/PostgresWalLogArchive/new/00000004.history': No such file or directory
> [2022-11-07 22:57:55 CET-]LOG:  entering standby mode
> [2022-11-07 22:57:55 CET-]LOG:  restored log file "00000003.history" from archive
> [2022-11-07 22:57:55 CET-]LOG:  invalid primary checkpoint record
> [2022-11-07 22:57:55 CET-]PANIC:  could not locate a valid checkpoint record
> [2022-11-07 22:57:55 CET-]LOG:  startup process (PID 3011860) was terminated by signal 6: Aborted
> [2022-11-07 22:57:55 CET-]LOG:  aborting startup due to startup process failure
> [2022-11-07 22:57:55 CET-]LOG:  database system is shut down
> 
> We have read we can run pg_resetwal but the Wal folder is the folder where the Wals are archived in the primary.
Wouldthat be correct to reset them?
 

Don't run "pg_resetwal".  At the very least, it will break your standby.

At this point, your standby seems to be broken.  I don't know what exactly you
did, but it leeks like you should run a "pg_basebackup" after all.

Yours,
Laurenz Albe