Re: initial sync of multiple streaming slaves simultaneously

From	Mike Roest
Subject	Re: initial sync of multiple streaming slaves simultaneously
Date	19 September 2012, 22:27:10
Msg-id	CAE7ByhhD+h2SRf9mgSqdHoWW1dgyy586QMbHogxT-2wdzf9VcA@mail.gmail.com
In reply to	initial sync of multiple streaming slaves simultaneously (Mike Roest <mike.roest@replicon.com>)
Responses	Re: initial sync of multiple streaming slaves simultaneously (Lonni J Friedman <netllama@gmail.com>)
List	pgsql-general

Is there any hidden issue with this that we haven't seen? Or does anyone have suggestions as to an alternate procedure that will allow 2 slaves to sync concurrently?

With some more testing I've done today I seem to have found an issue with this procedure.

When the slave starts up after the sync, it reaches what it thinks is a consistent recovery point very quickly, based on the pg_stop_backup location.

For example (from the recover script):

2012-09-19 12:15:02: pgsql_start start
2012-09-19 12:15:31: pg_start_backup
2012-09-19 12:15:31: -----------------
2012-09-19 12:15:31: 61/30000020
2012-09-19 12:15:31: (1 row)
2012-09-19 12:15:31:
2012-09-19 12:15:32: NOTICE: pg_stop_backup complete, all required WAL segments have been archived
2012-09-19 12:15:32: pg_stop_backup
2012-09-19 12:15:32: ----------------
2012-09-19 12:15:32: 61/300000D8
2012-09-19 12:15:32: (1 row)
2012-09-19 12:15:32:

While the sync was running (but after the pg_stop_backup), I pushed a bunch of traffic against the master server, which got me to a current xlog location of:

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location
--------------------------
 61/6834C450
(1 row)

The startup of the slave after the sync completed:

2012-09-19 12:42:49.976 MDT [18791]: [1-1] LOG: database system was interrupted; last known up at 2012-09-19 12:15:31 MDT
2012-09-19 12:42:49.976 MDT [18791]: [2-1] LOG: creating missing WAL directory "pg_xlog/archive_status"
2012-09-19 12:42:50.143 MDT [18791]: [3-1] LOG: entering standby mode
2012-09-19 12:42:50.173 MDT [18792]: [1-1] LOG: streaming replication successfully connected to primary
2012-09-19 12:42:50.487 MDT [18791]: [4-1] LOG: redo starts at 61/30000020
2012-09-19 12:42:50.495 MDT [18791]: [5-1] LOG: consistent recovery state reached at 61/31000000
2012-09-19 12:42:50.495 MDT [18767]: [2-1] LOG: database system is ready to accept read only connections

It shows the DB reached a consistent state as of 61/31000000, which is well behind the current location of the master (and of the data files that were synced over to the slave). Monitoring the server showed the expected slave delay, which disappeared as the slave pulled and recovered the WAL files that got generated after the pg_stop_backup.
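To put a number on that gap, the two locations from the output above can be compared as byte positions. This is a minimal sketch in Python, with helper names of my own choosing. It assumes a location `X/Y` can be treated as the 64-bit value `(X << 32) | Y`; as far as I know, releases before 9.3 skip the last segment of each log id, so the subtraction is only exact when both locations share the same log id, as they do here (61).

```python
def parse_xlog_location(loc: str) -> int:
    """Convert an 'X/Y' xlog location string into a single integer position."""
    log_id, offset = loc.strip().split("/")
    return (int(log_id, 16) << 32) | int(offset, 16)


def xlog_gap_bytes(ahead: str, behind: str) -> int:
    """Bytes of WAL between two locations (positive when `ahead` is later)."""
    return parse_xlog_location(ahead) - parse_xlog_location(behind)


# Values copied from the output in this message.
master_now = "61/6834C450"     # pg_current_xlog_location() on the master
consistent_at = "61/31000000"  # "consistent recovery state reached" on the slave

gap = xlog_gap_bytes(master_now, consistent_at)
print(f"{gap / (1024 * 1024):.1f} MiB of WAL between the two points")
```

For the values above this works out to roughly 883 MiB of WAL that the slave still had to replay after it had already started accepting read-only connections.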

But based on this, it looks like this procedure would leave an indeterminate window (depending on how much traffic the master processed while the slave was syncing) during which the slave couldn't be trusted for failover or querying: the server is up and running but is not actually in a consistent state.
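One possible guard, sketched here rather than taken from the actual recover script, would be to record the master's xlog location once the file copy finishes and keep the slave out of rotation until its replay location has passed that point. In the sketch below, `fetch_replay_location` is a hypothetical callable that would run `select pg_last_xlog_replay_location();` on the slave and return the result as a string; the function name, polling interval and poll limit are all assumptions.

```python
import time
from typing import Callable


def _to_int(loc: str) -> int:
    """Convert an 'X/Y' xlog location into an integer that preserves ordering."""
    log_id, offset = loc.strip().split("/")
    return (int(log_id, 16) << 32) | int(offset, 16)


def wait_until_replayed(
    fetch_replay_location: Callable[[], str],  # hypothetical: queries the slave
    target: str,                               # master location recorded after the copy
    poll_seconds: float = 5.0,
    max_polls: int = 720,
) -> bool:
    """Return True once the slave has replayed up to `target`, False on timeout."""
    target_pos = _to_int(target)
    for _ in range(max_polls):
        if _to_int(fetch_replay_location()) >= target_pos:
            return True
        time.sleep(poll_seconds)
    return False
```

The script would only mark the slave as usable for failover or queries once this returns True. Since it only compares positions and never subtracts them, the ordering holds even when the two locations fall in different log ids.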

Thinking it through, the more complicated scripted version of the 2-server recovery (where whichever slave is first past the post runs start_backup or stop_backup) would also have this issue. Our failover slave would always be the one running stop_backup, since it syncs faster, so at least it would always be consistent, but the DR slave would still have the problem.
