Re: Commit to primary with unavailable sync standby

From: Maksim Milyutin
Subject: Re: Commit to primary with unavailable sync standby
Msg-id: 2957464c-f5a2-5e8b-41af-a8809ebb9cd3@gmail.com
In reply to: Commit to primary with unavailable sync standby  (Andrey Borodin <x4mmm@yandex-team.ru>)
Responses: Re: Commit to primary with unavailable sync standby  (Fabio Ugo Venchiarutti <f.venchiarutti@ocado.com>)
List: pgsql-general
On 19.12.2019 14:04, Andrey Borodin wrote:

> Hi!


Hi!

FYI, this topic came up recently on -hackers:
https://www.postgresql.org/message-id/CAEET0ZHG5oFF7iEcbY6TZadh1mosLmfz1HLm311P9VOt7Z+jeg@mail.gmail.com


> I cannot figure out proper way to implement safe HA upsert. I will be very grateful if someone would help me.
>
> Imagine we have primary server after failover. It is network-partitioned. We are doing INSERT ON CONFLICT DO NOTHING; that eventually timed out.
>
> az1-grx88oegoy6mrv2i/db1 M > WITH new_doc AS (
>      INSERT INTO t(
>          pk,
>          v,
>          dt
>      )
>      VALUES
>      (
>          5,
>          'text',
>          now()
>      )
>      ON CONFLICT (pk) DO NOTHING
>      RETURNING pk,
>                v,
>                dt)
>     SELECT new_doc.pk from new_doc;
> ^CCancel request sent
> WARNING:  01000: canceling wait for synchronous replication due to user request
> DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
> LOCATION:  SyncRepWaitForLSN, syncrep.c:264
> Time: 2173.770 ms (00:02.174)
>
> Here our driver decided that something goes wrong and we retry query.
>
> az1-grx88oegoy6mrv2i/db1 M > WITH new_doc AS (
>      INSERT INTO t(
>          pk,
>          v,
>          dt
>      )
>      VALUES
>      (
>          5,
>          'text',
>          now()
>      )
>      ON CONFLICT (pk) DO NOTHING
>      RETURNING pk,
>                v,
>                dt)
>     SELECT new_doc.pk from new_doc;
>   pk
> ----
> (0 rows)
>
> Time: 4.785 ms
>
> Now we have split-brain, because we acknowledged that row to client.
> How can I fix this?
>
> There must be some obvious trick, but I cannot see it... Or maybe cancel of sync replication should be disallowed and termination should be treated as system failure?
>

I think the most appropriate way to handle such issues is for the client
driver to catch such warnings (the ones with the message about a local
commit) and mark the status of the posted transaction as undetermined. If
the connection to the sync replica comes back, the transaction eventually
commits. But if autofailover is triggered and *this commit has not been
replicated to the replica*, the commit is effectively aborted. Therefore
the client has to wait some time (longer than the duration of
autofailover) and then check the status of the commit (logically, based
on the committed data).
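
To make the idea concrete, here is a minimal sketch in Python of what such
driver-side handling could look like. It is only an illustration under
assumptions: the names CommitStatus, classify_commit and resolve_undetermined
are hypothetical, the notices are assumed to arrive as plain strings (with
psycopg2 they could be read from connection.notices), and row_is_visible
stands for whatever application-level query checks the committed data on the
node that is primary after the wait.

```python
import time
from enum import Enum
from typing import Callable, Iterable

# DETAIL text as it appears in the psql session quoted above.
LOCAL_COMMIT_DETAIL = "The transaction has already committed locally"


class CommitStatus(Enum):
    COMMITTED = "committed"
    UNDETERMINED = "undetermined"
    LOST = "lost"


def classify_commit(notices: Iterable[str]) -> CommitStatus:
    """Return UNDETERMINED if any server notice reports a local-only commit."""
    for notice in notices:
        if LOCAL_COMMIT_DETAIL in notice:
            return CommitStatus.UNDETERMINED
    return CommitStatus.COMMITTED


def resolve_undetermined(
    row_is_visible: Callable[[], bool],
    failover_timeout_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> CommitStatus:
    """Wait longer than an autofailover can take, then check the data itself.

    row_is_visible is a hypothetical callback; it should query the node
    that is primary once the wait is over, not the old connection.
    """
    sleep(failover_timeout_s)
    return CommitStatus.COMMITTED if row_is_visible() else CommitStatus.LOST
```

The point of the sketch is that the client must not blindly retry while the
status is UNDETERMINED: it waits first and then decides based on the data.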

The problem here is that the locally committed data becomes visible to
later transactions (before autofailover), which violates the property of
consistent reads from the master. IMO the more correct behavior for
PostgreSQL here is to ignore any cancel / termination requests while the
backend is waiting for a response from the sync replicas.

However, there is another way to end up with locally applied commits: a
restart of the master after initial recovery. This case is described in
the docs:
https://www.postgresql.org/docs/current/warm-standby.html#SYNCHRONOUS-REPLICATION-HA

But here the HA orchestrator agent can close access for external users
(via pg_hba.conf manipulation) until the PostgreSQL instance has
synchronized its changes with all sync replicas, as is implemented in
Stolon:
https://github.com/sorintlab/stolon/blob/master/doc/syncrepl.md#handling-postgresql-sync-repl-limits-under-such-circumstances
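
As a rough illustration of the pg_hba.conf approach (this is not Stolon's
actual code), an orchestrator agent could render two variants of the file
and switch between them. Everything specific below is an assumption made
for the example: the function name render_pg_hba, the replicator role, the
10.0.0.0/8 replication network and the md5 auth method.

```python
from typing import List


def render_pg_hba(accept_clients: bool) -> str:
    """Render pg_hba.conf contents for the two states of the instance.

    While accept_clients is False, external clients are rejected, but
    local admin access and replication connections are still allowed so
    that sync replicas can catch up.
    """
    rules: List[str] = [
        "# TYPE  DATABASE     USER        ADDRESS      METHOD",
        "local   all          postgres                 peer",
        # Hypothetical replication role and network.
        "host    replication  replicator  10.0.0.0/8   md5",
    ]
    if accept_clients:
        rules.append("host    all          all         0.0.0.0/0    md5")
    else:
        # The first matching rule wins, so this blocks all external clients.
        rules.append("host    all          all         0.0.0.0/0    reject")
    return "\n".join(rules) + "\n"
```

The agent would write render_pg_hba(False) before starting the instance,
wait until the sync replicas have caught up (for example by watching
sync_state in pg_stat_replication), then write render_pg_hba(True) and call
pg_reload_conf().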


Best regards,
Maksim Milyutin



