Re: Synchronous commit behavior during network outage

From Andrey Borodin
Subject Re: Synchronous commit behavior during network outage
Date
Msg-id 7E0C0453-82FB-40EC-92FF-FDB780D1AD48@yandex-team.ru
In reply to Re: Synchronous commit behavior during network outage  (Ondřej Žižka <ondrej.zizka@stratox.cz>)
Responses Re: Synchronous commit behavior during network outage  (Ondřej Žižka <ondrej.zizka@stratox.cz>)
List pgsql-hackers
Thanks for reviewing, Ondřej!

> On 26 Apr 2021, at 22:01, Ondřej Žižka <ondrej.zizka@stratox.cz> wrote:
>
> Hello Andrey,
>
> I went through the thread for your patch and seems to me as an acceptable solution...
>
> > The only case patch does not handle is sudden backend crash - Postgres will recover without a restart.
>
> We also use a HA tool (Patroni). If the whole machine fails, it will find a new master and it should be OK. We use a 4 node setup (2 sync replicas and 1 async from every replica). If there is an issue just with sync replica (async operated normally) and the master fails completely in this situation, it will be solved by Patroni (the async replica become another sync), but if it is just the backend process, the master will not failover and changes will be still visible...
>
> If the sync replica outage is temporal it will be solved itself when the node will establish a replication slot again... If the outage is "long", Patroni will remove the "old" sync replica from the cluster and the async replica reading from the master would be new sync. So yes... In 2 node setup, this can be an issue, but in 4 node setup, this seems to me like a solution.
> The only situation I can imagine is a situation when the client connections use a different network than the replication network and the replication network would be down completely, but the client network will be up. In that case, the master can be an "isolated island" and if it fails, we can lose the changed data.
It is, in fact, a very common type of network partition.

> Is this situation also covered in your model: "transaction effects should not be observable on primary until requirements of synchronous_commit are satisfied."
Yes, provided that synchronous_commit_cancelation = off, no backend crash occurs, and the HA tool does not start the PostgreSQL service when in doubt whether another primary may exist.

> Do you agree with my thoughts?
I could not understand your reasoning about 2 and 4 nodes. Could you please clarify how a 4-node setup can help prevent visibility of committed-locally-but-canceled transactions?

I do not think we can classify network partitions as "temporal" and "long". Due to the distributed nature of the system, network partitions are eternal and momentary. Simultaneously. And if node A can access node B and node C, this implies neither that B can access C nor that B can access A.
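
The reachability point can be illustrated with a minimal sketch. The node names and the link set below are hypothetical and only model the A/B/C example above; connectivity is treated as a directed relation, so it is neither symmetric nor implied between nodes that share a common peer.

```python
# Hypothetical link set during a partition: a pair (src, dst) means
# that src can currently open a connection to dst.
links = {("A", "B"), ("A", "C")}


def can_reach(src: str, dst: str) -> bool:
    """Return True if src can directly reach dst in the current link set."""
    return (src, dst) in links


def reachable_from(src: str) -> set:
    """Return the set of nodes that src can directly reach."""
    return {dst for (s, dst) in links if s == src}


# A sees both B and C ...
print(reachable_from("A") == {"B", "C"})
# ... yet B sees neither C nor A.
print(can_reach("B", "C"), can_reach("B", "A"))
```

An HA tool running on A therefore cannot conclude anything about what B or C observe from A's own view of the network.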

> Maybe would be possible to implement it into PostgreSQL with a note in documentation, that a multinode (>=3 nodes) cluster is necessary.
PostgreSQL does not provide any fault detection or automatic failover. Documenting anything wrt failover is the responsibility of the HA tool.

Thanks!

Best regards, Andrey Borodin.




