Discussion: Commit to primary with unavailable sync standby

Commit to primary with unavailable sync standby

From
Andrey Borodin
Date:
Hi!

I cannot figure out the proper way to implement a safe HA upsert. I would be very grateful if someone could help me.

Imagine we have a primary server after failover. It is network-partitioned. We run an INSERT ON CONFLICT DO NOTHING
that eventually times out.

az1-grx88oegoy6mrv2i/db1 M > WITH new_doc AS (
    INSERT INTO t(
        pk,
        v,
        dt
    )
    VALUES
    (
        5,
        'text',
        now()
    )
    ON CONFLICT (pk) DO NOTHING
    RETURNING pk,
              v,
              dt)
   SELECT new_doc.pk from new_doc;
^CCancel request sent
WARNING:  01000: canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
LOCATION:  SyncRepWaitForLSN, syncrep.c:264
Time: 2173.770 ms (00:02.174)

Here our driver decided that something went wrong, and we retried the query.

az1-grx88oegoy6mrv2i/db1 M > WITH new_doc AS (
    INSERT INTO t(
        pk,
        v,
        dt
    )
    VALUES
    (
        5,
        'text',
        now()
    )
    ON CONFLICT (pk) DO NOTHING
    RETURNING pk,
              v,
              dt)
   SELECT new_doc.pk from new_doc;
 pk
----
(0 rows)

Time: 4.785 ms

Now we have split-brain, because we acknowledged that row to the client.
How can I fix this?

There must be some obvious trick, but I cannot see it... Or maybe cancelling the wait for synchronous replication should
be disallowed, and termination should be treated as a system failure?

Best regards, Andrey Borodin.


Re: Commit to primary with unavailable sync standby

From
Fabio Ugo Venchiarutti
Date:



On 19/12/2019 11:04, Andrey Borodin wrote:
> Hi!
> 
> I cannot figure out proper way to implement safe HA upsert. I will be very grateful if someone would help me.
> 
> Imagine we have primary server after failover. It is network-partitioned. We are doing INSERT ON CONFLICT DO NOTHING;
> that eventually timed out.
> 
> az1-grx88oegoy6mrv2i/db1 M > WITH new_doc AS (
>      INSERT INTO t(
>          pk,
>          v,
>          dt
>      )
>      VALUES
>      (
>          5,
>          'text',
>          now()
>      )
>      ON CONFLICT (pk) DO NOTHING
>      RETURNING pk,
>                v,
>                dt)
>     SELECT new_doc.pk from new_doc;
> ^CCancel request sent
> WARNING:  01000: canceling wait for synchronous replication due to user request
> DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
> LOCATION:  SyncRepWaitForLSN, syncrep.c:264
> Time: 2173.770 ms (00:02.174)
> 
> Here our driver decided that something goes wrong and we retry query.
> 
> az1-grx88oegoy6mrv2i/db1 M > WITH new_doc AS (
>      INSERT INTO t(
>          pk,
>          v,
>          dt
>      )
>      VALUES
>      (
>          5,
>          'text',
>          now()
>      )
>      ON CONFLICT (pk) DO NOTHING
>      RETURNING pk,
>                v,
>                dt)
>     SELECT new_doc.pk from new_doc;
>   pk
> ----
> (0 rows)
> 
> Time: 4.785 ms
> 
> Now we have split-brain, because we acknowledged that row to client.
> How can I fix this?
> 
> There must be some obvious trick, but I cannot see it... Or maybe cancel of sync replication should be disallowed and
> termination should be treated as system failure?
> 
> Best regards, Andrey Borodin.
> 

You're hitting the CAP theorem ( https://en.wikipedia.org/wiki/CAP_theorem )


You cannot do it with fewer than 3 nodes, as the moment you set your 
standby to synchronous to achieve consistency, both your nodes become 
single points of failure.


With 3 or more nodes you can perform what is called a quorum write
against floor(<total_nodes> / 2) + 1 nodes.


With 3+ nodes, the "easy" strategy is to set the number of synchronous
standby nodes in synchronous_standby_names to <quorum - 1>
(https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-SYNCHRONOUS-STANDBY-NAMES).
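A minimal sketch of the arithmetic above, in Python. It assumes the primary counts as one member of the quorum (so <quorum - 1> standby acknowledgements are needed) and uses the `ANY num_sync (...)` quorum syntax that synchronous_standby_names accepts since PostgreSQL 10; the standby names are hypothetical.

```python
from __future__ import annotations


def quorum_size(total_nodes: int) -> int:
    """Majority quorum for a cluster: floor(total_nodes / 2) + 1."""
    if total_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return total_nodes // 2 + 1


def tolerated_failures(total_nodes: int) -> int:
    """How many nodes can be lost while a majority is still reachable."""
    return total_nodes - quorum_size(total_nodes)


def sync_standby_names(standbys: list[str]) -> str:
    """Build a synchronous_standby_names value for a primary plus `standbys`.

    The primary is one quorum member, so <quorum - 1> standbys must confirm.
    Standby names are assumed to be plain identifiers (no quoting needed).
    """
    total_nodes = len(standbys) + 1  # the standbys plus the primary
    if tolerated_failures(total_nodes) < 1:
        # With 2 nodes the single standby becomes a single point of failure.
        raise ValueError("need at least 3 nodes to tolerate a failure")
    num_sync = quorum_size(total_nodes) - 1
    return f"ANY {num_sync} ({', '.join(standbys)})"


print(sync_standby_names(["pg2", "pg3"]))  # ANY 1 (pg2, pg3)
```

The resulting string would be applied on the primary with `ALTER SYSTEM SET synchronous_standby_names = '...'` followed by `SELECT pg_reload_conf();`. Using `FIRST` instead of `ANY` would make PostgreSQL wait for the highest-priority standbys rather than for any subset of them.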


This however makes it tricky to pick the correct standby for promotions 
during auto-failovers, as you need to freeze all the standbys listed in 
the above setting in order to correctly determine which one has the 
highest WAL location without running into race conditions (as the 
operation is non-atomic, stateful and sticky).
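The promotion step described above boils down to comparing WAL positions. A minimal sketch, assuming every listed standby has already been frozen and that each position was read on the standby with `pg_last_wal_receive_lsn()`; the standby names and LSN values are made up. LSNs have to be compared numerically, not as strings.

```python
from __future__ import annotations


def parse_lsn(lsn: str) -> int:
    """Convert PostgreSQL's textual LSN (e.g. '16/B374D848') to an integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def pick_promotion_candidate(positions: dict[str, str]) -> str:
    """Return the standby with the highest received WAL position.

    Only meaningful once all standbys are frozen: otherwise a standby that
    looks behind may still be receiving WAL while we compare.
    """
    if not positions:
        raise ValueError("no standbys to choose from")
    return max(positions, key=lambda name: parse_lsn(positions[name]))


# '10/0' sorts before '9/FFFFFFFF' as a string but is the later position.
print(pick_promotion_candidate({"pg2": "9/FFFFFFFF", "pg3": "10/0"}))  # pg3
```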


I personally prefer to designate a fixed synchronous set at setup time 
and automatically set a static synchronous_standby_names on the master 
whenever a failover occurs. That allows for a simpler failover mechanism 
as you know they got the latest WAL location.



If you want an off-the-shelf solution, nowadays Patroni seems to be all
the rage.




-- 
Regards

Fabio Ugo Venchiarutti
OSPCFC Network Engineering Dpt.
Ocado Technology

-- 


Notice: 
This email is confidential and may contain copyright material of 
members of the Ocado Group. Opinions and views expressed in this message 
may not necessarily reflect the opinions and views of the members of the 
Ocado Group.

If you are not the intended recipient, please notify us 
immediately and delete all copies of this message. Please note that it is 
your responsibility to scan this message for viruses.

References to the 
"Ocado Group" are to Ocado Group plc (registered in England and Wales with 
number 7098618) and its subsidiary undertakings (as that expression is 
defined in the Companies Act 2006) from time to time. The registered office 
of Ocado Group plc is Buildings One & Two, Trident Place, Mosquito Way, 
Hatfield, Hertfordshire, AL10 9UL.



Re: Commit to primary with unavailable sync standby

From
Andrey Borodin
Date:
Hi Fabio!

Thanks for looking into this.

> On 19 Dec 2019, at 17:14, Fabio Ugo Venchiarutti <f.venchiarutti@ocado.com> wrote:
>
>
> You're hitting the CAP theorem ( https://en.wikipedia.org/wiki/CAP_theorem )
>
>
> You cannot do it with fewer than 3 nodes, as the moment you set your standby to synchronous to achieve consistency,
> both your nodes become single points of failure.
We have 3 nodes, and the problem is reproducible with all standbys being synchronous.

> With 3 or more nodes you can perform what is called a quorum write against ( floor(<total_nodes> / 2) + 1 ) nodes .
The problem seems to be reproducible in quorum commit too.

> With 3+ nodes, the "easy" strategy is to set a <quorum - 1> number of standby nodes in synchronous_standby_names (
> https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-SYNCHRONOUS-STANDBY-NAMES)
>
>
> This however makes it tricky to pick the correct standby for promotions during auto-failovers, as you need to freeze
> all the standbys listed in the above setting in order to correctly determine which one has the highest WAL location
> without running into race conditions (as the operation is non-atomic, stateful and sticky).
After promotion of any standby, we can still commit to the old primary with the combination of cancel and retry.

> I personally prefer to designate a fixed synchronous set at setup time and automatically set a static
> synchronous_standby_names on the master whenever a failover occurs. That allows for a simpler failover mechanism as you
> know they got the latest WAL location.
No, a synchronous standby does not necessarily have the latest WAL. Its WAL position is no earlier than that of every
commit acknowledged to the client.

Thanks!

Best regards, Andrey Borodin.


Re: Commit to primary with unavailable sync standby

From
Fabio Ugo Venchiarutti
Date:

On 19/12/2019 12:25, Andrey Borodin wrote:
> Hi Fabio!
>
> Thanks for looking into this.
>
>> On 19 Dec 2019, at 17:14, Fabio Ugo Venchiarutti <f.venchiarutti@ocado.com> wrote:
>>
>>
>> You're hitting the CAP theorem ( https://en.wikipedia.org/wiki/CAP_theorem )
>>
>>
>> You cannot do it with fewer than 3 nodes, as the moment you set your standby to synchronous to achieve consistency,
>> both your nodes become single points of failure.
> We have 3 nodes, and the problem is reproducible with all standbys being synchronous.


>
>> With 3 or more nodes you can perform what is called a quorum write against ( floor(<total_nodes> / 2) + 1 ) nodes .
> The problem seems to be reproducible in quorum commit too.

>
>> With 3+ nodes, the "easy" strategy is to set a <quorum - 1> number of standby nodes in synchronous_standby_names (
>> https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-SYNCHRONOUS-STANDBY-NAMES)
>>
>>
>> This however makes it tricky to pick the correct standby for promotions during auto-failovers, as you need to freeze
>> all the standbys listed in the above setting in order to correctly determine which one has the highest WAL location
>> without running into race conditions (as the operation is non-atomic, stateful and sticky).
> After promotion of any standby we still can commit to old primary with the combination of cancel and retry.


AFAICT this pseudo-idempotency issue can only be solved if every query
is validated against the quorum.

A quick-and-dirty solution would be to wrap the whole thing in a CTE
which also returns a count from pg_stat_replication (a stray/partitioned
master would have fewer than <quorum - 1> standbys).
(It may be possible to do it directly in the RETURNING clause; I don't have
a backend handy to test that.)


You can either look into the result at the client or force an error via
some bad cast/zero division in the query.

All the above is however still subject to (admittedly tight) race
conditions.
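A minimal sketch of the client-side half of the check described above, assuming a psycopg-style driver that returns rows as tuples. The query reuses table `t` from this thread; the `repl` CTE, the returned column layout and the classification labels are hypothetical. In autocommit mode the INSERT has already committed locally by the time the client inspects the row, so this variant only detects a stray primary (outside autocommit the client could roll back instead of committing); the forced-error variant is what would stop the commit itself. It is still subject to the race conditions mentioned above.

```python
from __future__ import annotations

# Hypothetical guarded variant of the upsert from this thread: the same
# statement also reports how many standbys pg_stat_replication lists as
# synchronous (sync_state 'sync' or 'quorum') at execution time.
GUARDED_UPSERT = """
WITH new_doc AS (
    INSERT INTO t (pk, v, dt)
    VALUES (%(pk)s, %(v)s, now())
    ON CONFLICT (pk) DO NOTHING
    RETURNING pk
), repl AS (
    SELECT count(*) AS sync_standbys
    FROM pg_stat_replication
    WHERE sync_state IN ('sync', 'quorum')
)
SELECT (SELECT pk FROM new_doc) AS pk, repl.sync_standbys
FROM repl
"""


def classify_guarded_result(row: tuple, required_standbys: int) -> str:
    """Classify the single (pk, sync_standbys) row returned by GUARDED_UPSERT.

    In autocommit mode the INSERT has already committed locally when this row
    arrives, so a low standby count only flags a suspect primary.
    """
    pk, sync_standbys = row
    if sync_standbys < required_standbys:
        return "suspect-primary"
    return "inserted" if pk is not None else "already-present"


print(classify_guarded_result((5, 0), required_standbys=1))  # suspect-primary
```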


This problem is precisely why I don't use any of the off-the-shelf
solutions: last time I checked, none of them had a connection
proxy/router to direct clients to the real master rather than to a node
that merely thinks it is.



>
>> I personally prefer to designate a fixed synchronous set at setup time and automatically set a static
>> synchronous_standby_names on the master whenever a failover occurs. That allows for a simpler failover mechanism as you
>> know they got the latest WAL location.
> No, synchronous standby does not necessarily own latest WAL. It has WAL point no earlier than all commits
> acknowledged to client.


You're right. I should have said "latest WAL holding an acknowledged
transaction"

>
> Thanks!
>
> Best regards, Andrey Borodin.
>

--
Regards

Fabio Ugo Venchiarutti
OSPCFC Network Engineering Dpt.
Ocado Technology




Re: Commit to primary with unavailable sync standby

From
Maksim Milyutin
Date:
On 19.12.2019 14:04, Andrey Borodin wrote:

> Hi!


Hi!

FYI, this topic came up recently on -hackers:
https://www.postgresql.org/message-id/CAEET0ZHG5oFF7iEcbY6TZadh1mosLmfz1HLm311P9VOt7Z+jeg@mail.gmail.com


> I cannot figure out proper way to implement safe HA upsert. I will be very grateful if someone would help me.
>
> Imagine we have primary server after failover. It is network-partitioned. We are doing INSERT ON CONFLICT DO NOTHING;
> that eventually timed out.
>
> az1-grx88oegoy6mrv2i/db1 M > WITH new_doc AS (
>      INSERT INTO t(
>          pk,
>          v,
>          dt
>      )
>      VALUES
>      (
>          5,
>          'text',
>          now()
>      )
>      ON CONFLICT (pk) DO NOTHING
>      RETURNING pk,
>                v,
>                dt)
>     SELECT new_doc.pk from new_doc;
> ^CCancel request sent
> WARNING:  01000: canceling wait for synchronous replication due to user request
> DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
> LOCATION:  SyncRepWaitForLSN, syncrep.c:264
> Time: 2173.770 ms (00:02.174)
>
> Here our driver decided that something goes wrong and we retry query.
>
> az1-grx88oegoy6mrv2i/db1 M > WITH new_doc AS (
>      INSERT INTO t(
>          pk,
>          v,
>          dt
>      )
>      VALUES
>      (
>          5,
>          'text',
>          now()
>      )
>      ON CONFLICT (pk) DO NOTHING
>      RETURNING pk,
>                v,
>                dt)
>     SELECT new_doc.pk from new_doc;
>   pk
> ----
> (0 rows)
>
> Time: 4.785 ms
>
> Now we have split-brain, because we acknowledged that row to client.
> How can I fix this?
>
> There must be some obvious trick, but I cannot see it... Or maybe cancel of sync replication should be disallowed and
> termination should be treated as system failure?
>

I think the most appropriate way to handle such issues is for the client
driver to catch such warnings (the ones with the message about a local
commit) and mark the status of the posted transaction as undetermined.
If the connection with the sync replica comes back, this transaction
eventually commits; but if autofailover is triggered and this commit is
*not replicated to the replica*, the commit is effectively aborted.
Therefore the client has to wait for some time (longer than the
autofailover takes) and then check the status of the commit (logically,
based on the committed data).
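A minimal sketch of that client-side handling, assuming the driver exposes server warnings as a list of strings (psycopg2, for example, collects them in `connection.notices`, which accumulates and would need clearing between statements). The `row_exists` callback, the failover duration and the safety margin are hypothetical inputs supplied by the application.

```python
from __future__ import annotations

import time
from enum import Enum
from typing import Callable

# Text of the WARNING PostgreSQL emits when the sync-rep wait is cancelled
# (see the psql output above). Matching on text assumes English lc_messages;
# SQLSTATE 01000 alone is too generic to rely on.
SYNC_WAIT_CANCELLED = "canceling wait for synchronous replication"


class CommitStatus(Enum):
    COMMITTED = "committed"
    NOT_COMMITTED = "not committed"
    UNDETERMINED = "undetermined"


def status_after_success(notices: list[str]) -> CommitStatus:
    """Status of a statement that returned without error, given its notices."""
    if any(SYNC_WAIT_CANCELLED in notice for notice in notices):
        return CommitStatus.UNDETERMINED
    return CommitStatus.COMMITTED


def resolve_undetermined(
    row_exists: Callable[[], bool],
    failover_duration_s: float,
    margin_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> CommitStatus:
    """Wait out a possible autofailover, then check the committed data.

    `row_exists` is a hypothetical callback that queries whichever node is
    primary after the wait (e.g. SELECT 1 FROM t WHERE pk = 5).
    """
    sleep(failover_duration_s + margin_s)
    return CommitStatus.COMMITTED if row_exists() else CommitStatus.NOT_COMMITTED
```

The `sleep` parameter is injectable only so the wait can be skipped in tests; the important part is that the retry decision is deferred until the cluster has had time to fail over.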

The problem here is that the locally committed data becomes visible to
future transactions (before autofailover), which violates the property of
consistent reads from the master. IMO the more correct behavior for
PostgreSQL here is to ignore any cancel/termination requests while a
backend is waiting for a response from sync replicas.

However, there is another way to end up with locally applied commits: a
restart of the master, after whose initial recovery those commits become
visible. This case is described in the docs:
https://www.postgresql.org/docs/current/warm-standby.html#SYNCHRONOUS-REPLICATION-HA
But here the HA orchestrator agent can close access for external users
(via pg_hba.conf manipulations) until the PostgreSQL instance synchronizes
its changes with all sync replicas, as implemented in Stolon:

https://github.com/sorintlab/stolon/blob/master/doc/syncrepl.md#handling-postgresql-sync-repl-limits-under-such-circumstances
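The pg_hba.conf fencing idea could look roughly like this. It is not Stolon's code: it is a minimal sketch, assuming a replication role named `replicator`, standby addresses given as CIDRs and `scram-sha-256` authentication, all of them hypothetical. Connections that match no pg_hba.conf line are rejected, so leaving out client lines locks ordinary users out. The agent would write this file and run `SELECT pg_reload_conf();`; because pg_hba.conf is only consulted when a connection is opened, already-connected sessions are not affected and would have to be terminated separately.

```python
from __future__ import annotations


def fenced_pg_hba(replication_user: str, standby_cidrs: list[str]) -> str:
    """Render a pg_hba.conf that admits only replication and local admin access.

    Meant for the period before the instance has confirmed that its WAL reached
    the sync replicas; ordinary client connections match no line and are refused.
    """
    lines = [
        "# FENCED by HA agent: client access closed until sync replicas catch up",
        # TYPE  DATABASE     USER      ADDRESS  METHOD
        "local  all          postgres           peer",
    ]
    for cidr in standby_cidrs:
        lines.append(f"host   replication  {replication_user}  {cidr}  scram-sha-256")
    return "\n".join(lines) + "\n"


print(fenced_pg_hba("replicator", ["10.0.0.2/32", "10.0.0.3/32"]))
```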


Best regards,
Maksim Milyutin




Re: Commit to primary with unavailable sync standby

From
Fabio Ugo Venchiarutti
Date:

On 19/12/2019 13:58, Maksim Milyutin wrote:
> On 19.12.2019 14:04, Andrey Borodin wrote:
>
>> Hi!
>
>
> Hi!
>
> FYI, this topic was up recently in -hackers
> https://www.postgresql.org/message-id/CAEET0ZHG5oFF7iEcbY6TZadh1mosLmfz1HLm311P9VOt7Z+jeg@mail.gmail.com
>
>
>
>> I cannot figure out proper way to implement safe HA upsert. I will be
>> very grateful if someone would help me.
>>
>> Imagine we have primary server after failover. It is
>> network-partitioned. We are doing INSERT ON CONFLICT DO NOTHING; that
>> eventually timed out.
>>
>> az1-grx88oegoy6mrv2i/db1 M > WITH new_doc AS (
>>      INSERT INTO t(
>>          pk,
>>          v,
>>          dt
>>      )
>>      VALUES
>>      (
>>          5,
>>          'text',
>>          now()
>>      )
>>      ON CONFLICT (pk) DO NOTHING
>>      RETURNING pk,
>>                v,
>>                dt)
>>     SELECT new_doc.pk from new_doc;
>> ^CCancel request sent
>> WARNING:  01000: canceling wait for synchronous replication due to
>> user request
>> DETAIL:  The transaction has already committed locally, but might not
>> have been replicated to the standby.
>> LOCATION:  SyncRepWaitForLSN, syncrep.c:264
>> Time: 2173.770 ms (00:02.174)
>>
>> Here our driver decided that something goes wrong and we retry query.
>>
>> az1-grx88oegoy6mrv2i/db1 M > WITH new_doc AS (
>>      INSERT INTO t(
>>          pk,
>>          v,
>>          dt
>>      )
>>      VALUES
>>      (
>>          5,
>>          'text',
>>          now()
>>      )
>>      ON CONFLICT (pk) DO NOTHING
>>      RETURNING pk,
>>                v,
>>                dt)
>>     SELECT new_doc.pk from new_doc;
>>   pk
>> ----
>> (0 rows)
>>
>> Time: 4.785 ms
>>
>> Now we have split-brain, because we acknowledged that row to client.
>> How can I fix this?
>>
>> There must be some obvious trick, but I cannot see it... Or maybe
>> cancel of sync replication should be disallowed and termination should
>> be treated as system failure?
>>
>
> I think the most appropriate way to handle such issues is to catch by
> client driver such warnings (with message about local commit) and mark
> the status of posted transaction as undetermined. If connection with
> sync replica will come back then this transaction eventually commits but
> after triggering of autofailover and *not replicating this commit to
> replica* this commit aborts. Therefore client have to wait some time
> (that exceeds the duration of autofailover) and check (logically based
> on committed data) the status of commit.
>
> The problem here is the locally committed data becomes visible to future
> transactions (before autofailover) that violates the property of
> consistent reading from master. IMO the more correct behavior for
> PostgreSQL here is to ignore any cancel / termination queries when
> backend is in status of waiting response from sync replicas.
>
> However, there is another way to get locally applied commits via restart
> of master after initial recovery. This case is described in doc
> https://www.postgresql.org/docs/current/warm-standby.html#SYNCHRONOUS-REPLICATION-HA
> . But here HA orchestrator agent can close access from external users
> (via pg_hba.conf manipulations) until PostgreSQL instance synchronizes


And this is where the unsafety lies: it assumes that the isolated
master is in a sane enough state to apply a self-ban (and that it can do
so in near-zero time).


Although the retry logic in Andrey's case is probably not ideal (and you
offered a more correct approach to synchronous commit), there are many
"grey area" failure modes that in his scenario would prevent a given node
from sealing itself off fast enough, if at all (e.g. PID congestion
causing fork()/system() to fail while backends are already up and
happily flushing WAL).


This is particularly relevant to situations where only a subset of
critical transactions set synchronous_commit to remote_*: it would still be
undesirable to sink "tier 2" data into a stale primary for any significant
length of time.


Distributed systems like Etcd and Cassandra have a notion of
"coordination node" in the context of a request (not having to deal with
an "authoritative" transaction makes it easier).


In the case of postgres (or any RDBMS, really), all I can think of is
either an inline proxy performing some validation as part of the
forwarding (which is what we did internally but that has not been green
lit for FOSS :( ) or some logic in the backend that rejects asynchronous
commits too if some condition is not met (eg: <quorum - 1> synchronous
standby nodes not present - a builtin version of the pg_stat_replication
look-aside CTE I suggested earlier).
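The inline-proxy idea can be sketched as a forwarding gate. Everything here is hypothetical: it assumes some external consensus store (not shown) publishes which node is the primary together with a failover term and a lease expiry, and that the proxy refuses any lease whose term is older than one it has already seen (a fencing token). It narrows, but does not remove, the races mentioned above: clock skew and leases expiring mid-request remain.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LeaderLease:
    node: str          # node the consensus store names as primary
    term: int          # failover term, increases on every promotion
    expires_at: float  # lease expiry on the proxy's monotonic clock


class ForwardingGate:
    """Decides per connection whether it may be forwarded to `target`."""

    def __init__(self) -> None:
        self.highest_term = 0

    def may_forward(self, target: str, lease: Optional[LeaderLease], now: float) -> bool:
        if lease is None or lease.term < self.highest_term:
            return False  # no leader known, or a stale leader from an older term
        self.highest_term = lease.term
        return lease.node == target and now < lease.expires_at
```

In a real proxy `now` would come from `time.monotonic()`, and a refused connection would be re-routed to whichever node the store currently names.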





> its changes with all sync replicas as it's implemented in Stolon
>
> https://github.com/sorintlab/stolon/blob/master/doc/syncrepl.md#handling-postgresql-sync-repl-limits-under-such-circumstances
> .
>
>
> Best regards,
> Maksim Milyutin
>
>



--
Regards

Fabio Ugo Venchiarutti
OSPCFC Network Engineering Dpt.
Ocado Technology




Re: Commit to primary with unavailable sync standby

From
Maksim Milyutin
Date:
On 19.12.2019 18:08, Fabio Ugo Venchiarutti wrote:
>
>
> On 19/12/2019 13:58, Maksim Milyutin wrote:
>> On 19.12.2019 14:04, Andrey Borodin wrote:
>>
>>> Hi!
>>
>>
>> Hi!
>>
>> FYI, this topic was up recently in -hackers 
>> https://www.postgresql.org/message-id/CAEET0ZHG5oFF7iEcbY6TZadh1mosLmfz1HLm311P9VOt7Z+jeg@mail.gmail.com 
>>
>>
>>
>>> I cannot figure out proper way to implement safe HA upsert. I will 
>>> be very grateful if someone would help me.
>>>
>>> Imagine we have primary server after failover. It is 
>>> network-partitioned. We are doing INSERT ON CONFLICT DO NOTHING; 
>>> that eventually timed out.
>>>
>>> az1-grx88oegoy6mrv2i/db1 M > WITH new_doc AS (
>>>      INSERT INTO t(
>>>          pk,
>>>          v,
>>>          dt
>>>      )
>>>      VALUES
>>>      (
>>>          5,
>>>          'text',
>>>          now()
>>>      )
>>>      ON CONFLICT (pk) DO NOTHING
>>>      RETURNING pk,
>>>                v,
>>>                dt)
>>>     SELECT new_doc.pk from new_doc;
>>> ^CCancel request sent
>>> WARNING:  01000: canceling wait for synchronous replication due to 
>>> user request
>>> DETAIL:  The transaction has already committed locally, but might 
>>> not have been replicated to the standby.
>>> LOCATION:  SyncRepWaitForLSN, syncrep.c:264
>>> Time: 2173.770 ms (00:02.174)
>>>
>>> Here our driver decided that something goes wrong and we retry query.
>>>
>>> az1-grx88oegoy6mrv2i/db1 M > WITH new_doc AS (
>>>      INSERT INTO t(
>>>          pk,
>>>          v,
>>>          dt
>>>      )
>>>      VALUES
>>>      (
>>>          5,
>>>          'text',
>>>          now()
>>>      )
>>>      ON CONFLICT (pk) DO NOTHING
>>>      RETURNING pk,
>>>                v,
>>>                dt)
>>>     SELECT new_doc.pk from new_doc;
>>>   pk
>>> ----
>>> (0 rows)
>>>
>>> Time: 4.785 ms
>>>
>>> Now we have split-brain, because we acknowledged that row to client.
>>> How can I fix this?
>>>
>>> There must be some obvious trick, but I cannot see it... Or maybe 
>>> cancel of sync replication should be disallowed and termination 
>>> should be treated as system failure?
>>>
>>
>> I think the most appropriate way to handle such issues is to catch by 
>> client driver such warnings (with message about local commit) and 
>> mark the status of posted transaction as undetermined. If connection 
>> with sync replica will come back then this transaction eventually 
>> commits but after triggering of autofailover and *not replicating 
>> this commit to replica* this commit aborts. Therefore client have to 
>> wait some time (that exceeds the duration of autofailover) and check 
>> (logically based on committed data) the status of commit.
>>
>> The problem here is the locally committed data becomes visible to 
>> future transactions (before autofailover) that violates the property 
>> of consistent reading from master. IMO the more correct behavior for 
>> PostgreSQL here is to ignore any cancel / termination queries when 
>> backend is in status of waiting response from sync replicas.
>>
>> However, there is another way to get locally applied commits via 
>> restart of master after initial recovery. This case is described in 
>> doc 
>> https://www.postgresql.org/docs/current/warm-standby.html#SYNCHRONOUS-REPLICATION-HA 
>> . But here HA orchestrator agent can close access from external users 
>> (via pg_hba.conf manipulations) until PostgreSQL instance synchronizes
>
>
> And this is where the unsafety lies: that assumes that the isolated 
> master is in enough of a sane state to apply a self-ban (and that can 
> do it in near-zero time).
>
>
> Although the retry logic in Andrey's case is probably not ideal (and 
> you offered a more correct approach to synchronous commit), there are 
> many "grey area" failure modes that in his scenario would either 
> prevent a given node from sealing up fast enuogh if at all (eg: PID 
> congestion causing fork()/system() to fail while backends are already 
> up and happily flushing WAL).
>
>
> This is particularly relevant to situations when only a subset of 
> critical transactions set synchronous_commit to remote_*: it'd still 
> be undesirable to sink "tier 2" data in a stale primary for any 
> significant length of time).


Could you describe your thesis more concretely? In my proposal, the
self-ban is applied to the master after it restarts, so that changes from
locally committed transactions are not visible to new incoming
transactions.


> In the case of postgres (or any RDBMS, really), all I can think of is 
> either an inline proxy performing some validation as part of the 
> forwarding (which is what we did internally but that has not been 
> green lit for FOSS :( )


Unfortunately, external validation is not an option here. AIMB, the local
commits become visible to future transactions coming to the master, and
even if some proxy reports to the client that the transaction is not fully
committed, new incoming transactions that read the locally applied changes
and make their own changes based on them implicitly confirm the status of
those changes as committed.


> or some logic in the backend that rejects asynchronous commits too if 
> some condition is not met (eg: <quorum - 1> synchronous standby nodes 
> not present - a builtin version of the pg_stat_replication look-aside 
> CTE I suggested earlier).


A CTE with a sub-query on pg_stat_replication is not an option either. The
pg_stat_replication view in fact shows stale information about the status
of replicas, because it is built from the status of the walsender
processes. That is, when a replica loses contact with the master, for up to
wal_sender_timeout the master will still see this replica in
pg_stat_replication with unchanged row attributes, so local commits can
also slip through within this timeout.
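A small model of that window, assuming the walsender only gives up on a silent standby once wal_sender_timeout (60 s by default) has passed since the last reply it received; TCP-level errors could end the window sooner. The timestamps are illustrative seconds, not measurements.

```python
DEFAULT_WAL_SENDER_TIMEOUT_S = 60.0  # PostgreSQL's default wal_sender_timeout


def still_listed(last_reply_at: float, now: float,
                 wal_sender_timeout_s: float = DEFAULT_WAL_SENDER_TIMEOUT_S) -> bool:
    """Whether pg_stat_replication may still show the standby at `now`."""
    return now - last_reply_at < wal_sender_timeout_s


def stale_window(last_reply_at: float, partition_at: float,
                 wal_sender_timeout_s: float = DEFAULT_WAL_SENDER_TIMEOUT_S) -> float:
    """Seconds after the partition during which a pg_stat_replication-based
    quorum check would still count the unreachable standby."""
    return max(0.0, last_reply_at + wal_sender_timeout_s - partition_at)


# Standby last replied at t=0 s and was cut off at t=5 s.
print(stale_window(last_reply_at=0.0, partition_at=5.0))  # 55.0
```

Lowering wal_sender_timeout shrinks the window but cannot close it, which is why a check based on this view stays racy.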


-- 
Best regards,
Maksim Milyutin