Thread: SynchRep; wait-forever and shutdown

SynchRep; wait-forever and shutdown

From
Fujii Masao
Date:
Hi,

In previous discussion, some people wanted the "wait-forever" option which
blocks all the transactions on the master until sync'd standby has appeared,
in order to reduce the risk of data loss in synchronous replication.

What I'm not clear on is: how does a smart or fast shutdown advance while all
the transactions are being blocked?

1. Shutdown should wait for all the transactions to end by appearance of
    sync'd standby?
    * Problem is that shutdown would take very long.

2. Shutdown should commit all the blocking transactions?
    * Problem is that a client thinks that those transactions have successfully
       been committed even though they have not been replicated to the
       standby.

3. Shutdown should abort all the blocking transactions?
    * Problem is that a client thinks that those transactions have been aborted
       even though those WAL records have been written on the master. But
       this is very common problem for DBMS, so we don't need to worry about
       this in the context of replication.

ISTM smart and fast shutdown fit in with #1 and #3, respectively. Thoughts?
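
To make the trade-off between the three options concrete, here is a minimal sketch in Python. It is only a toy model of a transaction whose commit record is already in the master's WAL but which is still waiting for the standby; the names `ShutdownPolicy`, `Outcome` and `outcome_for` are made up for illustration and do not correspond to anything in PostgreSQL.

```python
from dataclasses import dataclass
from enum import Enum


class ShutdownPolicy(Enum):
    """The three options listed above (hypothetical names)."""
    WAIT_FOR_STANDBY = 1   # option 1: wait until a sync'd standby appears
    FORCE_COMMIT = 2       # option 2: release the blocked transactions as committed
    FORCE_ABORT = 3        # option 3: report the blocked transactions as aborted


@dataclass(frozen=True)
class Outcome:
    client_told: str         # what the blocked client is told
    durable_on_master: bool  # commit record is in the master's WAL
    replicated: bool         # commit record has reached the standby


def outcome_for(policy: ShutdownPolicy) -> Outcome:
    """Outcome for a transaction blocked while waiting for the standby."""
    if policy is ShutdownPolicy.WAIT_FOR_STANDBY:
        # Shutdown may take very long, but the client's view is accurate.
        return Outcome("committed", durable_on_master=True, replicated=True)
    if policy is ShutdownPolicy.FORCE_COMMIT:
        # Client believes the commit is safe, yet the standby never saw it.
        return Outcome("committed", durable_on_master=True, replicated=False)
    # Client believes the work is gone, yet the WAL record is on the master.
    return Outcome("aborted", durable_on_master=True, replicated=False)
```

Only option 1 leaves the client's belief in line with both servers; options 2 and 3 each mislead the client in a different direction.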

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: SynchRep; wait-forever and shutdown

From
Josh Berkus
Date:
> 3. Shutdown should abort all the blocking transactions?
>       * Problem is that a client thinks that those transactions have been aborted
>          even though those WAL records have been written on the master. But
>          this is very common problem for DBMS, so we don't need to worry about
>          this in the context of replication.

Hmmm.  The WAL records are written as committed ... this is why people 
get into 2PC if they want fully synchronous replication.  Short of using 
2PC, there is simply no way we can guarantee that the master and the 
standby won't get out of sync.  And even 2PC isn't perfect.

I think the best we can do is have the master abort the sessions and 
shut down for a -fast.  Yes, the clients are confused about what's been 
committed, but frequently that's the case with a -fast anyway.

However, we need to give the user more information.  I'd say that we 
need to have a specific error message associated with a synchronization 
failure around shutdown time.  This error should be both returned to the 
clients, and logged.  That way the DBA can decide what to do about the 
error, if anything.

So, I'd say this is the way to go:
Shutdown Smart:
    Wait for all pending standby transactions to clear.
    After 60 seconds, emit an error message on the shutdown console:
        NOTICE: pending replication transactions still waiting
    ... that way the DBA knows to move on to -fast.

Shutdown Fast:
    Wait for 1 second for all pending standby transactions to clear.
    If they don't clear, emit an error to both the shutdown console
    and the client consoles:
        WARNING: some transactions not replicated
    Send a commit message on the client consoles.
    Shutdown.
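
The proposal above can be sketched as a small decision function. This is only an illustration in Python, not PostgreSQL code: the function name `shutdown_step`, its arguments and its return shape are assumptions, while the 60-second and 1-second thresholds and the two message texts are taken from the proposal.

```python
from typing import List, Tuple

SMART_NOTICE_AFTER_S = 60.0  # smart: start nagging the DBA after 60 seconds
FAST_GRACE_S = 1.0           # fast: give pending transactions 1 second

SMART_NOTICE = "NOTICE: pending replication transactions still waiting"
FAST_WARNING = "WARNING: some transactions not replicated"


def shutdown_step(mode: str, pending: int, waited_s: float) -> Tuple[str, List[str]]:
    """Decide what one poll of a (hypothetical) shutdown loop should do.

    mode      -- "smart" or "fast"
    pending   -- transactions still waiting for the standby
    waited_s  -- seconds elapsed since shutdown was requested
    Returns (action, messages), where action is "wait" or "shutdown".
    """
    if mode not in ("smart", "fast"):
        raise ValueError(f"unknown shutdown mode: {mode!r}")
    if pending == 0:
        # Nothing is blocked on replication, so shutdown can proceed.
        return "shutdown", []
    if mode == "smart":
        # Smart never gives up; it only tells the DBA to consider -fast.
        if waited_s >= SMART_NOTICE_AFTER_S:
            return "wait", [SMART_NOTICE]
        return "wait", []
    # Fast: short grace period, then warn (shutdown console and clients) and stop.
    if waited_s < FAST_GRACE_S:
        return "wait", []
    return "shutdown", [FAST_WARNING]
```

For example, `shutdown_step("fast", 3, 1.0)` returns the shutdown action together with the warning, while the same call with `"smart"` keeps waiting.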
 





-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


Re: SynchRep; wait-forever and shutdown

From
Robert Haas
Date:
On Thu, Dec 9, 2010 at 11:54 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> In previous discussion, some people wanted the "wait-forever" option which
> blocks all the transactions on the master until sync'd standby has appeared,
> in order to reduce the risk of data loss in synchronous replication.
>
> What I'm not clear is; How does smart or fast shudown advance while all the
> transactions are being blocked?
>
> 1. Shutdown should wait for all the transactions to end by appearance of
>     sync'd standby?
>     * Problem is that shutdown would take very long.
>
> 2. Shutdown should commit all the blocking transactions?
>     * Problem is that a client thinks that those transactions have successfully
>        been committed even though they have not been replicated to the
>        standby.
>
> 3. Shutdown should abort all the blocking transactions?
>     * Problem is that a client thinks that those transactions have been aborted
>        even though those WAL records have been written on the master. But
>        this is very common problem for DBMS, so we don't need to worry about
>        this in the context of replication.
>
> ISTM smart and fast shutdown fits in with #1 and #3, respectively. Thought?

I might be missing something, but I don't see why this case requires
any special handling.  As far as I can see, #2 and #3 are nonsense:
the client isn't waiting on the commit per se, but rather the
acknowledgment of the commit.  In a smart shutdown, we wait for all
clients to disconnect.  If they never disconnect, we never shut down.
It's a lame behavior and we might want to change it some day - at
least by adding a timeout - but I don't see any reason to change it
because of synchronous replication per se.  In a fast shutdown, we
boot all clients off immediately.  If they were waiting for an
acknowledgment, they don't get it.  The application has to handle this
case, just as it does today if it sends a COMMIT command and the
connection is disconnected before it receives a response.
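
The client-side case described above can be sketched as follows. This is a minimal illustration in Python that deliberately avoids any real driver: `ConnectionLost`, `commit_outcome` and the two stand-in connection classes are all hypothetical names, and a real application would map its driver's own disconnect error onto the same idea.

```python
class ConnectionLost(Exception):
    """Hypothetical error: the link dropped before a reply arrived."""


def commit_outcome(conn) -> str:
    """Send COMMIT and classify the result.

    Returns "committed" only when the server acknowledged the commit.
    If the connection drops first, the result is "unknown": the commit
    may or may not have happened, so it must not be treated as an abort.
    """
    try:
        conn.commit()
    except ConnectionLost:
        return "unknown"
    return "committed"


class HealthyConnection:
    """Stand-in connection whose COMMIT is acknowledged."""

    def commit(self) -> None:
        return None


class DroppedDuringCommit:
    """Stand-in connection that is cut off while waiting for the reply,
    as in a fast shutdown."""

    def commit(self) -> None:
        raise ConnectionLost("server closed the connection")
```

On an "unknown" result the application has to reconnect and check whether the transaction's effects are visible before deciding to retry; that is the same handling it needs today, with or without synchronous replication.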

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company