Обсуждение: statement_timeout has no effect if sync standby is unavailable

Поиск
Список
Период
Сортировка

statement_timeout has no effect if sync standby is unavailable

От
Nikhil Shetty
Дата:
Hi Team,

I am writing a monitoring tool to find if the database is hung.

One of the scenarios is to stop the standby but keep 'synchronous_standby_names=standby1' and 'synchronous_commit=remote_apply' unchanged on primary.

This way the DML queries on primary will be hung. I tried to use 'statement_timeout=5s' to timeout the query but it does not work.

Is there another way to timeout query and report an error ? and why is statement_timeout not working?

Thanks,
Nikhil

Re: statement_timeout has no effect if sync standby is unavailable

От
Nikhil Shetty
Дата:
Hi Team,

We have also seen this scenario when we froze the data mount point and run a DML

Freeze mountpoint 
date;fsfreeze --freeze /postgres
The statement is just stuck even thought statement_timeout is set

datid        | 14175
datname      | postgres
pid          | 5789
leader_pid   | 
usesysid     | 10
usename      | postgres
application_name | hang_monitor
client_addr  | 127.0.0.1
client_hostname | 
client_port  | 37440
backend_start| 2023-11-01 10:05:51.090023+00
xact_start   | 2023-11-01 10:05:51.167816+00
query_start  | 2023-11-01 10:05:51.167816+00
state_change | 2023-11-01 10:05:51.167817+00
wait_event_type | IO
wait_event   | WALWrite
state        | active
backend_xid  | 43242183
backend_xmin | 
query        | update HEARTBEAT set last_updated_time = timezone('UTC', now()) where ID = 1;
backend_type | client backend

Thanks,
Nikhil



On Fri, Oct 27, 2023 at 3:43 PM Nikhil Shetty <nikhil.dba04@gmail.com> wrote:
Hi Team,

I am writing a monitoring tool to find if the database is hung.

One of the scenarios is to stop the standby but keep 'synchronous_standby_names=standby1' and 'synchronous_commit=remote_apply' unchanged on primary.

This way the DML queries on primary will be hung. I tried to use 'statement_timeout=5s' to timeout the query but it does not work.

Is there another way to timeout query and report an error ? and why is statement_timeout not working?

Thanks,
Nikhil

Re: statement_timeout has no effect if sync standby is unavailable

От
Laurenz Albe
Дата:
On Wed, 2023-11-01 at 15:44 +0530, Nikhil Shetty wrote:
> We have also seen this scenario when we froze the data mount point and run a DML
>
> Freeze mountpoint 
> date;fsfreeze --freeze /postgres
> The statement is just stuck even thought statement_timeout is set
>
> datid        | 14175
> datname      | postgres
> pid          | 5789
> leader_pid   |
> usesysid     | 10
> usename      | postgres
> application_name | hang_monitor
> client_addr  | 127.0.0.1
> client_hostname |
> client_port  | 37440
> backend_start| 2023-11-01 10:05:51.090023+00
> xact_start   | 2023-11-01 10:05:51.167816+00
> query_start  | 2023-11-01 10:05:51.167816+00
> state_change | 2023-11-01 10:05:51.167817+00
> wait_event_type | IO
> wait_event   | WALWrite
> state        | active
> backend_xid  | 43242183
> backend_xmin |
> query        | update HEARTBEAT set last_updated_time = timezone('UTC', now()) where ID = 1;
> backend_type | client backend

Check with "ps": perhaps the backend process is in uninterruptible sleep, waiting
for I/O.

Yours,
Laurenz Albe



Re: statement_timeout has no effect if sync standby is unavailable

От
Nikhil Shetty
Дата:
Hi Laurenz

I need to check the process state but statement_timeout should timeout such queries, no?

Thanks,
Nikhil

On Wed, 1 Nov 2023 at 22:09, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Wed, 2023-11-01 at 15:44 +0530, Nikhil Shetty wrote:
> We have also seen this scenario when we froze the data mount point and run a DML
>
> Freeze mountpoint 
> date;fsfreeze --freeze /postgres
> The statement is just stuck even thought statement_timeout is set
>
> datid        | 14175
> datname      | postgres
> pid          | 5789
> leader_pid   |
> usesysid     | 10
> usename      | postgres
> application_name | hang_monitor
> client_addr  | 127.0.0.1
> client_hostname |
> client_port  | 37440
> backend_start| 2023-11-01 10:05:51.090023+00
> xact_start   | 2023-11-01 10:05:51.167816+00
> query_start  | 2023-11-01 10:05:51.167816+00
> state_change | 2023-11-01 10:05:51.167817+00
> wait_event_type | IO
> wait_event   | WALWrite
> state        | active
> backend_xid  | 43242183
> backend_xmin |
> query        | update HEARTBEAT set last_updated_time = timezone('UTC', now()) where ID = 1;
> backend_type | client backend

Check with "ps": perhaps the backend process is in uninterruptible sleep, waiting
for I/O.

Yours,
Laurenz Albe

Re: statement_timeout has no effect if sync standby is unavailable

От
Tom Lane
Дата:
Nikhil Shetty <nikhil.dba04@gmail.com> writes:
> I need to check the process state but statement_timeout should timeout such
> queries, no?

No, I don't think that should be the policy, and if it doesn't do so
now I'm content to leave it like that.  Once we have committed locally
and started to wait for a sync standby, we are between a rock and a
hard place: we can't back out the commit.  If we were to allow a
timeout error to occur, we'd have a choice of reporting that the
commit failed (a lie) or that it succeeded (also a lie, given that
the promise of sync commit is that we don't report commit until it's
persisted on the standby too).  Neither of these are preferable to
ignoring the timeout.

tl;dr: if your standby is not 100% reliable, enabling sync standby
is a poor choice.

            regards, tom lane



Re: statement_timeout has no effect if sync standby is unavailable

От
Nikhil Shetty
Дата:
Got it but in my second scenario there was no sync standby

We froze the data mount point and ran queries and all of them were hung. It is waiting for WalWrite - in this case, can it not timeout without committing?

We used statement_timeout of 2s for testing 

Thanks,
Nikhil

On Thu, 2 Nov 2023 at 21:38, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Nikhil Shetty <nikhil.dba04@gmail.com> writes:
> I need to check the process state but statement_timeout should timeout such
> queries, no?

No, I don't think that should be the policy, and if it doesn't do so
now I'm content to leave it like that.  Once we have committed locally
and started to wait for a sync standby, we are between a rock and a
hard place: we can't back out the commit.  If we were to allow a
timeout error to occur, we'd have a choice of reporting that the
commit failed (a lie) or that it succeeded (also a lie, given that
the promise of sync commit is that we don't report commit until it's
persisted on the standby too).  Neither of these are preferable to
ignoring the timeout.

tl;dr: if your standby is not 100% reliable, enabling sync standby
is a poor choice.

                        regards, tom lane