Re: [EXTERNAL] Re: Add non-blocking version of PQcancel

Поиск
Список
Период
Сортировка
От Alexander Lakhin
Тема Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Дата
Msg-id 4db099c8-4a52-3cc4-e970-14539a319466@gmail.com
обсуждение исходный текст
Ответ на Re: [EXTERNAL] Re: Add non-blocking version of PQcancel  (Noah Misch <noah@leadboat.com>)
Ответы Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Список pgsql-hackers
Hello hackers,

30.03.2024 01:17, Noah Misch wrote:
> On Fri, Mar 29, 2024 at 09:17:55AM +0100, Jelte Fennema-Nio wrote:
>> On Thu, 28 Mar 2024 at 19:03, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Could we make this test bulletproof by using an injection point?
>>> If not, I remain of the opinion that we're better off without it.
>> Possibly, and if so, I agree that would be better than the currently
>> added test. But I honestly don't feel like spending the time on
>> creating such a test.
> The SQL test is more representative of real applications, and it's way simpler
> to understand.  In general, I prefer 6-line SQL tests that catch a problem 10%
> of the time over injection point tests that catch it 100% of the time.  For
> low detection rate to be exciting, it needs to be low enough to have a serious
> chance of all buildfarm members reporting green for the bad commit.  With ~115
> buildfarm members running in the last day, 0.1% detection rate would have been
> low enough to bother improving, but 4% would be high enough to call it good.

As a recent buildfarm failure on orlingo (which tests asan-enabled builds)
[1] shows, that test can still fail:
70/70 postgresql:postgres_fdw-running / postgres_fdw-running/regress               ERROR           278.67s exit status
1

@@ -2775,6 +2775,7 @@
  SET LOCAL statement_timeout = '10ms';
  select count(*) from ft1 CROSS JOIN ft2 CROSS JOIN ft4 CROSS JOIN ft5; -- this takes very long
  ERROR:  canceling statement due to statement timeout
+WARNING:  could not get result of cancel request due to timeout
  COMMIT;

(from the next run we can see normal duration:
"postgres_fdw-running/regress               OK                6.30s ")

I reproduced the failure with an asan-enabled build on a slowed-down VM
and as far as I can see, it's caused by the following condition in
ProcessInterrupts():
                 /*
                  * If we are reading a command from the client, just ignore the cancel
                  * request --- sending an extra error message won't accomplish
                  * anything.  Otherwise, go ahead and throw the error.
                  */
                 if (!DoingCommandRead)
                 {
                         LockErrorCleanup();
                         ereport(ERROR,
(errcode(ERRCODE_QUERY_CANCELED),
                                          errmsg("canceling statement due to user request")));
                 }

I think this failure can be reproduced easily (without asan/slowing down)
with this modification:
@@ -4630,6 +4630,7 @@ PostgresMain(const char *dbname, const char *username)
              idle_session_timeout_enabled = false;
          }

+if (rand() % 10 == 0) pg_usleep(10000);
          /*
           * (5) disable async signal conditions again.
           *

Running this test in a loop (for ((i=1;i<=100;i++)); do \
echo "iteration $i"; make -s check -C contrib/postgres_fdw/ || break; \
done), I get:
...
iteration 56
# +++ regress check in contrib/postgres_fdw +++
# initializing database system by copying initdb template
# using temp instance on port 55312 with PID 991332
ok 1         - postgres_fdw                            20093 ms
1..1
# All 1 tests passed.
iteration 57
# +++ regress check in contrib/postgres_fdw +++
# initializing database system by copying initdb template
# using temp instance on port 55312 with PID 992152
not ok 1     - postgres_fdw                            62064 ms
1..1
...
--- .../contrib/postgres_fdw/expected/postgres_fdw.out 2024-06-22 02:52:42.991574907 +0000
+++ .../contrib/postgres_fdw/results/postgres_fdw.out 2024-06-22 14:43:43.949552927 +0000
@@ -2775,6 +2775,7 @@
  SET LOCAL statement_timeout = '10ms';
  select count(*) from ft1 CROSS JOIN ft2 CROSS JOIN ft4 CROSS JOIN ft5; -- this takes very long
  ERROR:  canceling statement due to statement timeout
+WARNING:  could not get result of cancel request due to timeout
  COMMIT;

I also came across another failure of the test:
@@ -2774,7 +2774,7 @@
  BEGIN;
  SET LOCAL statement_timeout = '10ms';
  select count(*) from ft1 CROSS JOIN ft2 CROSS JOIN ft4 CROSS JOIN ft5; -- this takes very long
-ERROR:  canceling statement due to statement timeout
+ERROR:  canceling statement due to user request
  COMMIT;

which is reproduced with a sleep added here:
@@ -1065,6 +1065,7 @@ exec_simple_query(const char *query_string)
       */
      parsetree_list = pg_parse_query(query_string);
+pg_usleep(11000);

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=olingo&dt=2024-06-20%2009%3A52%3A04

Best regards,
Alexander



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: FreezeLimit underflows in pg14 and 15 causing incorrect behavior in heap_prepare_freeze_tuple
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: replace strtok()