Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()

From Alexander Lakhin
Subject Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()
Msg-id 5cbe0b03-d6f3-501d-3849-534568b0e776@gmail.com
In response to Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-bugs
Hi Robert,

05.04.2024 23:20, Robert Haas wrote:
> On Fri, Oct 29, 2021 at 9:30 AM Alexander Lakhin <exclusion@gmail.com> wrote:
>> I can propose the debugging patch to reproduce the issue that replaces
>> the hang with the assert and modifies a pair of crash-causing test
>> scripts to simplify the reproducing. (Sorry, I have no time now to prune
>> down the scripts further as I have to leave for a week.)
> Just FYI, I tried to reproduce this today on v16, using this formula,
> with some hacking around to try to get it working on my MacBook, and I
> couldn't get it to crash.

I've refreshed the script and simplified it a bit so that it no longer
relies on Linux specifics. This works for me (on REL_14_0, with the patch
applied, CPPFLAGS="-O0" ./configure --enable-debug --enable-cassert ...):
echo "
autovacuum=off
fsync=off
" >> "$PGDATA/postgresql.conf"

pg_ctl -w -l server.log start

export PGDATABASE=regression
createdb regression

echo "
vacuum (verbose, skip_locked, index_cleanup off) pg_catalog.pg_class;
select pg_sleep(random()/50);
" >/tmp/17257/pseudo-autovacuum.sql

pgbench -n -f /tmp/17257/inherit.sql -C -T 1200 >pgbench-1.log 2>&1 &
pgbench -n -f /tmp/17257/vacuum.sql -C -T 1200 >pgbench-2.log 2>&1 &
pgbench -n -f /tmp/17257/pseudo-autovacuum.sql -C -c 10 -T 1200 >pgbench-3.log 2>&1 &
wait
grep -E "(TRAP|terminated)" server.log

(Please use the attached inherit.sql and vacuum.sql, which are excerpts from
src/test/regress/sql/{inherit,vacuum}.sql.)

With PGDATA placed on tmpfs, this script failed for me after 1m31s,
2m35s, and 4m12s:
TRAP: FailedAssertion("numretries < 100", File: "vacuumlazy.c", Line: 1726, PID: 951498)
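
Times to failure like the ones above could be measured without waiting out the full 1200-second pgbench run by polling server.log alongside the pgbench jobs. The following is only a minimal sketch under stated assumptions: the function name wait_for_failure and its timeout and interval parameters are hypothetical and not part of the original script; it reuses the same grep pattern as the script.

```shell
# Hypothetical helper (not part of the original script): poll a server log
# until a failure line appears, then report the elapsed time in seconds.
#   $1 = log file, $2 = timeout in seconds (default 1200),
#   $3 = polling interval in seconds (default 1)
wait_for_failure() {
    log="$1"
    timeout="${2:-1200}"
    interval="${3:-1}"
    start=$(date +%s)
    while :; do
        # Same pattern as the final grep in the script above.
        if grep -qE '(TRAP|terminated)' "$log" 2>/dev/null; then
            echo "failed after $(( $(date +%s) - start ))s"
            return 0
        fi
        # Give up once the timeout has elapsed.
        if [ $(( $(date +%s) - start )) -ge "$timeout" ]; then
            echo "no failure within ${timeout}s"
            return 1
        fi
        sleep "$interval"
    done
}
```

It might be started right after the pgbench jobs are launched, for example as `wait_for_failure server.log 1200`, in place of the plain `wait`.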

Another possible outcome:
TRAP: FailedAssertion("relid == targetRelId", File: "relcache.c", Line: 1062, PID: 1257766)

And also:
2024-04-07 05:03:21.656 UTC [2905313] LOG:  server process (PID 2984687) was terminated by signal 6: Aborted
2024-04-07 05:03:21.656 UTC [2905313] DETAIL:  Failed process was running: create table matest0 (id serial primary key, name text);
With the stack trace:
...
#4  0x00007fc30b4007f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x0000559f50220719 in index_delete_sort_cmp (deltid1=0x559f523a9f40, deltid2=0x7ffd2f9623f8) at heapam.c:7582
#6  0x0000559f50220847 in index_delete_sort (delstate=0x7ffd2f9636f0) at heapam.c:7623
...
(as in [1])
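
Because a single run can end with any of the outcomes listed above, it may help to tally the distinct failure signatures in server.log after several runs. The sketch below is an illustration only: the function name summarize_failures is hypothetical, and it assumes that assertion failures appear as lines starting with "TRAP: " and ending with ", PID: <number>)", as in the log excerpts above.

```shell
# Hypothetical helper (not part of the original script): count the distinct
# assertion failures and the signal-terminated backends in a server log.
#   $1 = log file
summarize_failures() {
    log="$1"
    # Strip the per-process PID so that identical assertions group together,
    # then print "<count><TAB><assertion>" with the most frequent one first.
    awk '/^TRAP: / { sub(/, PID: [0-9]+\)$/, ")"); count[$0]++ }
         END { for (sig in count) printf "%d\t%s\n", count[sig], sig }' "$log" \
        | sort -rn
    # Backends killed by a signal (e.g. abort() called outside an Assert).
    printf 'signal-terminated backends: %s\n' \
        "$(grep -c 'was terminated by signal' "$log")"
}
```

Invoked as `summarize_failures server.log`, it prints one line per distinct assertion followed by the number of "was terminated by signal" messages.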

But on dad1539ae I got no failures in 3 runs (the same holds for
REL_16_STABLE with a slightly modified lazy_scan_prune patch).

[1] https://www.postgresql.org/message-id/17255-14c0ac58d0f9b583%40postgresql.org

Best regards,
Alexander