WAL record CRC calculated incorrectly because of underlying buffer modification
От | Alexander Lakhin |
---|---|
Тема | WAL record CRC calculated incorrectly because of underlying buffer modification |
Дата | |
Msg-id | 055bb33c-bccc-bc1d-c2f8-8598534448ac@gmail.com обсуждение исходный текст |
Ответы |
Re: WAL record CRC calculated incorrectly because of underlying buffer modification
|
Список | pgsql-hackers |
Hello hackers, I've investigated a recent buildfarm failure: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dodo&dt=2024-05-02%2006%3A40%3A36 where the test failed due to a CRC error: 2024-05-02 17:08:18.401 ACST [3406:7] LOG: incorrect resource manager data checksum in record at 0/F14D7A60 (Chipmunk produced similar errors as well: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2022-08-25%2019%3A40%3A11 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2024-03-22%2003%3A20%3A39 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2023-08-19%2006%3A43%3A20 ) and discovered that XLogRecordAssemble() calculates CRC over a buffer, that might be modified by another process. With the attached patch applied, the following test run: echo " autovacuum_naptime = 1 autovacuum_vacuum_threshold = 1 wal_consistency_checking = all " >/tmp/temp.config for ((i=1;i<=100;i++)); do echo "iteration $i"; TEMP_CONFIG=/tmp/temp.config TESTS="test_setup hash_index" make check-tests -s || break; done fails for me on iterations 7, 10, 17: ok 1 - test_setup 2557 ms not ok 2 - hash_index 24719 ms # (test process exited with exit code 2) postmaster.log contains: 2024-05-10 12:46:44.320 UTC checkpointer[1881151] LOG: checkpoint starting: immediate force wait 2024-05-10 12:46:44.365 UTC checkpointer[1881151] LOG: checkpoint complete: wrote 41 buffers (0.3%); 0 WAL file(s) added, 0 removed, 26 recycled; write=0.001 s, sync=0.001 s, total=0.046 s; sync files=0, longest=0.000 s, average=0.000 s; distance=439134 kB, estimate=527137 kB; lsn=0/3CE131F0, redo lsn=0/3CE13198 TRAP: failed Assert("memcmp(block1_ptr, block1_copy, block1_len) == 0"), File: "xloginsert.c", Line: 949, PID: 1881271 ExceptionalCondition at assert.c:52:13 XLogRecordAssemble at xloginsert.c:953:1 XLogInsert at xloginsert.c:520:9 hashbucketcleanup at hash.c:844:14 hashbulkdelete at hash.c:558:3 index_bulk_delete at indexam.c:760:1 vac_bulkdel_one_index at vacuum.c:2498:10 lazy_vacuum_one_index at vacuumlazy.c:2443:10 lazy_vacuum_all_indexes at vacuumlazy.c:2026:26 lazy_vacuum at vacuumlazy.c:1944:10 lazy_scan_heap at vacuumlazy.c:1050:3 heap_vacuum_rel at vacuumlazy.c:503:2 vacuum_rel at vacuum.c:2214:2 vacuum at vacuum.c:622:8 autovacuum_do_vac_analyze at autovacuum.c:3102:2 do_autovacuum at autovacuum.c:2425:23 AutoVacWorkerMain at autovacuum.c:1569:3 pgarch_die at pgarch.c:846:1 StartChildProcess at postmaster.c:3929:5 StartAutovacuumWorker at postmaster.c:3997:12 process_pm_pmsignal at postmaster.c:3809:3 ServerLoop at postmaster.c:1667:5 BackgroundWorkerInitializeConnection at postmaster.c:4156:1 main at main.c:184:3 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7fc71a8d7e40] postgres: autovacuum worker regression(_start+0x25)[0x556a8631a5e5] 2024-05-10 12:46:45.038 UTC checkpointer[1881151] LOG: checkpoint starting: immediate force wait 2024-05-10 12:46:45.965 UTC autovacuum worker[1881275] LOG: automatic analyze of table "regression.pg_catalog.pg_attribute" avg read rate: 0.000 MB/s, avg write rate: 5.409 MB/s buffer usage: 1094 hits, 0 misses, 27 dirtied system usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.03 s 2024-05-10 12:46:46.892 UTC postmaster[1881150] LOG: server process (PID 1881271) was terminated by signal 6: Aborted 2024-05-10 12:46:46.892 UTC postmaster[1881150] DETAIL: Failed process was running: autovacuum: VACUUM ANALYZE public.hash_cleanup_heap (This can be reproduced with 027_stream_regress, of course, but it would take more time.) Best regards, Alexander
Вложения
В списке pgsql-hackers по дате отправления: