BUG #17512: Process running query fails with SIGSEV - nodeMemoize.c:349

Поиск
Список
Период
Сортировка
От PG Bug reporting form
Тема BUG #17512: Process running query fails with SIGSEV - nodeMemoize.c:349
Дата
Msg-id 17512-36fa5fab90f48af9@postgresql.org
обсуждение исходный текст
Ответы Re: BUG #17512: Process running query fails with SIGSEV - nodeMemoize.c:349  (David Rowley <dgrowleyml@gmail.com>)
Список pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17512
Logged by:          Aleš Zelený
Email address:      zeleny.ales@gmail.com
PostgreSQL version: 14.2
Operating system:   CentOS 7
Description:

Version information:
OS: CentOS Linux release 7.9.2009 (Core)
PostgresSQL (RPM installation): PostgreSQL 14.2 on x86_64-pc-linux-gnu,
compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit

PostgreSQL Log:
2022-06-06 21:23:21.147 UTC [1647] 629e7049.66f, 1, 0, 0, , [local],
[unknown], [unknown], [unknown], 00000,  LOG:  connection received:
host=[local]
2022-06-06 21:23:21.148 UTC [1647] 629e7049.66f, 2, 0, 0, 5/7, [local],
sdp_prod, zelenya, [unknown], 00000, authentication LOG:  connection
authenticated: identity="zelenya" method=peer (/pgsql/sdp/cluster/14
/pg_hba.conf:85)
2022-06-06 21:23:21.148 UTC [1647] 629e7049.66f, 3, 0, 0, 5/7, [local],
sdp_prod, zelenya, [unknown], 00000, authentication LOG:  connection
authorized: user=zelenya database=sdp_prod application_name=psql

2022-06-06 21:24:00.695 UTC [1124] 629e7004.464, 22, 0, 0, , , , , , 00000,
LOG:  server process (PID 1647) was terminated by signal 11: Segmentation
fault
2022-06-06 21:24:00.695 UTC [1124] 629e7004.464, 23, 0, 0, , , , , , 00000,
DETAIL:  Failed process was running: explain (analyze, buffers)

2022-06-06 21:24:00.695 UTC [1124] 629e7004.464, 24, 0, 0, , , , , , 00000,
LOG:  terminating any other active server processes
2022-06-06 21:24:00.806 UTC [1124] 629e7004.464, 25, 0, 0, , , , , , 00000,
LOG:  all server processes terminated; reinitializing
2022-06-06 21:24:01.912 UTC [1655] 629e7071.677, 1, 0, 0, , , , , , 00000,
LOG:  database system was interrupted; last known up at 2022-06-06 21:22:12
UTC
2022-06-06 21:24:02.003 UTC [1655] 629e7071.677, 2, 0, 0, , , , , , 42501,
LOG:  could not open directory "pg_tblspc/140428/lost+found": Permission
denied
2022-06-06 21:24:02.021 UTC [1655] 629e7071.677, 3, 0, 0, , , , , , 00000,
LOG:  database system was not properly shut down; automatic recovery in
progress
2022-06-06 21:24:02.028 UTC [1655] 629e7071.677, 4, 0, 0, , , , , , 00000,
LOG:  redo starts at 121C/190000A0
2022-06-06 21:24:02.028 UTC [1655] 629e7071.677, 5, 0, 0, , , , , , 00000,
LOG:  invalid record length at 121C/190000D8: wanted 24, got 0
2022-06-06 21:24:02.028 UTC [1655] 629e7071.677, 6, 0, 0, , , , , , 00000,
LOG:  redo done at 121C/190000A0 system usage: CPU: user: 0.00 s, system:
0.00 s, elapsed: 0.00 s
2022-06-06 21:24:02.039 UTC [1655] 629e7071.677, 7, 0, 0, , , , , , 00000,
LOG:  checkpoint starting: end-of-recovery immediate
2022-06-06 21:24:02.208 UTC [1655] 629e7071.677, 8, 0, 0, , , , , , 00000,
LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 1
removed, 0 recycled; write=0.153 s, sync=0.001 s, total=0.1
72 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB,
estimate=0 kB
2022-06-06 21:24:02.227 UTC [1124] 629e7004.464, 26, 0, 0, , , , , , 00000,
LOG:  database system is ready to accept connections

The process crash with SIGSEV is reproducible on production, core dump was
enabled to get more details:


-bash-4.2$ gdb /usr/pgsql-14/bin/postgres
core.postmaster.19645.ip-172-31-1-64.us-east-2.compute.internal.1654522585
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/pgsql-14/bin/postgres...Reading symbols from
/usr/lib/debug/usr/pgsql-14/bin/postgres.debug...done.
done.

warning: core file may not match specified executable file.
[New LWP 19645]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: sdp01: zelenya sdp_prod [local] EXPLAIN
'.
Program terminated with signal 11, Segmentation fault.
#0  remove_cache_entry (entry=<optimized out>, mstate=<optimized out>) at
nodeMemoize.c:349
349    nodeMemoize.c: No such file or directory.
Missing separate debuginfos, use: debuginfo-install
audit-libs-2.8.5-4.el7.x86_64 bzip2-libs-1.0.6-13.el7.x86_64
cyrus-sasl-lib-2.1.26-24.el7_9.x86_64 elfutils-libelf-0.176-5.el7.x86_64
elfutils-libs-0.176-5.el7.x86_64 glibc-2.17-325.el7_9.x86_64
keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-51.el7_9.x86_64
libattr-2.4.46-13.el7.x86_64 libcap-2.22-11.el7.x86_64
libcap-ng-0.7.5-4.el7.x86_64 libcom_err-1.42.9-19.el7.x86_64
libedit-3.0-12.20121213cvs.el7.x86_64 libffi-3.0.13-19.el7.x86_64
libgcc-4.8.5-44.el7.x86_64 libgcrypt-1.5.3-14.el7.x86_64
libgpg-error-1.12-3.el7.x86_64 libicu-50.2-4.el7_7.x86_64
libselinux-2.5-15.el7.x86_64 libstdc++-4.8.5-44.el7.x86_64
libxml2-2.9.1-6.el7_9.6.x86_64 llvm5.0-libs-5.0.1-7.el7.x86_64
lz4-1.8.3-1.el7.x86_64 ncurses-libs-5.9-14.20130511.el7_4.x86_64
nspr-4.32.0-1.el7_9.x86_64 nss-3.67.0-4.el7_9.x86_64
nss-softokn-freebl-3.67.0-3.el7_9.x86_64 nss-util-3.67.0-1.el7_9.x86_64
openldap-2.4.44-25.el7_9.x86_64 openssl-libs-1.0.2k-24.el7_9.x86_64
pam-1.1.8-23.el7.x86_64 pcre-8.32-17.el7.x86_64
pg_qualstats_14-2.0.4-1.rhel7.x86_64 pg_stat_kcache_14-2.2.0-2.rhel7.x86_64
pg_wait_sampling_14-1.1.3-1.rhel7.x86_64 powa_14-4.1.4-1.rhel7.x86_64
systemd-libs-219-78.el7_9.5.x86_64 xz-libs-5.2.2-1.el7.x86_64
zlib-1.2.7-19.el7_9.x86_64

(gdb) bt
#0  remove_cache_entry (entry=<optimized out>, mstate=<optimized out>) at
nodeMemoize.c:349
#1  cache_reduce_memory (mstate=mstate@entry=0x2d0a2a0,
specialkey=specialkey@entry=0x122614e8) at nodeMemoize.c:468
#2  0x0000000000669cce in cache_store_tuple (mstate=mstate@entry=0x2d0a2a0,
slot=slot@entry=0x2d0a7e0) at nodeMemoize.c:635
#3  0x000000000066a0ab in ExecMemoize (pstate=0x2d0a2a0) at
nodeMemoize.c:850
#4  0x0000000000649b28 in ExecProcNodeInstr (node=0x2d0a2a0) at
execProcnode.c:479
#5  0x00000000006710f5 in ExecProcNode (node=0x2d0a2a0) at
../../../src/include/executor/executor.h:257
#6  ExecNestLoop (pstate=0x2c70a38) at nodeNestloop.c:160
#7  0x0000000000649b28 in ExecProcNodeInstr (node=0x2c70a38) at
execProcnode.c:479
#8  0x00000000006735a6 in ExecProcNode (node=0x2c70a38) at
../../../src/include/executor/executor.h:257
#9  ExecSort (pstate=0x2c70828) at nodeSort.c:108
#10 0x0000000000649b28 in ExecProcNodeInstr (node=0x2c70828) at
execProcnode.c:479
#11 0x0000000000653b19 in ExecProcNode (node=0x2c70828) at
../../../src/include/executor/executor.h:257
#12 fetch_input_tuple (aggstate=aggstate@entry=0x2c701c0) at nodeAgg.c:581
#13 0x00000000006568f2 in agg_retrieve_direct (aggstate=0x2c701c0) at
nodeAgg.c:2364
#14 ExecAgg (pstate=0x2c701c0) at nodeAgg.c:2179
#15 0x0000000000649b28 in ExecProcNodeInstr (node=0x2c701c0) at
execProcnode.c:479
#16 0x0000000000670fe9 in ExecProcNode (node=0x2c701c0) at
../../../src/include/executor/executor.h:257
#17 ExecNestLoop (pstate=0x2c70020) at nodeNestloop.c:109
#18 0x0000000000649b28 in ExecProcNodeInstr (node=0x2c70020) at
execProcnode.c:479
#19 0x00000000006735a6 in ExecProcNode (node=0x2c70020) at
../../../src/include/executor/executor.h:257
#20 ExecSort (pstate=0x2c6fe10) at nodeSort.c:108
#21 0x0000000000649b28 in ExecProcNodeInstr (node=0x2c6fe10) at
execProcnode.c:479
#22 0x0000000000653b19 in ExecProcNode (node=0x2c6fe10) at
../../../src/include/executor/executor.h:257
#23 fetch_input_tuple (aggstate=aggstate@entry=0x2c6f838) at nodeAgg.c:581
#24 0x00000000006568f2 in agg_retrieve_direct (aggstate=0x2c6f838) at
nodeAgg.c:2364
#25 ExecAgg (pstate=0x2c6f838) at nodeAgg.c:2179
#26 0x0000000000649b28 in ExecProcNodeInstr (node=0x2c6f838) at
execProcnode.c:479
#27 0x00000000006683b3 in ExecProcNode (node=0x2c6f838) at
../../../src/include/executor/executor.h:257
#28 ExecLimit (pstate=0x2c6f548) at nodeLimit.c:96
#29 0x0000000000649b28 in ExecProcNodeInstr (node=0x2c6f548) at
execProcnode.c:479
#30 0x0000000000643152 in ExecProcNode (node=0x2c6f548) at
../../../src/include/executor/executor.h:257
#31 ExecutePlan (execute_once=<optimized out>, dest=0xa40d00 <donothingDR>,
direction=<optimized out>, numberTuples=0, sendTuples=true,
operation=CMD_SELECT, use_parallel_mode=<optimized out>,
planstate=0x2c6f548, estate=0x2c6f018) at execMain.c:1551
#32 standard_ExecutorRun (queryDesc=0x2ea0188, direction=<optimized out>,
count=0, execute_once=<optimized out>) at execMain.c:361
#33 0x00007f06ba3e461d in pgss_ExecutorRun (queryDesc=0x2ea0188,
direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at
pg_stat_statements.c:1003
#34 0x00007f06b9fd44c1 in pgsk_ExecutorRun () from
/usr/pgsql-14/lib/pg_stat_kcache.so
#35 0x00007f06b9dcbfd1 in pgqs_ExecutorRun () from
/usr/pgsql-14/lib/pg_qualstats.so
#36 0x00000000005d8842 in ExplainOnePlan
(plannedstmt=plannedstmt@entry=0x2e9cbe8, into=into@entry=0x0,
es=es@entry=0x342d228,
    queryString=queryString@entry=0x2c72f58 "explain (analyze, buffers)\n",
' ' <repeats 24 times>, "WITH daily_tech AS (\n", ' ' <repeats 28 times>,
"SELECT\n", ' ' <repeats 33 times>, "\n", ' ' <repeats 32 times>,
"MAX(dmd_current0.split_adj_"...,
    params=params@entry=0x0, queryEnv=<optimized out>,
planduration=planduration@entry=0x7ffc3d75ca60, bufusage=0x7ffc3d75cae0) at
explain.c:593
#37 0x00000000005d8bdd in ExplainOneQuery (query=<optimized out>,
cursorOptions=<optimized out>, into=0x0, es=0x342d228,
    queryString=0x2c72f58 "explain (analyze, buffers)\n", ' ' <repeats 24
times>, "WITH daily_tech AS (\n", ' ' <repeats 28 times>, "SELECT\n", ' '
<repeats 33 times>, "\n", ' ' <repeats 32 times>,
"MAX(dmd_current0.split_adj_"..., params=0x0, queryEnv=0x0)
    at explain.c:410
#38 0x00000000005d9285 in ExplainQuery (pstate=pstate@entry=0x2c77230,
stmt=stmt@entry=0x319b468, params=params@entry=0x0,
dest=dest@entry=0x2c771a0) at explain.c:281
#39 0x00000000007bc575 in standard_ProcessUtility
(pstmt=pstmt@entry=0x2e296b0,
    queryString=queryString@entry=0x2c72f58 "explain (analyze, buffers)\n",
' ' <repeats 24 times>, "WITH daily_tech AS (\n", ' ' <repeats 28 times>,
"SELECT\n", ' ' <repeats 33 times>, "\n", ' ' <repeats 32 times>,
"MAX(dmd_current0.split_adj_"...,
    readOnlyTree=<optimized out>,
context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0,
queryEnv=queryEnv@entry=0x0, dest=dest@entry=0x2c771a0,
qc=qc@entry=0x7ffc3d75cfb0) at utility.c:862
#40 0x00007f06ba3e60a0 in pgss_ProcessUtility (pstmt=0x2e296b0,
    queryString=0x2c72f58 "explain (analyze, buffers)\n", ' ' <repeats 24
times>, "WITH daily_tech AS (\n", ' ' <repeats 28 times>, "SELECT\n", ' '
<repeats 33 times>, "\n", ' ' <repeats 32 times>,
"MAX(dmd_current0.split_adj_"..., readOnlyTree=<optimized out>,
    context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x2c771a0, qc=0x7ffc3d75cfb0) at pg_stat_statements.c:1135
#41 0x00000000007ba7aa in PortalRunUtility (portal=portal@entry=0x2bef9d8,
pstmt=0x2e296b0, isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=true, dest=dest@entry=0x2c771a0,
qc=qc@entry=0x7ffc3d75cfb0) at pquery.c:1155
#42 0x00000000007bab11 in FillPortalStore (portal=portal@entry=0x2bef9d8,
isTopLevel=isTopLevel@entry=true) at pquery.c:1028
#43 0x00000000007bae8b in PortalRun (portal=<optimized out>,
count=9223372036854775807, isTopLevel=<optimized out>, run_once=<optimized
out>, dest=0x2e29790, altdest=0x2e29790, qc=0x7ffc3d75d190) at
pquery.c:760
#44 0x00000000007b6be7 in exec_simple_query (
    query_string=0x2c72f58 "explain (analyze, buffers)\n", ' ' <repeats 24
times>, "WITH daily_tech AS (\n", ' ' <repeats 28 times>, "SELECT\n", ' '
<repeats 33 times>, "\n", ' ' <repeats 32 times>,
"MAX(dmd_current0.split_adj_"...) at postgres.c:1214
#45 0x00000000007b8063 in PostgresMain (argc=argc@entry=1,
argv=argv@entry=0x7ffc3d75d560, dbname=0x2bb78e0 "sdp_prod",
username=<optimized out>) at postgres.c:4486
#46 0x000000000048d7ba in BackendRun (port=<optimized out>, port=<optimized
out>) at postmaster.c:4530
#47 BackendStartup (port=0x2ba6060) at postmaster.c:4252
#48 ServerLoop () at postmaster.c:1745
#49 0x0000000000735692 in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x2b75280) at postmaster.c:1417
#50 0x000000000048e8d2 in main (argc=3, argv=0x2b75280) at main.c:209

A small isolated test case is not available so far, the issue occurs on
select involving 450GiB table. Currently, production DB is being restored to
a separated machine to allow future testing/investigation without affecting
other database users.

Thanks for any advice on working around this issue or how to collect better
diagnostic data / additional helpful information.

Kind regards Ales Zeleny


В списке pgsql-bugs по дате отправления:

Предыдущее
От: Kirk Parker
Дата:
Сообщение: Re: BUG #17511: Inconsistent permissions on some information_schema tables
Следующее
От: Tom Lane
Дата:
Сообщение: Re: BUG #17511: Inconsistent permissions on some information_schema tables