Discussion: BUG #17725: Sefault when seg_in() called with a large argument
The following bug has been logged on the website:

Bug reference:      17725
Logged by:          Robins Tharakan
Email address:      tharakan@gmail.com
PostgreSQL version: 15.1
Operating system:   Ubuntu 20.04
Description:

Hi,

The following SQL Segfaults on master (tested on b3bb7d12af).

SQL: SELECT seg_in(numeric_out(round(31, 10000)))

Backtrace on ea5ae4cae6@REL_14_STABLE:
=====================================
#0  __strcpy_avx2 () at ../sysdeps/x86_64/multiarch/strcpy-avx2.S:578
#1  0x00007f31c421f4aa in restore (
    result=0x55009893ace0 <error: Cannot access memory at address 0x55009893ace0>,
    val=31, n=-46) at seg.c:1009
#2  0x00007f31c421dab9 in seg_out (fcinfo=0x7ffe3ddff6c0) at seg.c:135
#3  0x000055d296a40aa9 in FunctionCall1Coll (flinfo=0x55d298735478,
    collation=0, arg1=94362989160448) at fmgr.c:1138
#4  0x000055d296a42004 in OutputFunctionCall (flinfo=0x55d298735478,
    val=94362989160448) at fmgr.c:1575
#5  0x000055d29634a8b4 in printtup (slot=0x55d2987344b8,
    self=0x55d298936cc0) at printtup.c:357
#6  0x000055d2966196c6 in ExecutePlan (estate=0x55d298733f80,
    planstate=0x55d2987341b8, use_parallel_mode=false, operation=CMD_SELECT,
    sendTuples=true, numberTuples=0, direction=ForwardScanDirection,
    dest=0x55d298936cc0, execute_once=true) at execMain.c:1582
#7  0x000055d2966172fd in standard_ExecutorRun (queryDesc=0x55d2987289d0,
    direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:361
#8  0x00007f31dbea134d in pgss_ExecutorRun (queryDesc=0x55d2987289d0,
    direction=ForwardScanDirection, count=0, execute_once=true)
    at pg_stat_statements.c:1003
#9  0x000055d2966170f3 in ExecutorRun (queryDesc=0x55d2987289d0,
    direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:303

Backtrace Full excerpt:
======================
#0  __strcpy_avx2 () at ../sysdeps/x86_64/multiarch/strcpy-avx2.S:578
No locals.
#1  0x00007f31c421f4aa in restore (
    result=0x55009893ace0 <error: Cannot access memory at address 0x55009893ace0>,
    val=31, n=-46) at seg.c:1009
        buf = "00000000003e1\000\060\060\060\060\060\060\060\060\060\060"
        p = 0x55d29893ace8 "e+01"
        exp = 48
        i = 17
        dp = 11
        sign = 0
#2  0x00007f31c421dab9 in seg_out (fcinfo=0x7ffe3ddff6c0) at seg.c:135
        seg = 0x55d29872e800
        result = 0x55d29893ace0 "3.100000e+01"
        p = 0x55d29893ace0 "3.100000e+01"
#3  0x000055d296a40aa9 in FunctionCall1Coll (flinfo=0x55d298735478,
    collation=0, arg1=94362989160448) at fmgr.c:1138
        fcinfodata = {fcinfo = {flinfo = 0x55d298735478, context = 0x0, resultinfo = 0x0, fncollation = 0, isnull = false, nargs = 1, args = 0x7ffe3ddff6e0}, fcinfo_data = "xTs\230\322U", '\000' <repeats 23 times>, "U\001\000\000\350r\230\322U\000\000\000m\223\230\322U\000"}
        fcinfo = 0x7ffe3ddff6c0
        result = 94362958816336
        __func__ = "FunctionCall1Coll"
#4  0x000055d296a42004 in OutputFunctionCall (flinfo=0x55d298735478,
    val=94362989160448) at fmgr.c:1575
No locals.
#5  0x000055d29634a8b4 in printtup (slot=0x55d2987344b8,
    self=0x55d298936cc0) at printtup.c:357
        outputstr = 0x55d296882235 <check_stack_depth+13> "\204\300td\276"
        thisState = 0x55d298735468
        attr = 94362989160448
        typeinfo = 0x55d2987343a0
        myState = 0x55d298936cc0
        oldcontext = 0x55d298733e60
        buf = 0x55d298936d10
        natts = 1
        i = 0

Error Log:
=========
2022-12-20 02:44:43.728 UTC [633388] LOG: server process (PID 783919) was terminated by signal 11: Segmentation fault
2022-12-20 02:44:43.728 UTC [633388] DETAIL: Failed process was running: SELECT seg_in(numeric_out(round(31,1000000)));
2022-12-20 02:44:43.728 UTC [633388] LOG: terminating any other active server processes

Thanks to SQLSmith / SQLReduce for helping with the find.

- Robins Tharakan
Amazon Web Services
On Tue, Dec 20, 2022 at 4:28 PM PG Bug reporting form <noreply@postgresql.org> wrote:
> PostgreSQL version: 15.1
> The following SQL Segfaults on master (tested on b3bb7d12af).
> Backtrace on ea5ae4cae6@REL_14_STABLE:
> SQL: SELECT seg_in(numeric_out(round(31, 10000)))
> 2022-12-20 02:44:43.728 UTC [633388] DETAIL: Failed process was running:
> SELECT seg_in(numeric_out(round(31,1000000)));
Neither query shows the reported problem in my environment on master (as of today) or v14, so not sure
=# SELECT seg_in(numeric_out(round(31, 10000)));
 seg_in
--------
 3e1
(1 row)

=# SELECT seg_in(numeric_out(round(31,1000000)));
 seg_in
--------
 3e1
(1 row)
It's possibly relevant that this result is different from the "3.100000e+01" which was shown in your backtrace. Since a few details of this report don't agree with each other, I'm starting to wonder if some other relevant details got lost along the way.
--
John Naylor
EDB: http://www.enterprisedb.com
Hi John,

On Tue, 20 Dec 2022 at 20:44, John Naylor <john.naylor@enterprisedb.com> wrote:
> Neither query shows the reported problem in my environment on master (as of today) or v14, so not sure
> It's possibly relevant that this result is different from the "3.100000e+01" which was shown in your backtrace. Since a few details of this report don't agree with each other, I'm starting to wonder if some other relevant details got lost along the way.

Thanks for taking a look and you're possibly correct.

After trying a few combinations, I see that passing CFLAGS="-Wuninitialized" (default for my test setup) causes this failure. Removing the flag gives the error you mention, and possibly why this may not be easy to reproduce on a production system (unsure).

$ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

# How I trigger compilation
cd ${sourcepth} && git clean -xdf && ./configure CFLAGS="-Wuninitialized" --prefix=${installpth} && make -j`nproc` install

...

This is a recent crash on 69f75bf825@REL_12_STABLE

2022-12-20 10:24:53.361 UTC [3087004] LOG: server process (PID 3182365) was terminated by signal 11: Segmentation fault
2022-12-20 10:24:53.361 UTC [3087004] DETAIL: Failed process was running: SELECT seg_in(numeric_out(round(31, 10000)));
2022-12-20 10:24:53.361 UTC [3087004] LOG: terminating any other active server processes
2022-12-20 10:24:53.366 UTC [3087004] LOG: all server processes terminated; reinitializing

I created this bug-report since I am able to reproduce this at will. But let me know if this is uninteresting, or if I can provide any other detail to help in triaging.

- robins
Robins Tharakan <tharakan@gmail.com> writes:
> On Tue, 20 Dec 2022 at 20:44, John Naylor <john.naylor@enterprisedb.com> wrote:
>> Neither query shows the reported problem in my environment on master (as of today) or v14, so not sure

> After trying a few combinations, I see that passing
> CFLAGS="-Wuninitialized" (default for my test setup) causes this failure.
> Removing the flag gives the error you mention, and possibly why this
> may not be easy to reproduce on a production system (unsure).

I don't see a crash either, but I can't help observing that this input leads to a "seg" struct with "-46" significant digits:

(gdb) p *seg
$3 = {lower = 31, upper = 31, l_sigd = -46 '\322', u_sigd = -46 '\322', l_ext = 0 '\000', u_ext = 0 '\000'}

So we're invoking sprintf with a fairly insane precision spec:

939         sprintf(result, "%.*e", n - 1, val);
(gdb) p n
$4 = -46
(gdb) p val
$5 = 31

POSIX says "a negative precision is taken as if the precision were omitted", and our code seems to do that, but I wonder if this is managing to overrun the output buffer on your platform.

IMO:

1. The seg grammar needs to constrain the result of significant_digits() to something that will fit in the allocated "char" field width. It looks like some code paths there have clamps, but not all.

2. Because we might already have stored "seg" values with bogus sigd values, restore() had better clamp the "n" value it's given to something sane. I see it clamps large positive values, but it's not worrying about zero-or-negative.

regards, tom lane
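
[Editorial sketch, not part of the original message.] The second suggestion above, clamping the digit count before it reaches the format call, might look roughly like the following. Every name here (sketch_clamp_sigd, sketch_format_raw, sketch_format_clamped) and both bounds are hypothetical and chosen only for illustration; this is not the code in seg.c and not necessarily the fix that was applied. The unchecked variant is included only to show the "negative precision is taken as if omitted" behaviour quoted from POSIX.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bounds for illustration; seg.c's real limits may differ. */
#define SKETCH_MIN_SIGD 1
#define SKETCH_MAX_SIGD 6

/* Force a stored digit count into [SKETCH_MIN_SIGD, SKETCH_MAX_SIGD]. */
static int
sketch_clamp_sigd(int n)
{
    if (n < SKETCH_MIN_SIGD)
        return SKETCH_MIN_SIGD;
    if (n > SKETCH_MAX_SIGD)
        return SKETCH_MAX_SIGD;
    return n;
}

/* Format val with n significant digits, passing n through unchecked. */
static void
sketch_format_raw(char *out, size_t outsize, double val, int n)
{
    assert(outsize > 0);
    /* A negative precision argument is treated as if it were omitted. */
    snprintf(out, outsize, "%.*e", n - 1, val);
}

/* Same call, but clamp n first so the precision is never negative. */
static void
sketch_format_clamped(char *out, size_t outsize, double val, int n)
{
    assert(outsize > 0);
    snprintf(out, outsize, "%.*e", sketch_clamp_sigd(n) - 1, val);
}
```

Called with val = 31 and n = -46, the unchecked variant produces "3.100000e+01", the same string that appears as "result" in the reported backtrace, while the clamped variant produces "3e+01". As a side note on the gdb output, '\322' is octal for 210, which reads back as -46 when the byte is interpreted as a signed char.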
I wrote:
> I don't see a crash either, but I can't help observing that this
> input leads to a "seg" struct with "-46" significant digits:
> ...
> So we're invoking sprintf with a fairly insane precision spec:

Actually, it looks like sprintf is not the problem. This is:

(gdb)
984             buf[10 + n] = '\0';
(gdb) p n
$9 = -46

So first off, we're stomping on something we shouldn't, and secondly we're failing to nul-terminate buf[], which easily explains your observed crash at the strcpy a little further down. On most platforms strcpy would find a nul byte not too much further on, which might prevent the worst sorts of damage, but this is still very ugly.

regards, tom lane
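
[Editorial sketch, not part of the original message.] With n = -46, the index 10 + n is -36, so the store lands 36 bytes before the start of buf and the intended terminator is never written inside it. A defensive version of that single step could check the index first, as below. The helper name sketch_terminate_at and the buffer size SKETCH_BUF_SIZE are assumptions made for this example and are not taken from seg.c; clamping n earlier, as suggested in the previous message, would make such a check unnecessary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scratch size for illustration; not taken from seg.c. */
#define SKETCH_BUF_SIZE 25

/*
 * Write a terminator at buf[10 + n] only if that index lies inside the
 * buffer.  Returns 1 on success, 0 if the index would be out of range.
 */
static int
sketch_terminate_at(char *buf, size_t bufsize, int n)
{
    long idx = 10L + (long) n;

    assert(buf != NULL);
    if (idx < 0 || (size_t) idx >= bufsize)
        return 0;               /* refuse to write outside buf */
    buf[idx] = '\0';
    return 1;
}
```

For a buffer of SKETCH_BUF_SIZE bytes, sketch_terminate_at(buf, sizeof buf, -46) returns 0 and leaves the buffer untouched, whereas n = 3 writes the terminator at buf[13].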