Thread: array_agg(DISTINCT) caused a segmentation fault

array_agg(DISTINCT) caused a segmentation fault

From: Fujii Masao
Date:
Hi,

In the current master branch, with enable_presorted_aggregate = on,
I got a segmentation fault when executing the following query.
OTOH, the query didn't cause a segmentation fault
when enable_presorted_aggregate was disabled.

=# SELECT array_agg(distinct val) FROM (SELECT NULL AS val FROM generate_series(1, 2)) hoge;

LOG:  server process (PID 76507) was terminated by signal 11: Segmentation fault: 11
DETAIL:  Failed process was running: SELECT array_agg(distinct val) FROM (SELECT NULL AS val FROM generate_series(1, 2)) hoge;


The backtrace extracted from the core file is:

* thread #1
   * frame #0: 0x000000010815807d postgres`toast_raw_datum_size(value=0) at detoast.c:550:6
     frame #1: 0x000000010891d31b postgres`texteq(fcinfo=0x00007ff7b7dc06b8) at varlena.c:1804:10
     frame #2: 0x0000000108975607 postgres`FunctionCall2Coll(flinfo=0x00007fd46900e0d8, collation=100, arg1=0, arg2=0) at fmgr.c:1148:11
     frame #3: 0x000000010846bee0 postgres`ExecEvalPreOrderedDistinctSingle(aggstate=0x00007fd46900c548, pertrans=0x00007fd46900dff0) at execExprInterp.c:4253:17
     frame #4: 0x00000001084668a6 postgres`ExecInterpExpr(state=0x00007fd46908cab8, econtext=0x00007fd46900c970, isnull=0x00007ff7b7dc09d7) at execExprInterp.c:1772:8
     frame #5: 0x000000010849804b postgres`ExecEvalExprSwitchContext(state=0x00007fd46908cab8, econtext=0x00007fd46900c970, isNull=0x00007ff7b7dc09d7) at executor.h:344:13
     frame #6: 0x00000001084974ff postgres`advance_aggregates(aggstate=0x00007fd46900c548) at nodeAgg.c:823:2
     frame #7: 0x0000000108496ff1 postgres`agg_retrieve_direct(aggstate=0x00007fd46900c548) at nodeAgg.c:2446:6
     frame #8: 0x000000010849428b postgres`ExecAgg(pstate=0x00007fd46900c548) at nodeAgg.c:2171:14
     frame #9: 0x0000000108480502 postgres`ExecProcNodeFirst(node=0x00007fd46900c548) at execProcnode.c:464:9
     frame #10: 0x0000000108477f42 postgres`ExecProcNode(node=0x00007fd46900c548) at executor.h:262:9
     frame #11: 0x0000000108473351 postgres`ExecutePlan(estate=0x00007fd46900c318, planstate=0x00007fd46900c548, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x00007fd46908a4e0, execute_once=true) at execMain.c:1633:10
     frame #12: 0x000000010847320b postgres`standard_ExecutorRun(queryDesc=0x00007fd469009318, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:364:3
     frame #13: 0x0000000108472fc2 postgres`ExecutorRun(queryDesc=0x00007fd469009318, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:308:3
     frame #14: 0x0000000108752794 postgres`PortalRunSelect(portal=0x00007fd469031718, forward=true, count=0, dest=0x00007fd46908a4e0) at pquery.c:924:4
     frame #15: 0x0000000108752179 postgres`PortalRun(portal=0x00007fd469031718, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x00007fd46908a4e0, altdest=0x00007fd46908a4e0, qc=0x00007ff7b7dc0df0) at pquery.c:768:18
     frame #16: 0x000000010874d5a2 postgres`exec_simple_query(query_string="SELECT array_agg(distinct val) FROM (SELECT NULL AS val FROM generate_series(1, 2)) hoge;") at postgres.c:1237:10
     frame #17: 0x000000010874c6de postgres`PostgresMain(dbname="postgres", username="postgres") at postgres.c:4565:7
     frame #18: 0x000000010865c7c2 postgres`BackendRun(port=0x00007fd468404080) at postmaster.c:4461:2
     frame #19: 0x000000010865a09c postgres`BackendStartup(port=0x00007fd468404080) at postmaster.c:4189:3
     frame #20: 0x0000000108657a7e postgres`ServerLoop at postmaster.c:1779:6
     frame #21: 0x00000001086566d0 postgres`PostmasterMain(argc=3, argv=0x0000600001635260) at postmaster.c:1463:11
     frame #22: 0x0000000108506b27 postgres`main(argc=3, argv=0x0000600001635260) at main.c:200:3
     frame #23: 0x000000011202552e dyld`start + 462

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: array_agg(DISTINCT) caused a segmentation fault

From: David Rowley
Date:
On Mon, 13 Feb 2023 at 18:29, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> =# SELECT array_agg(distinct val) FROM (SELECT NULL AS val FROM generate_series(1, 2)) hoge;
>
> LOG:  server process (PID 76507) was terminated by signal 11: Segmentation fault: 11

Thanks for the report. Looks like mine as there's no crash with: set
enable_presorted_aggregate=0;

David



Re: array_agg(DISTINCT) caused a segmentation fault

From: David Rowley
Date:
On Mon, 13 Feb 2023 at 18:29, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> =# SELECT array_agg(distinct val) FROM (SELECT NULL AS val FROM generate_series(1, 2)) hoge;
> LOG:  server process (PID 76507) was terminated by signal 11: Segmentation fault: 11
> DETAIL:  Failed process was running: SELECT array_agg(distinct val) FROM (SELECT NULL AS val FROM generate_series(1, 2)) hoge;

This was a fairly trivial logic bug in
ExecEvalPreOrderedDistinctSingle(). The JIT code calls that same
function and ExecEvalPreOrderedDistinctMulti() uses the standard
expression evaluation logic.  So looks like the problem is just
isolated to ExecEvalPreOrderedDistinctSingle().

I've now pushed a fix for it and included your test.  To get it to
crash it needed to be a byref aggregate without a strict transition
function.  There are not too many of those, which is probably why
nobody noticed this before.

David
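
As an illustration of the kind of check David describes (this is not the committed PostgreSQL code), here is a minimal standalone C sketch of DISTINCT filtering over presorted input whose values are passed by reference and may be NULL. The names `DistinctState` and `value_is_new`, and the use of `strcmp` as the equality function, are assumptions made for the example. The point is that the null flags have to be compared before the equality function is called, because an equality function such as `texteq` dereferences its arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the per-aggregate "last value seen" state. */
typedef struct DistinctState
{
    bool        haslast;     /* has any value been seen yet? */
    bool        lastisnull;  /* was the previous value NULL? */
    const char *lastvalue;   /* previous value; unused when lastisnull */
} DistinctState;

/*
 * Return true if (value, isnull) differs from the previous input and so
 * should be passed on to the transition function.  Input is assumed to be
 * sorted, so equal values (and NULLs) arrive next to each other.
 *
 * For brevity the sketch stores the caller's pointer instead of copying
 * the value, so the caller must keep it alive.
 */
static bool
value_is_new(DistinctState *state, const char *value, bool isnull)
{
    bool        isnew;

    assert(state != NULL);

    if (!state->haslast)
        isnew = true;                   /* first row is always new */
    else if (state->lastisnull != isnull)
        isnew = true;                   /* NULL vs. non-NULL always differ */
    else if (isnull)
        isnew = false;                  /* NULL after NULL is a duplicate */
    else
        isnew = strcmp(state->lastvalue, value) != 0;  /* both non-NULL */

    if (isnew)
    {
        state->haslast = true;
        state->lastisnull = isnull;
        state->lastvalue = isnull ? NULL : value;
    }
    return isnew;
}
```

Fed the two NULL rows from the report, `value_is_new` returns true for the first row and false for the second, and the comparison function is never reached with a NULL pointer.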



Re: array_agg(DISTINCT) caused a segmentation fault

From: Fujii Masao
Date:

On 2023/02/13 16:44, David Rowley wrote:
> On Mon, 13 Feb 2023 at 18:29, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> =# SELECT array_agg(distinct val) FROM (SELECT NULL AS val FROM generate_series(1, 2)) hoge;
>> LOG:  server process (PID 76507) was terminated by signal 11: Segmentation fault: 11
>> DETAIL:  Failed process was running: SELECT array_agg(distinct val) FROM (SELECT NULL AS val FROM generate_series(1, 2)) hoge;
> 
> This was a fairly trivial logic bug in
> ExecEvalPreOrderedDistinctSingle(). The JIT code calls that same
> function and ExecEvalPreOrderedDistinctMulti() uses the standard
> expression evaluation logic.  So looks like the problem is just
> isolated to ExecEvalPreOrderedDistinctSingle().
> 
> I've now pushed a fix for it and included your test.  To get it to
> crash it needed to be a byref aggregate without a strict transition
> function.  There are not too many of those, which is probably why
> nobody noticed this before.

Thanks for the fix!

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: array_agg(DISTINCT) caused a segmentation fault

From: Alexander Lakhin
Date:
Hello David,

13.02.2023 10:44, David Rowley wrote:
> On Mon, 13 Feb 2023 at 18:29, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> =# SELECT array_agg(distinct val) FROM (SELECT NULL AS val FROM generate_series(1, 2)) hoge;
>> LOG:  server process (PID 76507) was terminated by signal 11: Segmentation fault: 11
>> DETAIL:  Failed process was running: SELECT array_agg(distinct val) FROM (SELECT NULL AS val FROM generate_series(1, 2)) hoge;
>
> I've now pushed a fix for it and included your test.  To get it to
> crash it needed to be a byref aggregate without a strict transition
> function.  There are not too many of those, which is probably why
> nobody noticed this before.

I've encountered an issue that could have the same title, but it still reproduces after the fix.
The following query:
SELECT array_agg(DISTINCT a ORDER BY a DESC)
          FROM (VALUES (1),(1.0),(NULL)) v(a);

triggers an error detected by valgrind:
==00:00:00:03.708 2686358== Invalid read of size 4
==00:00:00:03.708 2686358==    at 0x76C4AE: GetMemoryChunkMethodID (mcxt.c:195)
==00:00:00:03.708 2686358==    by 0x76C4AE: pfree (mcxt.c:1439)
==00:00:00:03.708 2686358==    by 0x3FD547: ExecEvalPreOrderedDistinctSingle (execExprInterp.c:4258)
==00:00:00:03.708 2686358==    by 0x3FF203: ExecInterpExpr (execExprInterp.c:1772)
==00:00:00:03.708 2686358==    by 0x418792: ExecEvalExprSwitchContext (executor.h:344)
==00:00:00:03.708 2686358==    by 0x418792: advance_aggregates (nodeAgg.c:823)
==00:00:00:03.708 2686358==    by 0x41A12A: agg_retrieve_direct (nodeAgg.c:2446)
==00:00:00:03.708 2686358==    by 0x41A294: ExecAgg (nodeAgg.c:2171)
==00:00:00:03.708 2686358==    by 0x40AD3F: ExecProcNodeFirst (execProcnode.c:464)
==00:00:00:03.708 2686358==    by 0x40337F: ExecProcNode (executor.h:262)
==00:00:00:03.708 2686358==    by 0x40337F: ExecutePlan (execMain.c:1633)
==00:00:00:03.708 2686358==    by 0x403542: standard_ExecutorRun (execMain.c:364)
==00:00:00:03.708 2686358==    by 0x40360E: ExecutorRun (execMain.c:308)
==00:00:00:03.708 2686358==    by 0x5EB971: PortalRunSelect (pquery.c:924)
==00:00:00:03.708 2686358==    by 0x5ED31B: PortalRun (pquery.c:768)
==00:00:00:03.708 2686358==  Address 0xfffffffffffffff8 is not stack'd, malloc'd or (recently) free'd
==00:00:00:03.708 2686358==
...
==00:00:00:03.708 2686358==
==00:00:00:03.708 2686358== Exit program on first error (--exit-on-first-error=yes)
2023-02-13 10:26:39.276 MSK [2686332] LOG:  server process (PID 2686358) exited with exit code 1
2023-02-13 10:26:39.276 MSK [2686332] DETAIL:  Failed process was running: SELECT array_agg(DISTINCT a ORDER BY a DESC)
              FROM (VALUES (1),(1.0),(NULL)) v(a);

(Without valgrind I get SIGSEGV here.)
The first bad commit is 1349d2790 again (but before 80ef92675 an assertion failure can be seen).

Best regards,
Alexander

Re: array_agg(DISTINCT) caused a segmentation fault

From: David Rowley
Date:
On Mon, 13 Feb 2023 at 23:00, Alexander Lakhin <exclusion@gmail.com> wrote:
> I've encountered an issue that could have the same title but it still reproduced after the fix.
> The following query:
> SELECT array_agg(DISTINCT a ORDER BY a DESC)
>           FROM (VALUES (1),(1.0),(NULL)) v(a);

Thanks for testing that.  I neglected to update the logic which pfrees
the old Datum, which (as of 7da51590e) may now be NULL.

I've just pushed a fix.

David
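
The follow-up problem can be sketched in the same hedged way. This is a standalone C illustration, not the PostgreSQL source: `LastValue`, `remember_value`, `release_copy`, and the use of `malloc`/`free` in place of PostgreSQL's `datumCopy`/`pfree` are assumptions for the example. Once the remembered value is allowed to be NULL, the code that releases the old copy has to check the null flag first. Standard `free(NULL)` is a no-op, but `pfree` must not be handed a NULL pointer, so the sketch models that restriction with an assertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the previously seen by-reference value. */
typedef struct LastValue
{
    bool  haslast;  /* has any value been remembered yet? */
    bool  isnull;   /* is the remembered value NULL? */
    char *copy;     /* owned heap copy; NULL when isnull */
} LastValue;

static int frees_done = 0;  /* counts real releases, for illustration only */

/* Mimics an allocator that, like pfree, must never be handed NULL. */
static void
release_copy(char *ptr)
{
    assert(ptr != NULL);
    free(ptr);
    frees_done++;
}

/* Replace the remembered value with a private copy of the new one. */
static void
remember_value(LastValue *last, const char *value, bool isnull)
{
    /* Release the old copy only if there actually is one. */
    if (last->haslast && !last->isnull)
        release_copy(last->copy);

    last->haslast = true;
    last->isnull = isnull;

    if (isnull)
        last->copy = NULL;
    else
    {
        size_t len = strlen(value) + 1;

        last->copy = malloc(len);
        if (last->copy == NULL)
            abort();
        memcpy(last->copy, value, len);
    }
}
```

Dropping the `!last->isnull` test from the guard would pass a NULL pointer to `release_copy` whenever a non-NULL value follows a NULL one, which is the shape of the invalid read in the valgrind report above.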



Re: array_agg(DISTINCT) caused a segmentation fault

From: Alexander Lakhin
Date:
13.02.2023 13:41, David Rowley wrote:
> On Mon, 13 Feb 2023 at 23:00, Alexander Lakhin <exclusion@gmail.com> wrote:
> ...
> Thanks for testing that.  I neglected to update the logic which pfrees
> the old Datum, which (as of 7da51590e) may now be NULL.
>
> I've just pushed a fix.
Thanks! The issue no longer reproduces.

Best regards,
Alexander