Обсуждение: [HACKERS] Tuple sort is broken. It crashes on simple test.

Поиск
Список
Период
Сортировка

[HACKERS] Tuple sort is broken. It crashes on simple test.

От
Mithun Cy
Дата:
Test:
------
create table seq_tab(t int);
insert into seq_tab select generate_series(1, 10000000);
select count(distinct t) from seq_tab;

#0  0x000000000094a9ad in pfree (pointer=0x0) at mcxt.c:1007
#1  0x0000000000953be3 in mergeonerun (state=0x1611450) at tuplesort.c:2803
#2  0x0000000000953824 in mergeruns (state=0x1611450) at tuplesort.c:2721
#3  0x00000000009521bc in tuplesort_performsort (state=0x1611450) at
tuplesort.c:1813
#4  0x0000000000662b85 in process_ordered_aggregate_single
(aggstate=0x160b540, pertrans=0x160d440, pergroupstate=0x160d3f0) at
nodeAgg.c:1178
#5  0x00000000006636e0 in finalize_aggregates (aggstate=0x160b540,
peraggs=0x160c5e0, pergroup=0x160d3f0, currentSet=0) at nodeAgg.c:1600
#6  0x0000000000664509 in agg_retrieve_direct (aggstate=0x160b540) at
nodeAgg.c:2266
#7  0x0000000000663f66 in ExecAgg (node=0x160b540) at nodeAgg.c:1946
#8  0x0000000000650ad2 in ExecProcNode (node=0x160b540) at execProcnode.c:503

Git bisect shows me the following commit has caused it.

commit Id e94568ecc10f2638e542ae34f2990b821bbf90ac

Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>  2016-10-03 16:07:49
Committer: Heikki Linnakangas <heikki.linnakangas@iki.fi>  2016-10-03 16:07:49

Change the way pre-reading in external sort's merge phase works.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Tuple sort is broken. It crashes on simple test.

От
Michael Paquier
Дата:
(Adding Peter in CC who also worked on that)

On Mon, Jan 16, 2017 at 7:14 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> Test:
> ------
> create table seq_tab(t int);
> insert into seq_tab select generate_series(1, 10000000);
> select count(distinct t) from seq_tab;
>
> #0  0x000000000094a9ad in pfree (pointer=0x0) at mcxt.c:1007
> #1  0x0000000000953be3 in mergeonerun (state=0x1611450) at tuplesort.c:2803
> #2  0x0000000000953824 in mergeruns (state=0x1611450) at tuplesort.c:2721
> #3  0x00000000009521bc in tuplesort_performsort (state=0x1611450) at
> tuplesort.c:1813
> #4  0x0000000000662b85 in process_ordered_aggregate_single
> (aggstate=0x160b540, pertrans=0x160d440, pergroupstate=0x160d3f0) at
> nodeAgg.c:1178
> #5  0x00000000006636e0 in finalize_aggregates (aggstate=0x160b540,
> peraggs=0x160c5e0, pergroup=0x160d3f0, currentSet=0) at nodeAgg.c:1600
> #6  0x0000000000664509 in agg_retrieve_direct (aggstate=0x160b540) at
> nodeAgg.c:2266
> #7  0x0000000000663f66 in ExecAgg (node=0x160b540) at nodeAgg.c:1946
> #8  0x0000000000650ad2 in ExecProcNode (node=0x160b540) at execProcnode.c:503

Indeed. It crashes for me immediately by adding an ORDER BY:
select count(distinct t) from seq_tab order by 1;
-- 
Michael



Re: [HACKERS] Tuple sort is broken. It crashes on simple test.

От
Peter Geoghegan
Дата:
On Mon, Jan 16, 2017 at 3:48 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Indeed. It crashes for me immediately by adding an ORDER BY:
> select count(distinct t) from seq_tab order by 1;

The problem was that one particular call to the macro
RELEASE_SLAB_SLOT() happened to lack a test-for-NULL-argument needed
by pass-by-value datum cases. The other two RELEASE_SLAB_SLOT() calls
already have such a check.

Attached patch fixes the bug.

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] Tuple sort is broken. It crashes on simple test.

От
Pavel Stehule
Дата:


2017-01-16 19:24 GMT+01:00 Peter Geoghegan <pg@heroku.com>:
On Mon, Jan 16, 2017 at 3:48 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Indeed. It crashes for me immediately by adding an ORDER BY:
> select count(distinct t) from seq_tab order by 1;

The problem was that one particular call to the macro
RELEASE_SLAB_SLOT() happened to lack a test-for-NULL-argument needed
by pass-by-value datum cases. The other two RELEASE_SLAB_SLOT() calls
already have such a check.

Attached patch fixes the bug.


Should not be enhanced regress tests too?

Regards

Pavel
 
--
Peter Geoghegan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tuple sort is broken. It crashes on simple test.

От
Peter Geoghegan
Дата:
On Mon, Jan 16, 2017 at 10:38 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Should not be enhanced regress tests too?

We already have coverage of multi-pass external tuplesorts, as of a
few months back. That didn't catch this bug only because it was a
pass-by-value datum tuplesort. The relevant RELEASE_SLAB_SLOT() call
does have line coverage already.

I wouldn't object to adding a test case that would have exercised this
bug, too. It took me a while to talk Tom into the test that was added
several months back, which discouraged me from adding another test
case here. (There were concerns about the overhead of an external sort
test on slower buildfarm animals.)

-- 
Peter Geoghegan



Re: [HACKERS] Tuple sort is broken. It crashes on simple test.

От
Tom Lane
Дата:
Peter Geoghegan <pg@heroku.com> writes:
> The problem was that one particular call to the macro
> RELEASE_SLAB_SLOT() happened to lack a test-for-NULL-argument needed
> by pass-by-value datum cases. The other two RELEASE_SLAB_SLOT() calls
> already have such a check.

> Attached patch fixes the bug.

Pushed, thanks.
        regards, tom lane



Re: [HACKERS] Tuple sort is broken. It crashes on simple test.

От
Pavel Stehule
Дата:


2017-01-16 19:43 GMT+01:00 Peter Geoghegan <pg@heroku.com>:
On Mon, Jan 16, 2017 at 10:38 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Should not be enhanced regress tests too?

We already have coverage of multi-pass external tuplesorts, as of a
few months back. That didn't catch this bug only because it was a
pass-by-value datum tuplesort. The relevant RELEASE_SLAB_SLOT() call
does have line coverage already.

I wouldn't object to adding a test case that would have exercised this
bug, too. It took me a while to talk Tom into the test that was added
several months back, which discouraged me from adding another test
case here. (There were concerns about the overhead of an external sort
test on slower buildfarm animals.)

ok

 

--
Peter Geoghegan