Re: Parallel Full Hash Join

Поиск
Список
Период
Сортировка
От Melanie Plageman
Тема Re: Parallel Full Hash Join
Дата
Msg-id CAAKRu_Yvpd08MvXO3yeevmXeqykw+Fbh02VVefWuvqKyg6j8iQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Parallel Full Hash Join  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Parallel Full Hash Join  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-hackers
On Sat, Apr 8, 2023 at 1:30 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Sat, Apr 8, 2023 at 12:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Thomas Munro <thomas.munro@gmail.com> writes:
> > > I committed the main patch.
> >
> > BTW, it was easy to miss in all the buildfarm noise from
> > last-possible-minute patches, but chimaera just showed something
> > that looks like a bug in this code [1]:
> >
> > 2023-04-08 12:25:28.709 UTC [18027:321] pg_regress/join_hash LOG:  statement: savepoint settings;
> > 2023-04-08 12:25:28.709 UTC [18027:322] pg_regress/join_hash LOG:  statement: set local
max_parallel_workers_per_gather= 2; 
> > 2023-04-08 12:25:28.710 UTC [18027:323] pg_regress/join_hash LOG:  statement: explain (costs off)
> >              select  count(*) from simple r full outer join simple s on (r.id = 0 - s.id);
> > 2023-04-08 12:25:28.710 UTC [18027:324] pg_regress/join_hash LOG:  statement: select  count(*) from simple r full
outerjoin simple s on (r.id = 0 - s.id); 
> > TRAP: failed Assert("BarrierParticipants(&batch->batch_barrier) == 1"), File: "nodeHash.c", Line: 2118, PID: 19147

So, after staring at this for awhile, I suspect this assertion is just
plain wrong. BarrierArriveAndDetachExceptLast() contains this code:

    if (barrier->participants > 1)
    {
        --barrier->participants;
        SpinLockRelease(&barrier->mutex);

        return false;
    }
    Assert(barrier->participants == 1);

So in between this assertion and the one we tripped,

    if (!BarrierArriveAndDetachExceptLast(&batch->batch_barrier))
    {
    ...
        return false;
    }

    /* Now we are alone with this batch. */
    Assert(BarrierPhase(&batch->batch_barrier) == PHJ_BATCH_SCAN);
    Assert(BarrierParticipants(&batch->batch_barrier) == 1);

Another worker attached to the batch barrier, saw that it was in
PHJ_BATCH_SCAN, marked it done and detached. This is fine.
BarrierArriveAndDetachExceptLast() is meant to ensure no one waits
(deadlock hazard) and that at least one worker stays to do the unmatched
scan. It doesn't hurt anything for another worker to join and find out
there is no work to do.

We should simply delete this assertion.

- Melanie



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: Direct I/O
Следующее
От: Tom Lane
Дата:
Сообщение: Re: longfin missing gssapi_ext.h