Re: [HACKERS] Parallel Append implementation

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: [HACKERS] Parallel Append implementation
Дата
Msg-id CA+TgmoaNXzX8eibigFhSU=TbEB+JEp=BTdQAqRHDRWQmq9vKbQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Parallel Append implementation  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Ответы Re: [HACKERS] Parallel Append implementation  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Список pgsql-hackers
On Thu, Mar 9, 2017 at 7:42 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>>
>> +        if (rel->partial_pathlist != NIL &&
>> +            (Path *) linitial(rel->partial_pathlist) == subpath)
>> +            partial_subplans_set = bms_add_member(partial_subplans_set, i);
>>
>> This seems like a scary way to figure this out.  What if we wanted to
>> build a parallel append subpath with some path other than the
>> cheapest, for some reason?  I think you ought to record the decision
>> that set_append_rel_pathlist makes about whether to use a partial path
>> or a parallel-safe path, and then just copy it over here.
>
> I agree that assuming that a subpath is non-partial path if it's not
> cheapest of the partial paths is risky. In fact, we can not assume
> that even when it's not one of the partial_paths since it could have
> been kicked out or was never added to the partial path list like
> reparameterized path. But if we have to save the information about
> which of the subpaths are partial paths and which are not in
> AppendPath, it would take some memory, noticeable for thousands of
> partitions, which will leak if the path doesn't make into the
> rel->pathlist.

True, but that's no different from the situation for any other Path
node that has substructure.  For example, an IndexPath has no fewer
than 5 list pointers in it.  Generally we assume that the number of
paths won't be large enough for the memory used to really matter, and
I think that will also be true here.  And an AppendPath has a list of
subpaths, and if I'm not mistaken, those list nodes consume more
memory than the tracking information we're thinking about here will.

I think you're thinking about this issue because you've been working
on partitionwise join where memory consumption is a big issue, but
there are a lot of cases where that isn't really a big deal.

> The purpose of that information is to make sure that we
> allocate only one worker to that plan. I suggested that we use
> path->parallel_workers for the same, but it seems that's not
> guaranteed to be reliable. The reasons were discussed upthread. Is
> there any way to infer whether we can allocate more than one workers
> to a plan by looking at the corresponding path?

I think it would be smarter to track it some other way.  Either keep
two lists of paths, one of which is the partial paths and the other of
which is the parallel-safe paths, or keep a bitmapset indicating which
paths fall into which category.  I am not going to say there's no way
we could make it work without either of those things -- looking at the
parallel_workers flag might be made to work, for example -- but the
design idea I had in mind when I put this stuff into place was that
you keep them separate in other ways, not by the data they store
inside them.  I think it will be more robust if we keep to that
principle.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: [HACKERS] Gather Merge
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: [HACKERS] CREATE/ALTER ROLE PASSWORD ('value' USING 'method')