Re: Parallel Sequence Scan doubts

From Haribabu Kommi
Subject Re: Parallel Sequence Scan doubts
Date
Msg-id CAJrrPGcPU4zbbndv61Bidmc67zSkOsOFgWw-Eh7BF2Zd=HCzdg@mail.gmail.com
In reply to Re: Parallel Sequence Scan doubts  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: Parallel Sequence Scan doubts  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers
On Sun, Aug 24, 2014 at 1:11 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 08/21/2014 02:47 PM, Haribabu Kommi wrote:
>> Implementation of "Parallel Sequence Scan"
>
> I think you mean "Parallel Sequential Scan".

Sorry for not being clear. Yes, it is parallel sequential scan.


>> 1. "Parallel Sequence Scan" can be achieved by using the background
>> workers doing the job of actual sequence scan including the
>> qualification check also.
>
> Only if the qualifiers are stable/immutable I think.
>
> Not even necessarily stable functions - consider use of the
> fmgr/executor state contexts to carry information over between calls,
> and what parallel execution would do to that.

Thanks for your input. For now, we are targeting only immutable functions
that can be executed in parallel without any side effects.

>> 3. In the executor Init phase, Try to copy the necessary data required
>> by the workers and start the workers.
>
> Copy how?
>
> Back-ends can only communicate with each other over shared memory,
> signals, and using sockets.

Sorry for not being clear. I meant copying those data structures into
dynamic shared memory only. The workers can access them from there.
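
To make the idea concrete, here is a minimal sketch in Python of
flattening a small structure into a shared buffer and reading it back.
It is only an illustration under assumptions: the snapshot layout
(xmin, xmax, list of in-progress xids) is simplified, the function names
are hypothetical, and an anonymous mmap stands in for the dynamic shared
memory segment. It is not the PostgreSQL API.

```python
import mmap
import struct

# Hypothetical, simplified snapshot: two transaction-id bounds plus a
# variable-length list of in-progress transaction ids.
HEADER = struct.Struct("<III")  # xmin, xmax, number of in-progress xids


def write_snapshot(buf, xmin, xmax, in_progress):
    """Serialize the snapshot into buf as a flat, pointer-free record."""
    HEADER.pack_into(buf, 0, xmin, xmax, len(in_progress))
    struct.pack_into(f"<{len(in_progress)}I", buf, HEADER.size, *in_progress)


def read_snapshot(buf):
    """Rebuild the snapshot from the flat record, as a worker would."""
    xmin, xmax, count = HEADER.unpack_from(buf, 0)
    in_progress = list(struct.unpack_from(f"<{count}I", buf, HEADER.size))
    return xmin, xmax, in_progress


# An anonymous shared mapping stands in for the shared memory segment
# (assumption: the real segment would be created and attached differently).
segment = mmap.mmap(-1, 4096)
write_snapshot(segment, 100, 105, [101, 103])
snapshot = read_snapshot(segment)
segment.close()
```

The point of the flat layout is that nothing in the buffer is a pointer,
so the record means the same thing at whatever address a worker maps it.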

>> 4. In the executor run phase, just get the tuples which are sent by
>> the workers and process them further in the plan node execution.
>
> Again, how do you propose to copy these back to the main bgworker?

With the help of message queues that are created in the dynamic shared
memory, the workers can send the data to the queue. On the other side,
the main backend receives the tuples from the queue.
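
The flow could look roughly like the Python sketch below. This is only a
model of the idea, with assumptions stated up front: threads stand in
for background workers, a bounded queue.Queue stands in for a message
queue in dynamic shared memory, and scan_worker, parallel_seq_scan and
the DONE sentinel are hypothetical names.

```python
import queue
import threading

DONE = object()  # sentinel a worker sends when its slice is finished


def scan_worker(rows, qual, out_queue):
    """Scan one slice of the table and send qualifying tuples to the leader."""
    for row in rows:
        if qual(row):  # the qualification check runs in the worker
            out_queue.put(row)
    out_queue.put(DONE)


def parallel_seq_scan(table, qual, n_workers=2):
    """Leader side: start workers, then collect tuples until all are done."""
    out_queue = queue.Queue(maxsize=16)  # bounded, like a fixed-size queue
    workers = [
        threading.Thread(target=scan_worker,
                         args=(table[i::n_workers], qual, out_queue))
        for i in range(n_workers)
    ]
    for w in workers:
        w.start()
    results, finished = [], 0
    while finished < n_workers:
        item = out_queue.get()
        if item is DONE:
            finished += 1
        else:
            results.append(item)
    for w in workers:
        w.join()
    return results


table = [(i, i * i) for i in range(20)]
matches = parallel_seq_scan(table, lambda row: row[0] % 5 == 0)
```

Tuples arrive in whatever order the workers produce them, so the plan
node above the scan must not rely on any particular ordering.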


>> 1. Data structures that are required to be copied from backend to
>> worker are currentTransactionState, Snapshot, GUC, ComboCID, Estate
>> and etc.
>
> That's a big "etc". Huge, in fact.
>
> Any function can reference any global variable. Even an immutable
> function might use globals for cache etc - and might, for example, set
> them up using an executor start hook. You cannot really make any
> assumptions about what functions access what memory.

Yes, you are correct. For that reason, I am thinking of supporting only
functions that depend solely on their input arguments and do not modify
any global data.

>> I see some problems in copying "Estate" data structure into the shared
>> memory because it contains so many pointers. There is a need of some
>> infrastructure to copy these data structures into the shared memory.
>
> It's not just a matter of copying them into/via shmem.
>
> It's about their meaning. Does it even make sense to copy the executor
> state to another backend? If so, you can't copy it back, so what do you
> do at the end of the scans?

If we handle the locking of the relation in the backend and avoid the
parallel sequential scan whenever a subquery is involved, then the
worker does not need the full EState. In those cases, I think we can
execute the plan in the worker while sharing less information.

>> Any suggestions?
>
> Before you try to design anything more on this, study the *large* amount
> of discussion that has happened on this topic on this mailing list over
> the last years.
>
> This is not a simple or easy task, and it's not one you should approach
> without studying what's already been worked on, built, contemplated, etc.

Thanks for the information.

Regards,
Hari Babu
Fujitsu Australia


