Re: [HACKERS] Small improvement to parallel query docs


From	Brad DeJong
Subject	Re: [HACKERS] Small improvement to parallel query docs
Date	14 February 2017, 03:10:28
Msg-id	CY1PR0201MB189707298B119036F40CD9D2FF590@CY1PR0201MB1897.namprd02.prod.outlook.com
In reply to	Re: [HACKERS] Small improvement to parallel query docs (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: [HACKERS] Small improvement to parallel query docs (David Rowley <david.rowley@2ndquadrant.com>)
List	pgsql-hackers


Robert Haas wrote:

> +    <literal>COUNT(*)</>, each worker must compute subtotals which later must
> +    be combined to produce an overall total in order to produce the final
> +    answer.  If the query involves a <literal>GROUP BY</> clause,
> +    separate subtotals must be computed for each group seen by each parallel
> +    worker. Each of these subtotals must then be combined into an overall
> +    total for each group once the parallel aggregate portion of the plan is
> +    complete.  This means that queries which produce a low number of groups
> +    relative to the number of input rows are often far more attractive to the
> +    query planner, whereas queries which don't collect many rows into each
> +    group are less attractive, due to the overhead of having to combine the
> +    subtotals into totals, of which cannot run in parallel.

> I don't think "of which cannot run in parallel" is good grammar.  I'm somewhat unsure whether the rest is an
> improvement or not.  Other opinions?

Does this read any more clearly?

+    <literal>COUNT(*)</>, each worker must compute subtotals which are later
+    combined in order to produce an overall total for the final answer.  If
+    the query involves a <literal>GROUP BY</> clause, separate subtotals
+    must be computed for each group seen by each parallel worker.  After the
+    parallel aggregate portion of the plan is complete, there is a serial step
+    where the group subtotals from all of the parallel workers are combined
+    into an overall total for each group.  Because of the overhead of combining
+    the subtotals into totals, plans which produce few groups relative to the
+    number of input rows are often more attractive to the query planner
+    than plans which produce many groups relative to the number of input rows.


I got rid of the ", of which cannot run in parallel" entirely. It was
already stated that the subtotals->totals step runs "once the parallel
aggregate portion of the plan is complete", which implies that it is serial.
I made that explicit with "there is a serial step". Also, the purpose of the
", of which cannot run in parallel" sentence is to communicate why the
planner prefers one plan over another and, if I'm reading this correctly,
the subtotals->totals step is serial for both plans so that is not a reason
to prefer one over the other.
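
To make the cost argument concrete, here is a toy Python model of the two phases. It is only a sketch for illustration, not how the PostgreSQL executor is implemented: the names partial_count, combine and run are hypothetical, and the round-robin split of rows across workers is an assumption. It counts how many per-group subtotals the serial step has to merge when the same number of input rows falls into few groups versus many groups.

```python
from collections import Counter


def partial_count(rows):
    """Parallel phase (toy): one worker counts the rows it saw, per group."""
    return Counter(rows)


def combine(subtotals):
    """Serial phase (toy): merge every worker's per-group subtotals.

    Returns the per-group totals and the number of subtotal entries merged.
    """
    totals = Counter()
    merged = 0
    for sub in subtotals:
        for group, count in sub.items():
            totals[group] += count
            merged += 1
    return totals, merged


def run(rows, n_workers):
    # Assumption: rows are dealt round-robin to the workers.
    chunks = [rows[i::n_workers] for i in range(n_workers)]
    return combine([partial_count(chunk) for chunk in chunks])


few_groups = [i % 4 for i in range(1000)]   # 1000 rows, 4 groups
many_groups = list(range(1000))             # 1000 rows, 1000 groups

few_totals, few_merged = run(few_groups, 3)
many_totals, many_merged = run(many_groups, 3)

print("few groups: ", few_merged, "subtotals merged serially")
print("many groups:", many_merged, "subtotals merged serially")
```

In this model the serial step exists in both cases, but its work grows with the number of groups each worker saw, which is the overhead the reworded paragraph points at.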

I think that the planner prefers plans rather than queries, so I changed that as well.
