Outside of the use case that Pengzhou has provided in [1], we started
looking into using scanCols for extracting the subset of columns
needed in two cases:
1) columns required to be spilled for memory-bounded hashagg
2) referenced CTE columns which must be materialized into tuplestore
However, these optimizations wouldn't work with the scanCols patch
in its current form.
The scanCols are extracted from PlannerInfo->simple_rel_array and
PlannerInfo->simple_rte_array, at which point we have no way of
knowing whether a column is aggregated, whether it is part of a CTE,
or anything else about how it is used in the query.
We could solve this by creating multiple bitmaps at the time that we
create the scanCols field -- one for aggregated columns, one for
unaggregated columns, one for CTE columns. However, that would add a
lot of extra complexity to the common code path during planning.
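To make that bookkeeping concrete, here is a toy sketch (not actual
PostgreSQL code -- the names ScanColsByUse, mark_column, and
all_scan_cols are made up, and a uint64 mask stands in for a real
per-relation Bitmapset) of what tracking one bitmap per usage class
might look like:

```c
#include <stdint.h>

/*
 * Hypothetical per-relation column tracking: instead of a single
 * scanCols bitmap, keep a separate bitmap for each usage class.
 * Column numbers are modeled as bit positions in a uint64_t, so this
 * sketch only handles relations with at most 64 columns.
 */
typedef struct ScanColsByUse
{
    uint64_t aggregated;   /* columns referenced inside aggregates */
    uint64_t unaggregated; /* columns referenced outside aggregates */
    uint64_t cte;          /* columns referenced through a CTE */
} ScanColsByUse;

enum ColUseClass { COL_AGGREGATED, COL_UNAGGREGATED, COL_CTE };

/* Record that column attno is used in the given class. */
static void
mark_column(ScanColsByUse *cols, int attno, enum ColUseClass use)
{
    uint64_t bit = (uint64_t) 1 << attno;

    switch (use)
    {
        case COL_AGGREGATED:   cols->aggregated |= bit; break;
        case COL_UNAGGREGATED: cols->unaggregated |= bit; break;
        case COL_CTE:          cols->cte |= bit; break;
    }
}

/* The plain scanCols set is just the union of the three classes. */
static uint64_t
all_scan_cols(const ScanColsByUse *cols)
{
    return cols->aggregated | cols->unaggregated | cols->cte;
}
```

The point is that every call site that currently sets one bit would
have to classify the reference first, which is where the extra
complexity in the common planning path comes from.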
Basically, scanCols are simply columns that need to be scanned. It is
probably okay if they are only used by table access method API users,
as Pengzhou's patch illustrates.
Given that we have addressed the feedback about showing a use case,
this patch is probably ready for a once-over from Dmitry again. (It is
registered for the March fest.)
[1]
https://www.postgresql.org/message-id/CAG4reAQc9vYdmQXh%3D1D789x8XJ%3DgEkV%2BE%2BfT9%2Bs9tOWDXX3L9Q%40mail.gmail.com