Re: Wrong results from Parallel Hash Full Join

Поиск

Список

Период

Сортировка

От	Melanie Plageman
Тема	Re: Wrong results from Parallel Hash Full Join
Дата	20 апреля 2023 г. 03:43:15
Msg-id	CAAKRu_bdwDN_aHVctHcc9VoDP9av7LUMeuLbch1fHD2ESouw1g@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Wrong results from Parallel Hash Full Join (Andres Freund <andres@anarazel.de>)
Ответы	Re: Wrong results from Parallel Hash Full Join (Melanie Plageman <melanieplageman@gmail.com>)
Список	pgsql-hackers

Дерево обсуждения

On Wed, Apr 19, 2023 at 3:20 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2023-04-19 12:16:24 -0500, Justin Pryzby wrote:
> On Wed, Apr 19, 2023 at 11:17:04AM -0400, Melanie Plageman wrote:
> > Ultimately this is probably fine. If we wanted to modify one of the
> > existing tests to cover the multi-batch case, changing the select
> > count(*) to a select * would do the trick. I imagine we wouldn't want to
> > do this because of the excessive output this would produce. I wondered
> > if there was a pattern in the tests for getting around this.
>
> You could use explain (ANALYZE). But the output is machine-dependant in
> various ways (which is why the tests use "explain analyze so rarely).

I think with sufficient options it's not machine specific. We have a bunch of
EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) ..
in our tests.

Cool. Yea, so ultimately these options are almost enough but memory
usage changes from execution to execution. There are some tests which do
regexp_replace() on the memory usage part of the EXPLAIN ANALYZE output
to allow us to still compare the plans. However, I figured if I was
already going to go to the trouble of using regexp_replace(), I might as
well write a function that returns the "Actual Rows" field from the
EXPLAIN ANALYZE output.

The attached patch does that. I admittedly mostly copy-pasted the
plpgsql function from similar examples in other tests, and I suspect it
may be overkill and also poorly written.

The nice thing about this approach is that we can modify some of the
existing tests in join_hash.sql to use this function and cover the code
to reset the matchbit for serial hashjoin, single batch parallel
hashjoin, and all batches of parallel multi-batch hashjoin without any
additional queries. (I'll leave testing match bit resetting with the
skew hashtable and match bit resetting in case of a rescan for another
day.)

I was able to delete the tests added in 558c9d75fe, as they became
redundant.

I wonder if any other tests are in need of an EXPLAIN (ANALYZE,
MEMORY_USAGE OFF) option? Perhaps it is quite unusual to only require a
deterministic field like 'Actual Rows'. If we had that option we could
also remove the extra EXPLAIN invocations before the actual query
executions.

- Melanie

Вложения

v1-0001-Test-multi-batch-PHJ-match-bit-initialization.patch

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Michael Paquier
Дата: 20 апреля 2023 г., 03:42:08
Сообщение: Re: Remove io prefix from pg_stat_io columns

Следующее

От: Melanie Plageman
Дата: 20 апреля 2023 г., 03:47:07
Сообщение: Re: Wrong results from Parallel Hash Full Join

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Wrong results from Parallel Hash Full Join

Вложения

Предыдущее

Следующее