Re: Using quicksort for every external sort run

From: Peter Geoghegan
Subject: Re: Using quicksort for every external sort run
Date:
Msg-id: CAM3SWZTjR2wrO8zavt1LZf+5_HB--SWk_J9rPR9rXb+Jj7ATKw@mail.gmail.com
In response to: Re: Using quicksort for every external sort run  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
           Re: Using quicksort for every external sort run  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-hackers
On Tue, Mar 29, 2016 at 9:11 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> One test that kind of bothers me in particular is the "SELECT DISTINCT
> a FROM numeric_test ORDER BY a" test on the high_cardinality_random
> data set.  That's a wash at most work_mem values, but at 32MB it's
> more than 3x slower.  That's very strange, and there are a number of
> other results like that, where one particular work_mem value triggers
> a large regression.  That's worrying.

That case is totally invalid as a benchmark for this patch. Here is
the query plan I get (doesn't matter if I run analyze) when I follow
Tomas' high_cardinality_random 10M instructions (including setting
work_mem to 32MB):

postgres=# explain analyze select distinct a from numeric_test order by a;
                                                         QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Sort  (cost=268895.39..270373.10 rows=591082 width=8) (actual time=3907.917..4086.174 rows=999879 loops=1)
   Sort Key: a
   Sort Method: external merge  Disk: 18536kB
   ->  HashAggregate  (cost=206320.50..212231.32 rows=591082 width=8) (actual time=3109.619..3387.599 rows=999879 loops=1)
         Group Key: a
         ->  Seq Scan on numeric_test  (cost=0.00..175844.40 rows=12190440 width=8) (actual time=0.025..601.295 rows=10000000 loops=1)
 Planning time: 0.088 ms
 Execution time: 4120.656 ms
(8 rows)

Does that seem like a fair test of this patch?
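
To spell out why: the Sort node only sees the ~1M distinct rows that
come out of the HashAggregate, and the aggregate itself accounts for
most of the runtime, so the external sort code is barely exercised.
As a sketch of a variant that actually would stress the external sort
path (not something Tomas ran, just an illustration): forcing the
dedup to happen with Unique on top of a Sort makes the sort process
all 10 million rows, which is the code path the patch changes:

    SET work_mem = '32MB';
    SET enable_hashagg = off;  -- force Sort + Unique rather than HashAggregate
    EXPLAIN ANALYZE SELECT DISTINCT a FROM numeric_test ORDER BY a;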

I must also point out an inexplicable difference between the i5 and
the Xeon on this query. The patched Xeon 10M case took about 10%
*less* time, not ~200% more (line 53 of the summary page in each 10M
case). So even if this case did exercise the patch well, it's far
from clear that it has been regressed at all. It's far easier to
imagine that there was some problem with the i5 tests.

A complete do-over from Tomas would be best here. He has already
acknowledged that the i5 CREATE INDEX results were completely invalid.
Pending that do-over, I recommend ignoring the i5 tests completely.
Also, I should once again point out that many of the work_mem cases at
the high end were actually internal sorts, so the code in the patch
simply wasn't exercised there at all (the 1024MB cases, where the
numbers might be expected to get really good).
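
For what it's worth, it's easy to tell which regime a given work_mem
setting lands in: the Sort Method line of EXPLAIN ANALYZE reports it
directly. For example:

    SET work_mem = '1024MB';
    EXPLAIN ANALYZE SELECT DISTINCT a FROM numeric_test ORDER BY a;
    -- "Sort Method: quicksort  Memory: ..."    => internal sort, patch not exercised
    -- "Sort Method: external merge  Disk: ..." => external sort, patch exercised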

If there is ever a regression, it is only really sensible to talk
about it while looking at trace_sort output (and, I guess, the query
plan). I've asked Tomas for trace_sort output in all relevant cases.
There is no point in "flying blind" and speculating from a distance
about what the problem might be.
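
For anyone who wants to reproduce this, trace_sort output can be
captured in an ordinary psql session along these lines (this assumes
a build with TRACE_SORT defined, which is the default):

    SET trace_sort = on;
    SET client_min_messages = log;  -- surface the LOG lines in the session
    SELECT DISTINCT a FROM numeric_test ORDER BY a;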

> Also, it's pretty clear that the patch has more large wins than it
> does large losses, but it seems pretty easy to imagine people who
> haven't tuned any GUCs writing in to say that 9.6 is way slower on
> their workload, because those people are going to be at work_mem=4MB,
> maintenance_work_mem=64MB.  At those numbers, if Tomas's data is
> representative, it's not hard to imagine that the number of people who
> see a significant regression might be quite a bit larger than the
> number who see a significant speedup.

I don't think they are representative. Greg Stark characterized the
regressions as fairly limited, mostly at the very low end. And that
was *before* all the memory fragmentation work made that better. I
haven't yet analyzed how much that improved things *across the board*,
but for int4 cases I could make 1MB work_mem queries faster with
gigabytes of data on my laptop. I believe I tested various datum sort
cases there, like "select count(distinct(foo)) from bar"; those are a
very pure test of the patch.

--
Peter Geoghegan


