Re: Using quicksort for every external sort run

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Using quicksort for every external sort run
Дата
Msg-id 56FB25BB.6070805@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
Ответы Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
Список pgsql-hackers
Hi,

On 03/29/2016 09:43 PM, Peter Geoghegan wrote:
> On Tue, Mar 29, 2016 at 9:11 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> One test that kind of bothers me in particular is the "SELECT DISTINCT
>> a FROM numeric_test ORDER BY a" test on the high_cardinality_random
>> data set.  That's a wash at most work_mem values, but at 32MB it's
>> more than 3x slower.  That's very strange, and there are a number of
>> other results like that, where one particular work_mem value triggers
>> a large regression.  That's worrying.
>
> That case is totally invalid as a benchmark for this patch. Here is
> the query plan I get (doesn't matter if I run analyze) when I follow
> Tomas' high_cardinality_random 10M instructions (including setting
> work_mem to 32MB):
>
> postgres=# explain analyze select distinct a from numeric_test order by a;
>                                                                QUERY
> PLAN
>
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
>   Sort  (cost=268895.39..270373.10 rows=591082 width=8) (actual
> time=3907.917..4086.174 rows=999879 loops=1)
>     Sort Key: a
>     Sort Method: external merge  Disk: 18536kB
>     ->  HashAggregate  (cost=206320.50..212231.32 rows=591082 width=8)
> (actual time=3109.619..3387.599 rows=999879 loops=1)
>           Group Key: a
>           ->  Seq Scan on numeric_test  (cost=0.00..175844.40
> rows=12190440 width=8) (actual time=0.025..601.295 rows=10000000
> loops=1)
>   Planning time: 0.088 ms
>   Execution time: 4120.656 ms
> (8 rows)
>
> Does that seem like a fair test of this patch?

And why not? I mean, why should it be acceptable to slow down?

>
> I must also point out an inexplicable differences between the i5 and
> Xeon in relation to this query. It took about took 10% less time on
> the patched Xeon 10M case, not ~200% more (line 53 of the summary page
> in each 10M case). So even if this case did exercise the patch well,
> it's far from clear that it has even been regressed at all. It's far
> easier to imagine that there was some problem with the i5 tests.

That may be easily due to differences between the CPUs and 
configuration. For example the Xeon uses a way older CPU with different 
amounts of CPU cache, and it's also a multi-socket system. And so on.

> A complete do-over from Tomas would be best, here. He has already
> acknowledged that the i5 CREATE INDEX results were completely invalid.
> Pending a do-over from Tomas, I recommend ignoring the i5 tests
> completely. Also, I should once again point out that many of the
> work_mem cases actually had internal sorts at the high end, so once
> the code in the patches simply wasn't exercised at all at the high end
> (the 1024MB cases, where the numbers might be expected to get really
> good).
>
> If there is ever a regression, it is only really sensible to talk
> about it while looking at trace_sort output (and, I guess, the query
> plan). I've asked Tomas for trace_sort output in all relevant cases.
> There is no point in "flying blind" and speculating what the problem
> was from a distance.

The updated benchmarks are currently running. I'm out of office until 
Friday, and I'd like to process the results over the weekend. FWIW I'll 
have results for these cases:

1) unpatched (a414d96a)
2) patched, default settings
3) patched, replacement_sort_mem=64

Also, I'll have trace_sort=on output for all the queries, so we can 
investigate further.

>
>> Also, it's pretty clear that the patch has more large wins than it
>> does large losses, but it seems pretty easy to imagine people who
>> haven't tuned any GUCs writing in to say that 9.6 is way slower on
>> their workload, because those people are going to be at
>> work_mem=4MB, maintenance_work_mem=64MB. At those numbers, if
>> Tomas's data is representative, it's not hard to imagine that the
>> number of people who see a significant regression might be quite a
>> bit larger than the number who see a significant speedup.

Yeah. That was one of the goals of the benchmark, to come up with some 
tuning recommendations. On some systems significantly increasing memory 
GUCs may not be possible, though - say, on very small systems with very 
limited amounts of RAM.

>
> I don't think they are representative. Greg Stark characterized the
> regressions as being fairly limited, mostly at the very low end. And
> that was *before* all the memory fragmentation stuff made that
> better. I haven't done any analysis of how much better that made the
> problem *across the board* yet, but for int4 cases I could make 1MB
> work_mem queries faster with gigabytes of data on my laptop. I
> believe I tested various datum sort cases there, like "select
> count(distinct(foo)) from bar"; those are a very pure test of the
> patch.
>

Well, I'd guess those conclusions may be a bit subjective.

regards


-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: Using quicksort for every external sort run
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [BUGS] Re: BUG #13854: SSPI authentication failure: wrong realm name used