Re: [HACKERS] The case for removing replacement selection sort

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: [HACKERS] The case for removing replacement selection sort
Дата
Msg-id CAH2-WznFOrDV2Ht3e1bAM64anRUYK6G8ao7ABbLCHJBWLY+asQ@mail.gmail.com
обсуждение исходный текст
Ответ на [HACKERS] The case for removing replacement selection sort  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers
On Fri, Sep 15, 2017 at 6:34 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> e5-2620-v4
> ----------
> - probably the CPU we should be looking at, as it's the current model
> - in some cases this gives us 3-5x speedup with low work_mem settings
> - consider for example the very last line, which shows that
>
>   SELECT DISTINCT a FROM text_test_padding ORDER BY a
>
> completed in ~5531ms on work_mem=8MB and 1067ms on 32MB, but with the
> patch it completes in 1784ms (1MB), 1211ms (4MB) and 1104 (8MB).

I think that this is because of the importance of getting a final
on-the-fly merge. Overlapping I/O with computation matters a lot here.

> - Of course, this is for already-sorted data, and for other data sets
> the improvement is more modest. It's difficult to summarize this into a
> single number, but there are plenty of improvements in 20-30% range.
>
> - Some cases are a bit slower, of course, but overall I'd say the chart
> is much more green than red. Also the slow-downs are much smaller,
> compared to speed-ups (generally within 1-5%).

The larger regressions mostly involve numeric. We have two
palloc()/pfree() cycles in the standard numeric comparator (one for
each datum that is compared), a cost which could be removed by
enhancing numeric SortSupport directly. That's why the numeric
abbreviated key stuff was the most effective (more effective than
text) when added to 9.5. We currently have very expensive full
comparisons of datums within L1 cache ("merge comparisons") relative
to abbreviated key comparisons -- the difference is huge, but could be
reduced.

Anyway, it seems ironic that in those few cases where replacement
selection still does better, it seems to be due to reduced CPU costs,
not reduced I/O costs. This is the opposite benefit to the one you'd
expect from reading Knuth.

Thanks for benchmarking! I hope that this removes the doubt that
replacement selection previously benefited from; it now clearly
deserves to be removed from tuplesort.c.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Arseny Sher
Дата:
Сообщение: Re: [HACKERS] DROP SUBSCRIPTION hangs if sub is disabled in the same transaction
Следующее
От: Andres Freund
Дата:
Сообщение: Re: [HACKERS] POC: Sharing record typmods between backends