Re: Using quicksort for every external sort run

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: Using quicksort for every external sort run
Дата
Msg-id CAM3SWZT+KegfDfocD2jCGF_ngMxXFepc=ps3HCBjz5HC=Rmz+w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Using quicksort for every external sort run  (Greg Stark <stark@mit.edu>)
Список pgsql-hackers
On Thu, Aug 20, 2015 at 5:02 PM, Greg Stark <stark@mit.edu> wrote:
> I haven't thought through the exponential
> growth carefully enough to tell if doubling the run size should
> decrease the number of passes linearly or by a constant number.

It seems that with 5 times the data that previously required ~30MB to
avoid a multi-pass sort (where ~2300MB is required for an internal
sort -- the benchmark query), it took ~60MB to avoid a multi-pass
sort. I guess I just didn't exactly determine either threshold due to
that taking too long, and that as predicted, every time the input size
quadruples, the required amount of work_mem to avoid multiple passes
only doubles. That will need to be verified more vigorously, but it
looks that way.

> But you're right that seems to be less and less a realistic scenario.
> Times when users are really processing data sets that large nowadays
> they'll just throw it into Hadoop or Biigquery or whatever to get the
> parallelism of many cpus. Or maybe Citus and the like.

I'm not sure that even that's generally true, simply because sorting a
huge amount of data is very expensive -- it's not really a "big data"
thing, so to speak. Look at recent results on this site:

http://sortbenchmark.org

Last year's winning "Gray" entrant, TritonSort, uses a huge parallel
cluster of 186 machines, but only sorts 100TB. That's just over 500GB
per node. Each node is a 32 core Intel Xeon EC2 instance with 244GB
memory, and lots of SSDs. It seems like the point of the 100TB minimum
rule in the "Gray" contest category is that that's practically
impossible to fit entirely in memory (to avoid merging).

Eventually, linearithmic growth becomes extremely painful, not matter
how much processing power you have. It takes a while, though.

-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Rajeev rastogi
Дата:
Сообщение: Re: Autonomous Transaction is back
Следующее
От: Tom Lane
Дата:
Сообщение: Re: 9.5 release notes