Re: wip: functions median and percentile

Поиск
Список
Период
Сортировка
От Greg Stark
Тема Re: wip: functions median and percentile
Дата
Msg-id AANLkTikF_zQdz0x49muKgabAYk7dAQvnY=OY1TwfEV3o@mail.gmail.com
обсуждение исходный текст
Ответ на Re: wip: functions median and percentile  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Список pgsql-hackers
On Sun, Oct 3, 2010 at 11:58 PM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> The problem is that performance really sucks,
> because it is an O(n^2 log(n)) algorithm. I don't see an easy way
> around that without significant new infrastructure, as Greg describes,
> or a completely different algorithm.

Resorting for each record would be O(n^2 log(n)). If you use
Quickselect it would be O(n^2) because each selection would be O(n).
But then we could get O(n^2) by just doing insertion-sort. The problem
with both these algorithms is that I don't see how to do it on-disk.
Maybe there would be some way to do insertion-sort on disk by keeping
a buffer in-memory holding the last n records inserted and whenever
the in-memory buffer fills do a merge against the data on disk to
spill it. But that's a lot of special-case machinery. Personally I
think if we need it to handle sorted aggregates over window functions
it's  worth it to carry it if someone feels like writing it.



-- 
greg


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: OUTER keyword
Следующее
От: Greg Stark
Дата:
Сообщение: Re: Adding getrusage profiling data to EXPLAIN output