WIP: further sorting speedup

From: Tom Lane
Subject: WIP: further sorting speedup
Date:
Msg-id: 15464.1140403246@sss.pgh.pa.us
Replies: Re: WIP: further sorting speedup  ("Luke Lonergan" <llonergan@greenplum.com>)
Re: WIP: further sorting speedup  (Simon Riggs <simon@2ndquadrant.com>)
Re: WIP: further sorting speedup  ("Jim C. Nasby" <jnasby@pervasive.com>)
List: pgsql-patches

After applying Simon's recent sort patch, I was doing some profiling and
noticed that sorting spends an unreasonably large fraction of its time
extracting datums from tuples (heap_getattr or index_getattr).  The
attached patch does something about this by pulling out the leading sort
column of a tuple when it is received by the sort code or re-read from a
"tape".  This increases the space needed by 8 or 12 bytes (depending on
sizeof(Datum)) per in-memory tuple, but doesn't cost anything as far as
the on-disk representation goes.  The effort needed to extract the datum
at this point is well repaid because the tuple will normally undergo
multiple comparisons while it remains in memory.  In some quick tests
the patch seemed to make for a significant speedup, on the order of 30%,
despite increasing the number of runs emitted because of the smaller
available memory.
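
To make the idea concrete, here is a minimal standalone sketch, not the patch itself. It assumes a toy Row type in place of a real heap tuple, a hypothetical extract_attr() standing in for heap_getattr/index_getattr, and the C library's qsort() in place of the sort code's own algorithm. SortEntry, key1 and compare_strategies are illustrative names only. The sketch counts how many extractions each approach performs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a heap tuple: a row of integer columns. */
typedef struct Row { int cols[3]; } Row;

/* Counts calls to the (pretend-expensive) attribute extraction. */
static long extract_calls = 0;

/* Stand-in for heap_getattr/index_getattr; NOT the real API. */
static int extract_attr(const Row *row, int attno)
{
    extract_calls++;
    return row->cols[attno];
}

/* In-memory sort entry: tuple pointer plus the cached leading key. */
typedef struct SortEntry {
    const Row *row;
    int key1;               /* leading sort column, extracted once */
} SortEntry;

/* Baseline: extract the key from both tuples on every comparison. */
static int cmp_extract(const void *a, const void *b)
{
    int ka = extract_attr(*(const Row *const *) a, 0);
    int kb = extract_attr(*(const Row *const *) b, 0);
    return (ka > kb) - (ka < kb);
}

/* Cached: compare the pre-extracted keys, never touching the tuple. */
static int cmp_cached(const void *a, const void *b)
{
    int ka = ((const SortEntry *) a)->key1;
    int kb = ((const SortEntry *) b)->key1;
    return (ka > kb) - (ka < kb);
}

/* Sort n rows both ways and report the extraction count of each. */
static void compare_strategies(const Row *rows, int n,
                               long *baseline, long *cached)
{
    assert(n > 0);
    const Row **ptrs = malloc((size_t) n * sizeof *ptrs);
    SortEntry *entries = malloc((size_t) n * sizeof *entries);
    assert(ptrs != NULL && entries != NULL);

    for (int i = 0; i < n; i++)
        ptrs[i] = &rows[i];
    extract_calls = 0;
    qsort(ptrs, (size_t) n, sizeof *ptrs, cmp_extract);
    *baseline = extract_calls;

    extract_calls = 0;
    for (int i = 0; i < n; i++) {
        entries[i].row = &rows[i];
        entries[i].key1 = extract_attr(&rows[i], 0);  /* once per tuple */
    }
    qsort(entries, (size_t) n, sizeof *entries, cmp_cached);
    *cached = extract_calls;

    free(ptrs);
    free(entries);
}
```

With the cached layout the extraction count is exactly one per tuple, while the baseline pays two extractions per comparison. The price is the extra key field carried in every in-memory entry.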

The choice to pull out just the leading column, rather than all columns,
is driven by concerns of (a) code complexity and (b) memory space.
Having the extra columns pre-extracted wouldn't buy anything anyway
in the common case where the leading key determines the result of
a comparison.

This is still WIP because it leaks memory intra-query (I need to fix it
to clean up palloc'd space better).  I thought I'd post it now in case
anyone wants to try some measurements for their own favorite test cases.
In particular it would be interesting to see what happens for a
multi-column sort with lots of duplicated keys in the first column,
which is the case where the least advantage would be gained.
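
For anyone who wants a feel for that worst case before running real queries, the following rough sketch may help. It is a toy model under stated assumptions, not code from the patch: Row is a hypothetical two-column tuple, SortEntry and key1 are made-up names, and qsort() stands in for the real sort algorithm. The comparator decides on the cached leading key when it can and goes back to the tuple for the second column only on a tie. fallback_fraction() reports how often that tie path is taken.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical two-column tuple. */
typedef struct Row { int a; int b; } Row;

/* In-memory entry with the leading column cached alongside the tuple. */
typedef struct SortEntry {
    const Row *row;
    int key1;                   /* cached copy of row->a */
} SortEntry;

static long total_cmps = 0;
static long fallback_cmps = 0;  /* comparisons that had to touch the tuple */

static int cmp_entries(const void *x, const void *y)
{
    const SortEntry *ex = x;
    const SortEntry *ey = y;

    total_cmps++;
    if (ex->key1 != ey->key1)   /* common case: leading key decides */
        return (ex->key1 > ey->key1) - (ex->key1 < ey->key1);

    /* Tie on the leading key: fetch the second column from the tuple. */
    fallback_cmps++;
    return (ex->row->b > ey->row->b) - (ex->row->b < ey->row->b);
}

/*
 * Sort n rows whose first column takes `distinct` different values and
 * return the fraction of comparisons that needed the second column.
 */
static double fallback_fraction(int n, int distinct)
{
    assert(n > 1 && distinct > 0);
    Row *rows = malloc((size_t) n * sizeof *rows);
    SortEntry *entries = malloc((size_t) n * sizeof *entries);
    assert(rows != NULL && entries != NULL);

    for (int i = 0; i < n; i++) {
        rows[i].a = (int) (((long) i * 7919) % distinct); /* scattered keys */
        rows[i].b = n - i;                                /* unique tiebreak */
        entries[i].row = &rows[i];
        entries[i].key1 = rows[i].a;
    }

    total_cmps = 0;
    fallback_cmps = 0;
    qsort(entries, (size_t) n, sizeof *entries, cmp_entries);
    double fraction = (double) fallback_cmps / (double) total_cmps;

    free(rows);
    free(entries);
    return fraction;
}
```

Calling fallback_fraction(1000, 1) models a first column that is entirely duplicates, so every comparison takes the tie path. Raising `distinct` toward n shrinks that fraction. The toy model says nothing about actual timings, which still need to be measured on a real build.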

Comments?

            regards, tom lane

