Re: Abbreviated keys for Numeric

From: Tomas Vondra
Subject: Re: Abbreviated keys for Numeric
Msg-id: 54E8E571.2030507@2ndquadrant.com
In reply to: Re: Abbreviated keys for Numeric (Peter Geoghegan <pg@heroku.com>)
List: pgsql-hackers
On 21.2.2015 19:57, Peter Geoghegan wrote:
> On Fri, Feb 20, 2015 at 9:18 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> The gains for text are also very nice, although in this case that only
>> happens for the smallest scale (1M rows), and for larger scales it's
>> actually slower than current master :-(
> 
> That's odd. I have a hard time thinking of why the datum sort patch
> could be at fault, though. I bet the cost model of the text
> sortsupport routine is somehow hitting a snag on those larger sized
> sets. They should be just as accelerated, and probably more so, than
> your 1M sized set that was sped up 4x here.
> 
> Can you see what is output with debugging of text abbreviation turned
> on? Put "#define DEBUG_ABBREV_KEYS" at the top of varlena.c and
> rebuild. Report on the debug1 output, and see if and when abbreviation
> is aborted.
> 
> I suspected that the cost model was too conservative (or, more
> lightly, just too simplistic). I ought to revisit my patch to give the
> ad-hoc cost model a sense of proportion about how far along we are,
> which was previously deferred [1]. When there is a strong
> physical/logical correlation, that can be essential.

I don't think there's a correlation - at least not in the usual sense
that the data are stored 'sorted' by the column. The tables were
populated with randomly generated data.

Hmmm, maybe we should add such correlated cases into the test script, to
see how that behaves with those patches ...
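
A minimal sketch of what such a correlated case could look like, next to the random one. The actual test script is not shown in this thread, so every name here (make_values, make_correlated, in_order_fraction) is hypothetical, and the value length is an arbitrary choice:

```python
import random
import string


def make_values(n, seed=0, length=16):
    """Random lowercase strings, in random (uncorrelated) order."""
    rng = random.Random(seed)
    return [
        "".join(rng.choices(string.ascii_lowercase, k=length))
        for _ in range(n)
    ]


def make_correlated(values):
    """Same values, but physical order matches logical (sorted) order."""
    return sorted(values)


def in_order_fraction(values):
    """Fraction of adjacent pairs that are already in non-decreasing order.

    Roughly 0.5 for random input, 1.0 for fully correlated input.
    """
    if len(values) < 2:
        return 1.0
    in_order = sum(1 for a, b in zip(values, values[1:]) if a <= b)
    return in_order / (len(values) - 1)


if __name__ == "__main__":
    random_rows = make_values(1000)
    correlated_rows = make_correlated(random_rows)
    print("random:    ", in_order_fraction(random_rows))
    print("correlated:", in_order_fraction(correlated_rows))
```

Loading both variants into separate tables would let the same sort queries be timed against correlated and uncorrelated input.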

> 
> Did you first index the text field, and then run CLUSTER for the
> larger sized sets on that index (to test abbreviation)? That would
> cause there to be a lot of abbreviated keys that seemed to poorly
> capture the entropy of their underlying values, when in fact that was
> entirely down to our only considering the first 10 tuples in a 100
> million tuple set. Having some patience is important there, and a hint
> at how far in we are gives the ad-hoc cost model a much better sense
> of proportion...it then has a sense of how patient it should be.

There are no indexes in the test script.
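
For reference, the effect Peter describes for a CLUSTERed table can be illustrated with a toy model. This is only a sketch under stated assumptions, not the logic in varlena.c: the 8-byte abbreviation width, the "grpNNNNN-" value layout, and all function names are made up for the illustration.

```python
import random

ABBREV_BYTES = 8  # assumed abbreviated-key width, for illustration only


def abbreviate(value):
    """Toy abbreviation: the first ABBREV_BYTES bytes of the value."""
    return value.encode("utf-8")[:ABBREV_BYTES]


def distinct_abbrev_ratio(values, sample_size):
    """Distinct abbreviated keys divided by tuples seen, over a prefix of the input."""
    sample = values[:sample_size]
    return len({abbreviate(v) for v in sample}) / len(sample)


def make_prefixed_values(n, groups=1000, seed=0):
    """Values whose first 8 bytes identify one of `groups` groups (hypothetical layout)."""
    rng = random.Random(seed)
    return [
        f"grp{rng.randrange(groups):05d}-{rng.randrange(10**9):09d}"
        for _ in range(n)
    ]


if __name__ == "__main__":
    rows = make_prefixed_values(100_000)
    clustered = sorted(rows)  # physical order follows the sort column
    # Looking only at the first 10 tuples:
    print("random order:   ", distinct_abbrev_ratio(rows, 10))
    print("clustered order:", distinct_abbrev_ratio(clustered, 10))
```

In the clustered ordering the first few tuples all fall into the same group, so an early sample makes the abbreviated keys look far less useful than they are over the whole set. That does not apply to my tables, since they are neither indexed nor clustered.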

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


