On 19 April 2012 19:24, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> I kind of understood timsort would shine in sorting text in non-C
> collation, because of the comparison cost. So a test in some UTF8
> collation or other would be interesting, right?
That's certainly the theory, yes. In practice, even though timsort
lives up to its promise of significantly reducing the number of
comparisons required in many common situations, my implementation does
not actually perform better than qsort_arg. Even a reduction of over
10% in the number of comparisons does not result in a net performance
gain. It wouldn't surprise me if the implementation used is quite
suboptimal, and it might well be worth profiling and optimising. It
doesn't appear to be the big win that I'd hoped for though. It's
necessary to stretch the assumed cost of a comparison rather a lot
further than the very common case of sorting a single key of non-c
collated text for it to be worth it, and that's just too thin for me
to sink more time into this right now.
--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services