On Tue, Mar 22, 2016 at 07:19:44PM -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I was a little worried that it was too much to hope for that all libc
> > vendors on earth would ship a strxfrm() implementation that was actually
> > consistent with strcoll(), and here we are.
>
> Indeed. To try to put some scope on the problem, I made an idiot little
> program that just generates some random UTF8 strings and sees whether
> strcoll and strxfrm sort them alike. Attached are that program, an even
> more idiot little shell script that runs it over all available UTF8
> locales, and the results on my RHEL6 box. While de_DE seems to be the
> worst-broken locale, it's far from the only one.
>
> Please try this on as many platforms as you can get hold of ...
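The consistency check described above can be sketched in C roughly as follows. This is not the attached program, which is not reproduced here. The names check_pair, random_utf8 and count_mismatches, the fixed bound XFRM_BUF_LEN (standing in for the program's MAXXFRMLEN), and the code-point ranges used to build random UTF-8 strings are all hypothetical. A pair counts as consistent when strcoll(a, b) has the same sign as strcmp() applied to the two strxfrm() images, which is the relationship the C standard promises.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define XFRM_BUF_LEN 1024  /* hypothetical fixed bound, in the role of MAXXFRMLEN */
#define MAX_CHARS 8        /* hypothetical maximum code points per random string */

static int sign(int x) { return (x > 0) - (x < 0); }

/* Write the UTF-8 encoding of BMP code point cp (not a surrogate) to out;
 * return the number of bytes written. */
static int put_utf8(unsigned cp, char *out)
{
    if (cp < 0x80) {
        out[0] = (char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char) (0xC0 | (cp >> 6));
        out[1] = (char) (0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (char) (0xE0 | (cp >> 12));
    out[1] = (char) (0x80 | ((cp >> 6) & 0x3F));
    out[2] = (char) (0x80 | (cp & 0x3F));
    return 3;
}

/* Fill buf with 1..MAX_CHARS random code points, NUL-terminated.
 * buf must hold at least 3 * MAX_CHARS + 1 bytes. */
static void random_utf8(char *buf)
{
    int n = 1 + rand() % MAX_CHARS, len = 0;

    for (int i = 0; i < n; i++) {
        unsigned cp;
        switch (rand() % 3) {
        case 0:  cp = 0x20u + (unsigned) (rand() % 0x5F); break;   /* U+0020..U+007E */
        case 1:  cp = 0xA0u + (unsigned) (rand() % 0x160); break;  /* U+00A0..U+01FF */
        default: cp = 0x400u + (unsigned) (rand() % 0x100); break; /* Cyrillic */
        }
        len += put_utf8(cp, buf + len);
    }
    assert(len < 3 * MAX_CHARS + 1);
    buf[len] = '\0';
}

/* 1 if strcoll and strcmp-of-strxfrm agree on (a, b), 0 if they disagree,
 * -1 if either transformed string did not fit in XFRM_BUF_LEN bytes. */
static int check_pair(const char *a, const char *b)
{
    char xa[XFRM_BUF_LEN], xb[XFRM_BUF_LEN];

    /* strxfrm returns the length it needed; a result >= the buffer size
     * means the buffer contents are indeterminate. */
    if (strxfrm(xa, a, sizeof xa) >= sizeof xa ||
        strxfrm(xb, b, sizeof xb) >= sizeof xb)
        return -1;
    return sign(strcoll(a, b)) == sign(strcmp(xa, xb));
}

/* Compare ntrials random pairs; return the number of disagreements and
 * store in *overflows the number of pairs skipped for lack of buffer space. */
static int count_mismatches(int ntrials, int *overflows)
{
    char a[3 * MAX_CHARS + 1], b[3 * MAX_CHARS + 1];
    int mismatches = 0;

    *overflows = 0;
    for (int i = 0; i < ntrials; i++) {
        random_utf8(a);
        random_utf8(b);
        int r = check_pair(a, b);
        if (r < 0)
            (*overflows)++;
        else if (r == 0)
            mismatches++;
    }
    return mismatches;
}
```

A driver would first select the locale under test, for example setlocale(LC_ALL, "de_DE.UTF-8"), and then report both counts. Under the default "C" locale both counts should always be zero. Reporting overflows separately keeps an undersized buffer from being mistaken for a libc bug.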
I, too, found MAXXFRMLEN insufficient; I raised it fourfold. Cygwin
2.2.1(0.289/5/3) caught fire; 10% of locales passed. (varstr_sortsupport()
already blacklists the UTF8/native Windows case.) The test passed on Solaris
10, Solaris 11, HP-UX B.11.31, OpenBSD 5.0, NetBSD 5.1.2, and FreeBSD 9.0.
See attached tryalllocales.sh outputs. I did not test AIX, because the AIX
machines I use have no UTF8 locales installed.
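Rather than raising a fixed bound such as MAXXFRMLEN, a test can ask strxfrm() how much room it needs. The C standard allows a call with n == 0 and a null destination; that call writes nothing and returns the length of the transformed string, not counting the terminating NUL. A minimal sketch follows. xfrm_alloc and xfrm_consistent are hypothetical helpers, not functions from the test program or from PostgreSQL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd strxfrm() image of s, or NULL on allocation failure.
 * The first call (n == 0, NULL destination) only reports the length the
 * transformed string needs, excluding the terminating NUL. */
static char *xfrm_alloc(const char *s)
{
    size_t need = strxfrm(NULL, s, 0);
    char *buf = malloc(need + 1);

    if (buf == NULL)
        return NULL;
    /* With need + 1 bytes available, the whole image plus its NUL fits. */
    size_t got = strxfrm(buf, s, need + 1);
    assert(got == need);
    (void) got;
    return buf;
}

static int sign(int x) { return (x > 0) - (x < 0); }

/* 1 if strcoll(a, b) and strcmp() of the transformed images have the same
 * sign, 0 if they disagree, -1 on allocation failure. */
static int xfrm_consistent(const char *a, const char *b)
{
    char *xa = xfrm_alloc(a);
    char *xb = xfrm_alloc(b);
    int result = -1;

    if (xa != NULL && xb != NULL)
        result = sign(strcoll(a, b)) == sign(strcmp(xa, xb));
    free(xa);   /* free(NULL) is a no-op */
    free(xb);
    return result;
}
```

This costs an extra strxfrm() call per string. In exchange, a locale whose transformed keys are unusually long cannot cause spurious failures.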