On 7/1/08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Marko Kreen" <markokr@gmail.com> writes:
> > On 6/26/08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> >> BTW, I don't think you can use that same-length optimization for
> >> citext. There's no reason to think that upper/lowercase pairs will
> >> have the same length all the time in multibyte encodings.
>
> > What about this code in current str_tolower():
>
> > /* Output workspace cannot have more codes than input bytes */
> > workspace = (wchar_t *) palloc((nbytes + 1) * sizeof(wchar_t));
>
>
> That's working with wchars, not bytes.
Ah, I missed the point of the char2wchar() line.
I'm rather unfamiliar with the various multibyte APIs, sorry.
There's another thing I'm probably missing: does the current code handle
code points that need more than one wchar_t? Or is it guaranteed that
they don't occur? (Isn't wchar_t often a 16-bit value?)
--
marko