Re: Making TEXT NUL-transparent

From	Florian Pflug
Subject	Re: Making TEXT NUL-transparent
Date	November 24, 2011, 12:48:27
Msg-id	21D9E9C6-552A-4CE1-BF9A-178D4C2DC272@phlo.org
In reply to	Re: Making TEXT NUL-transparent (Florian Weimer <fweimer@bfk.de>)
List	pgsql-hackers

On Nov 24, 2011, at 10:54, Florian Weimer wrote:
>> Or is it not only about being able to *store* NULs in a text field?
> 
> No, the entire core should be NUL-transparent.

That's unlikely to happen. A more realistic approach would be to solve
this only for UTF-8 encoded strings by encoding the NUL character not as
a single 0 byte, but as a sequence of non-0 bytes.

Such a thing is possible in UTF-8 because there are multiple ways to
encode the same character once you drop the requirement that characters
be encoded in the *shortest* possible way.
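
To illustrate the point about multiple encodings, here is a minimal
Python sketch (the function name overlong_two_byte is made up for this
example, it is not part of any library). It writes a code point below
U+0080 in the two-byte form 110xxxxx 10xxxxxx instead of the one-byte
form a conforming encoder would produce:

```python
def overlong_two_byte(cp: int) -> bytes:
    """Encode a code point below U+0080 in the non-shortest two-byte form.

    Hypothetical helper, for illustration only.
    """
    if not 0 <= cp < 0x80:
        raise ValueError("only code points below U+0080 are handled here")
    # Two-byte layout: 110xxxxx 10xxxxxx, payload bits hold the code point.
    return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])


print(overlong_two_byte(0x00).hex())  # c080
print(overlong_two_byte(0x41).hex())  # c181

# A strict UTF-8 decoder refuses such sequences.
try:
    b"\xc0\x80".decode("utf-8")
except UnicodeDecodeError:
    print("rejected by strict UTF-8")
```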

Since we very probably won't loosen up UTF-8's integrity checks to allow
that, it'd have to be done as a new encoding, say 'utf8-loose'.

That new encoding could, for example, use 0xC0 0x80 to represent NUL
characters. This byte sequence is invalid in standard-conforming UTF-8
because it's a non-normalized (i.e. overly long) representation of a code
point (the code point NUL, incidentally). A bit of googling suggests that
quite a few pieces of software use this kind of modified UTF-8 encoding.

Java, for example, seems to use it (under the name "modified UTF-8") to
serialize Strings (which may contain NUL characters).
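
A rough sketch of what such a 'utf8-loose' codec might do, written in
Python rather than as a real PostgreSQL encoding. The names loose_encode
and loose_decode are invented for this example, and rejecting raw 0
bytes on input is my assumption, not something settled in this thread:

```python
NUL_ESCAPE = b"\xc0\x80"  # overlong two-byte form of U+0000


def loose_encode(text: str) -> bytes:
    """Encode as UTF-8, but write NUL as 0xC0 0x80 (hypothetical codec)."""
    # Standard UTF-8 output never contains 0xC0, so this is unambiguous.
    return text.encode("utf-8").replace(b"\x00", NUL_ESCAPE)


def loose_decode(data: bytes) -> str:
    """Inverse of loose_encode; everything else is checked strictly."""
    if b"\x00" in data:
        # Assumption: a raw 0 byte is treated as an error, so that the
        # stored form never contains one.
        raise ValueError("raw 0 byte is not allowed in this encoding")
    return data.replace(NUL_ESCAPE, b"\x00").decode("utf-8")


encoded = loose_encode("a\x00b")
print(encoded)                            # b'a\xc0\x80b'
print(loose_decode(encoded) == "a\x00b")  # True
```

The encoded form contains no 0 byte, so it survives code that treats
strings as NUL-terminated, which is the whole point of the exercise.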

Should you try to add a new encoding which supports that, you might also
want to allow CESU-8-style encoding of UTF-16 surrogate pairs. This means
that code points representable by UTF-16 surrogate pairs may be encoded by
separately encoding the two surrogate characters in UTF-8.
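
For the surrogate pair case, a small Python sketch of a CESU-8-style
encoder (cesu8_encode is a name made up here; Python ships no CESU-8
codec that I know of). Code points above U+FFFF are split into their
UTF-16 surrogates, and each surrogate is then written as a three-byte
sequence:

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text CESU-8 style (hypothetical helper, encoder only)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # BMP code points are encoded exactly as in standard UTF-8.
            out += ch.encode("utf-8")
        else:
            # Split into the UTF-16 surrogate pair ...
            v = cp - 0x10000
            high = 0xD800 | (v >> 10)
            low = 0xDC00 | (v & 0x3FF)
            # ... and encode each surrogate as a 3-byte sequence.
            for unit in (high, low):
                out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


print("\U0001F600".encode("utf-8").hex())  # f09f9880
print(cesu8_encode("\U0001F600").hex())    # eda0bdedb880
```

Standard UTF-8 needs four bytes for U+1F600, the CESU-8 form needs six.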

best regards,
Florian Pflug
