Re: [PATCH] json_lex_string: don't overread on bad UTF8

From	Jacob Champion
Subject	Re: [PATCH] json_lex_string: don't overread on bad UTF8
Date	May 3, 17:05:38
Msg-id	CAOYmi+=BomJrQUBgy5FQY9ZtHvuK7WOJNB6foPUv21qfb2+YPw@mail.gmail.com
In response to	Re: [PATCH] json_lex_string: don't overread on bad UTF8 (Peter Eisentraut <peter@eisentraut.org>)
Responses	Re: [PATCH] json_lex_string: don't overread on bad UTF8
List	pgsql-hackers

On Fri, May 3, 2024 at 4:54 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 30.04.24 19:39, Jacob Champion wrote:
> > Tangentially: Should we maybe rethink pieces of the json_lex_string
> > error handling? For example, do we really want to echo an incomplete
> > multibyte sequence once we know it's bad?
>
> I can't quite find the place you might be looking at in
> json_lex_string(),

(json_lex_string() reports the beginning and end of the "area of
interest" via the JsonLexContext; it's json_errdetail() that turns
that into an error message.)

> but for the general encoding conversion we have what
> would appear to be the same behavior in report_invalid_encoding(), and
> we go out of our way there to produce a verbose error message including
> the invalid data.

We could port something like that to src/common. IMO that'd be more
suited for an actual conversion routine, though, as opposed to a
parser that for the most part assumes you didn't lie about the input
encoding and is just trying not to crash if you're wrong. Most of the
time, the parser just copies the bytes between delimiters, and it's
up to the caller to handle encodings... the exceptions to that are the
\uXXXX escapes and the error handling.

Offhand, are all of our supported frontend encodings
self-synchronizing? By that I mean, is it safe to print a partial byte
sequence if the locale isn't UTF-8? (As I type this I'm staring at
Shift-JIS, and thinking "probably not.")
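
To make the self-synchronization worry concrete, here is a minimal Python sketch (not PostgreSQL code). The helper name ascii_trail_bytes is hypothetical and the two sample characters are chosen purely for illustration. It lists characters whose encoding has an ASCII-range byte in a non-leading position. In UTF-8 every continuation byte is in 0x80-0xBF, so nothing is reported. In Shift-JIS both samples end in 0x5C, the ASCII backslash.

```python
def ascii_trail_bytes(text, encoding):
    """Return (char, encoded_bytes) pairs whose non-leading bytes include ASCII.

    Hypothetical helper for illustration only.
    """
    hits = []
    for ch in text:
        encoded = ch.encode(encoding)
        # Only multibyte characters have trail bytes to inspect.
        if len(encoded) > 1 and any(b < 0x80 for b in encoded[1:]):
            hits.append((ch, encoded))
    return hits


sample = "表ソ"
# Shift-JIS: both characters carry 0x5C ('\') as their second byte.
print(ascii_trail_bytes(sample, "shift_jis"))
# UTF-8: continuation bytes are always >= 0x80, so the list is empty.
print(ascii_trail_bytes(sample, "utf-8"))
```

If an error message cuts such a sequence after its lead byte, whatever byte is printed next can be absorbed as a trail byte by a Shift-JIS terminal, which is the reason for the "probably not" above.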

Actually -- hopefully this is not too much of a tangent -- that
further crystallizes a vague unease about the API that I have. The
JsonLexContext is initialized with something called the
"input_encoding", but that encoding is necessarily also the output
encoding for parsed string literals and error messages. For the server
side that's fine, but frontend clients have the input_encoding locked
to UTF-8, which seems like it might cause problems? Maybe I'm missing
code somewhere, but I don't see a conversion routine from
json_errdetail() to the actual client/locale encoding. (And the parser
does not support multibyte input_encodings that contain ASCII in trail
bytes.)
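
As a rough illustration of that last parenthetical, the sketch below shows a byte-oriented scan for the closing quote of a string literal. This is an assumption-laden toy, not the logic in json_lex_string(); find_closing_quote is a made-up name. It treats every 0x5C byte as a backslash escape, which is safe for UTF-8 but not for an encoding that allows ASCII values in trail bytes.

```python
def find_closing_quote(buf):
    """Return the index of the closing '"' of a literal that starts at buf[0].

    Hypothetical byte-oriented scanner: it knows nothing about multibyte
    characters and returns -1 if no closing quote is found.
    """
    i = 1  # skip the opening quote
    while i < len(buf):
        if buf[i] == 0x5C:  # backslash: skip the escaped byte as well
            i += 2
            continue
        if buf[i] == 0x22:  # unescaped double quote ends the literal
            return i
        i += 1
    return -1


literal = '"表"'
# UTF-8: no byte of the character collides with '\' or '"'.
print(find_closing_quote(literal.encode("utf-8")))
# Shift-JIS: the trail byte 0x5C looks like an escape and hides the quote.
print(find_closing_quote(literal.encode("shift_jis")))
```

With the Shift-JIS input the scanner runs off the end of the literal, because the second byte of the character is mistaken for a backslash that escapes the closing quote.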

Thanks,
--Jacob
