Re: copying json data and backslashes

Поиск
Список
Период
Сортировка
От Erik Wienhold
Тема Re: copying json data and backslashes
Дата
Msg-id 1342743333.378397.1669130157980@office.mailbox.org
обсуждение исходный текст
Ответ на copying json data and backslashes  (Alastair McKinley <a.mckinley@analyticsengines.com>)
Ответы Re: copying json data and backslashes  ("pbj@cmicdo.com" <pbj@cmicdo.com>)
Список pgsql-general
> On 22/11/2022 15:23 CET Alastair McKinley <a.mckinley@analyticsengines.com> wrote:
>
> Hi all,
>
> I have come across this apparently common issue COPY-ing json and wondering if
> there is potentially a better solution.
>
> I am copying data into a jsonb column originating from a 3rd party API. The
> data may have literal \r,\t,\n and also double backslashes.
>
> I discovered that I can cast this data to a jsonb value directly but I can't
> COPY the data without pre-processing.
>
> The example below illustrates my issue (only with \r, but the problem extends
> to other \X combinations).
>
> > do $$
> > lines=[r'{"test" : "\r this data has a carriage return"}']
> >
> > with open("/tmp/test1.json","w") as f:
> > for line in lines:
> > f.write(line.strip() + "\n")
> >
> > $$ language plpython3u;
> >
> > create temp table testing (data jsonb);
> >
> > -- this works
> > insert into testing (data)
> > select l::jsonb
> > from pg_read_file('/tmp/test1.json') f,
> > lateral regexp_split_to_table(f,'\n') l where l <> '';
> >
> > -- fails
> > copy testing (data) from '/tmp/test1.json';
> >
> > -- works
> > copy testing (data) from program $c$ sed -e 's/\\r/\\\\u000a/g' /tmp/test1.json $c$;
> >
>
> Is there any other solution with COPY that doesn't require manual
> implementation of search/replace to handle these edge cases?
> Why does ::jsonb work but COPY doesn't? It seems a bit inconsistent.

COPY handles special backslash sequences[1].  The \r in your sample JSON,
although properly escaped according to JSON, is replaced with an actual
carriage return by COPY before casting to jsonb.  The error results from JSON
prohibiting unescaped control characters in strings[2].

You must double escape to pass those characters through COPY.

See how COPY outputs backslash sequences:

    -- Actual carriage return:
    copy (select e'\r') to stdout;
    \r

    -- Backslash sequence for carriage return:
    copy (select '\r') to stdout;
    \\r

[1] https://www.postgresql.org/docs/current/sql-copy.html#id-1.9.3.55.9.2
[2] https://www.json.org/json-en.html

--
Erik



В списке pgsql-general по дате отправления:

Предыдущее
От: Alastair McKinley
Дата:
Сообщение: copying json data and backslashes
Следующее
От: "pbj@cmicdo.com"
Дата:
Сообщение: Re: copying json data and backslashes