Обсуждение: COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV

Поиск
Список
Период
Сортировка

COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV

От
"Svante Richter"
Дата:
Hello!

The documentation for COPY says "To avoid any misinterpretation, a \. data value appearing as a lone entry on a line is automatically quoted on output, and on input, if quoted, is not interpreted as the end-of-data marker".

The input part only seems to work when using the COPY FROM CSV command, not \COPY FROM CSV. This is mentioned in a previous message here https://www.postgresql.org/message-id/a89f47e1-f716-4ae3-b882-cab5032a5d66%40manitou-mail.org but not documented.

This means that COPY TO CSV produces data that \COPY FROM CSV cannot read, which I'm assuming should be fixed (or at the very least documented as a serious limitation of \COPY FROM CSV). I found this out by not being able to load a backup of a table that I had exported via COPY TO CSV.

As the above message also mentioned this can be a security risk if using \COPY FROM STDIN CSV with untrusted data (https://www.postgresql.org/message-id/20190128214448.GH26761%40momjian.us says "I think the question is how many people are using CSV/STDIN for insecure data loads?") but I would absolutely expect data produced with COPY TO CSV to be safe to pipe to a \COPY FROM CSV, but this bug makes that unsafe unless I also explicitly set ON_ERROR_STOP=1.

SQL to reproduce:

CREATE TABLE testtable (a TEXT);
INSERT INTO testtable VALUES ('
\.
');
COPY testtable TO '/run/postgresql/test.csv' CSV;
COPY testtable FROM '/run/postgresql/test.csv' CSV; -- This one works
\COPY testtable FROM '/run/postgresql/test.csv' CSV; -- This one does not work


Error message:

ERROR:  unterminated CSV quoted field
CONTEXT:  COPY testtable, line 1: ""
"

Versions tested:

psql (PostgreSQL) 14.3 (under arch linux)
psql (PostgreSQL) 13.7 (Ubuntu 13.7-0ubuntu0.21.10.1)

Re: COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV

От
"Svante Richter"
Дата:
A gentle bump for this, hopefully that's alright!

On Wed, Jun 15, 2022, at 2:16 PM, Svante Richter wrote:
Hello!

The documentation for COPY says "To avoid any misinterpretation, a \. data value appearing as a lone entry on a line is automatically quoted on output, and on input, if quoted, is not interpreted as the end-of-data marker".

The input part only seems to work when using the COPY FROM CSV command, not \COPY FROM CSV. This is mentioned in a previous message here https://www.postgresql.org/message-id/a89f47e1-f716-4ae3-b882-cab5032a5d66%40manitou-mail.org but not documented.

This means that COPY TO CSV produces data that \COPY FROM CSV cannot read, which I'm assuming should be fixed (or at the very least documented as a serious limitation of \COPY FROM CSV). I found this out by not being able to load a backup of a table that I had exported via COPY TO CSV.

As the above message also mentioned this can be a security risk if using \COPY FROM STDIN CSV with untrusted data (https://www.postgresql.org/message-id/20190128214448.GH26761%40momjian.us says "I think the question is how many people are using CSV/STDIN for insecure data loads?") but I would absolutely expect data produced with COPY TO CSV to be safe to pipe to a \COPY FROM CSV, but this bug makes that unsafe unless I also explicitly set ON_ERROR_STOP=1.

SQL to reproduce:

CREATE TABLE testtable (a TEXT);
INSERT INTO testtable VALUES ('
\.
');
COPY testtable TO '/run/postgresql/test.csv' CSV;
COPY testtable FROM '/run/postgresql/test.csv' CSV; -- This one works
\COPY testtable FROM '/run/postgresql/test.csv' CSV; -- This one does not work


Error message:

ERROR:  unterminated CSV quoted field
CONTEXT:  COPY testtable, line 1: ""
"

Versions tested:

psql (PostgreSQL) 14.3 (under arch linux)
psql (PostgreSQL) 13.7 (Ubuntu 13.7-0ubuntu0.21.10.1)


Re: COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV

От
Noah Misch
Дата:
On Wed, Jun 15, 2022 at 02:16:14PM +0200, Svante Richter wrote:
> The documentation for COPY says "To avoid any misinterpretation, a `\.` data value appearing as a lone entry on a
lineis automatically quoted on output, and on input, if quoted, is not interpreted as the end-of-data marker".
 
> 
> The input part only seems to work when using the COPY FROM CSV command, not \COPY FROM CSV. This is mentioned in a
previousmessage here https://www.postgresql.org/message-id/a89f47e1-f716-4ae3-b882-cab5032a5d66%40manitou-mail.org but
notdocumented.
 

I agree this warrants a doc mention.  Also, the previous thread seems to have
ended with a focus on [\copy ... from stdin], but it's broader than that:

> COPY testtable FROM '/run/postgresql/test.csv' CSV; -- This one works
> \COPY testtable FROM '/run/postgresql/test.csv' CSV; -- This one does not work

The psql documentation says [\copy ... from stdin] looks for "\.".  It says
nothing about doing that for [\copy ... from filename].  I wonder if psql
should change the filename case to send the whole file to the server, ignoring
"\." inside.  (That is to say, change to match the psql documentation.)  This,
too, has an avoidable case of the problem:

  create table t (c text);
  \copy t from program 'printf ''"foo\n\\.\nbar"''' with csv



Re: COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV

От
Tom Lane
Дата:
Noah Misch <noah@leadboat.com> writes:
> The psql documentation says [\copy ... from stdin] looks for "\.".  It says
> nothing about doing that for [\copy ... from filename].  I wonder if psql
> should change the filename case to send the whole file to the server, ignoring
> "\." inside.  (That is to say, change to match the psql documentation.)

This seems like a sensible solution.  The need to use an in-band EOF
marker when reading from the command source file is clear, and there's
no non-kluge way there.  But that doesn't mean we should extend it
to separate files.  (I'm surprised to realize we do, actually.)

            regards, tom lane



Re: COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV

От
Bruce Momjian
Дата:
On Sun, Aug 14, 2022 at 10:08:55AM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > The psql documentation says [\copy ... from stdin] looks for "\.".  It says
> > nothing about doing that for [\copy ... from filename].  I wonder if psql
> > should change the filename case to send the whole file to the server, ignoring
> > "\." inside.  (That is to say, change to match the psql documentation.)
> 
> This seems like a sensible solution.  The need to use an in-band EOF
> marker when reading from the command source file is clear, and there's
> no non-kluge way there.  But that doesn't mean we should extend it
> to separate files.  (I'm surprised to realize we do, actually.)

So, look what I found in psql/copy.c:

        /* check for EOF marker, but not on a partial line */
        if (at_line_begin)
        {
        /*
-->         * This code erroneously assumes '\.' on a line alone
-->         * inside a quoted CSV string terminates the \copy.
-->         * https://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org
         */
        if ((linelen == 3 && memcmp(fgresult, "\\.\n", 3) == 0) ||
            (linelen == 4 && memcmp(fgresult, "\\.\r\n", 4) == 0))
        {
            copydone = true;
        }
        }

I wondered who would reference an email thread in the source code, so I
looked at the whole 2012 thread:

    https://www.postgresql.org/message-id/flat/E1TdNVQ-0001ju-GO%40wrigleys.postgresql.org

and it was, of course, me.  :-(

Tom's analysis was similar ten years ago:

> Ugh.  This seems like a rather fundamental oversight in the CSV feature.
> The problem is that psql has no idea whether the copy is being done in
> CSV mode or not --- and even if it did, it doesn't parse the data fully
> enough to realize whether a \. line is inside quotes or not.
> 
> In the case of out-of-line data files, it might be reasonable to just
> dispense with the check for \. altogether and always ship the whole file
> to the backend; I think there's a \. check on the backend side.  (Not
> sure this is safe in V2 protocol, but I doubt anyone cares anymore
> about that.)
> 
> In the case of in-line data in a script file, CSV mode seems a bit
> broken in any case; there's no concept of a terminator in CSV, AFAIK.
> So maybe we don't have to worry about that.

I think at this point we should either fix this or document it.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson




Re: COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV

От
Noah Misch
Дата:
On Tue, Aug 16, 2022 at 11:26:47AM -0400, Bruce Momjian wrote:
> On Sun, Aug 14, 2022 at 10:08:55AM -0400, Tom Lane wrote:
> > Noah Misch <noah@leadboat.com> writes:
> > > The psql documentation says [\copy ... from stdin] looks for "\.".  It says
> > > nothing about doing that for [\copy ... from filename].  I wonder if psql
> > > should change the filename case to send the whole file to the server, ignoring
> > > "\." inside.  (That is to say, change to match the psql documentation.)
> > 
> > This seems like a sensible solution.  The need to use an in-band EOF
> > marker when reading from the command source file is clear, and there's
> > no non-kluge way there.  But that doesn't mean we should extend it
> > to separate files.  (I'm surprised to realize we do, actually.)

> Tom's analysis was similar ten years ago:
> 
> > Ugh.  This seems like a rather fundamental oversight in the CSV feature.
> > The problem is that psql has no idea whether the copy is being done in
> > CSV mode or not --- and even if it did, it doesn't parse the data fully
> > enough to realize whether a \. line is inside quotes or not.
> > 
> > In the case of out-of-line data files, it might be reasonable to just
> > dispense with the check for \. altogether and always ship the whole file
> > to the backend; I think there's a \. check on the backend side.  (Not
> > sure this is safe in V2 protocol, but I doubt anyone cares anymore
> > about that.)

> I think at this point we should either fix this or document it.

Things have gotten simpler in the last ten years, specifically commit 3174d69
removing protocol v2.  Before that, the server would have mishandled "\." even
if the client didn't.  These days, one can fix all but [\copy ... from stdin].



Re: COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV

От
Bruce Momjian
Дата:
On Tue, Aug 16, 2022 at 11:26:47AM -0400, Bruce Momjian wrote:
> On Sun, Aug 14, 2022 at 10:08:55AM -0400, Tom Lane wrote:
> > Noah Misch <noah@leadboat.com> writes:
> > > The psql documentation says [\copy ... from stdin] looks for "\.".  It says
> > > nothing about doing that for [\copy ... from filename].  I wonder if psql
> > > should change the filename case to send the whole file to the server, ignoring
> > > "\." inside.  (That is to say, change to match the psql documentation.)
> > 
> > This seems like a sensible solution.  The need to use an in-band EOF
> > marker when reading from the command source file is clear, and there's
> > no non-kluge way there.  But that doesn't mean we should extend it
> > to separate files.  (I'm surprised to realize we do, actually.)
> 
> So, look what I found in psql/copy.c:
> 
>         /* check for EOF marker, but not on a partial line */
>         if (at_line_begin)
>         {
>         /*
> -->         * This code erroneously assumes '\.' on a line alone
> -->         * inside a quoted CSV string terminates the \copy.
> -->         * https://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org
>          */
>         if ((linelen == 3 && memcmp(fgresult, "\\.\n", 3) == 0) ||
>             (linelen == 4 && memcmp(fgresult, "\\.\r\n", 4) == 0))
>         {
>             copydone = true;
>         }
>         }
> 
> I wondered who would reference an email thread in the source code, so I
> looked at the whole 2012 thread:
> 
>     https://www.postgresql.org/message-id/flat/E1TdNVQ-0001ju-GO%40wrigleys.postgresql.org
> 
> and it was, of course, me.  :-(
> 
> Tom's analysis was similar ten years ago:
> 
> > Ugh.  This seems like a rather fundamental oversight in the CSV feature.
> > The problem is that psql has no idea whether the copy is being done in
> > CSV mode or not --- and even if it did, it doesn't parse the data fully
> > enough to realize whether a \. line is inside quotes or not.
> > 
> > In the case of out-of-line data files, it might be reasonable to just
> > dispense with the check for \. altogether and always ship the whole file
> > to the backend; I think there's a \. check on the backend side.  (Not
> > sure this is safe in V2 protocol, but I doubt anyone cares anymore
> > about that.)
> > 
> > In the case of in-line data in a script file, CSV mode seems a bit
> > broken in any case; there's no concept of a terminator in CSV, AFAIK.
> > So maybe we don't have to worry about that.
> 
> I think at this point we should either fix this or document it.

I have decided to document this with the attached patch.  I plan to
apply it to all supported Postgres versions.  If we ever fix FROM file
so only FROM STDIN is a problem, we can update the documentation.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

Вложения

Re: COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV

От
Bruce Momjian
Дата:
On Sat, Oct 28, 2023 at 11:38:37AM -0400, Bruce Momjian wrote:
> On Tue, Aug 16, 2022 at 11:26:47AM -0400, Bruce Momjian wrote:
> > I think at this point we should either fix this or document it.
> 
> I have decided to document this with the attached patch.  I plan to
> apply it to all supported Postgres versions.  If we ever fix FROM file
> so only FROM STDIN is a problem, we can update the documentation.

Patch applied to all supported versions.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.