Re: Should CSV parsing be stricter about mid-field quotes?

From Greg Stark
Subject Re: Should CSV parsing be stricter about mid-field quotes?
Date
Msg-id CAM-w4HPO4JwvHLAHE7TdnLAh9TvM3P9LhiPXHUyygFx1uhQMOQ@mail.gmail.com
In reply to Re: Should CSV parsing be stricter about mid-field quotes?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Should CSV parsing be stricter about mid-field quotes?  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
On Sat, 13 May 2023 at 09:46, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Andrew Dunstan <andrew@dunslane.net> writes:
> > I could see an argument for a STRICT mode which would disallow partially
> > quoted fields, although I'd like some evidence that we're dealing with a
> > real problem here. Is there really a CSV producer that produces output
> > like that you showed in your example? And if so has anyone objected to
> > them about the insanity of that?
>
> I think you'd want not just "some evidence" but "compelling evidence".
> Any such option is going to add cycles into the low-level input parser
> for COPY, which we know is a hot spot and we've expended plenty of
> sweat on.  Adding a speed penalty that will be paid by the 99.99%
> of users who don't have an issue here is going to be a hard sell.

Well, I'm not sure that follows. Joel specifically claimed that an
implementation that didn't accept inputs like this would actually be
simpler, and that might mean it would actually be faster.

And I don't think you have to look very hard for inputs like this --
plenty of people generate CSV files from simple templates or script
outputs that don't understand escaping quotation marks at all. Outputs
like that will be fine as long as there are no double quotes in the
inputs, but then one day someone will enter a double quote in a form
somewhere and blammo.
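
To make that failure mode concrete, here is a minimal sketch in Python.
The helper names (naive_row, escaped_row) are made up for illustration
and are not taken from any real producer; the second helper uses the
standard csv module to show what a properly escaped row looks like.

```python
import csv
import io


def naive_row(fields):
    # Template-style output: wrap each value in quotes, never escape.
    return ",".join(f'"{f}"' for f in fields)


def escaped_row(fields):
    # Same row via the csv module, which doubles embedded quotes.
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="")
    writer.writerow(fields)
    return buf.getvalue()


print(naive_row(["plain", "ok"]))      # "plain","ok"
print(naive_row(['5" disk', "ok"]))    # "5" disk","ok"   <- mid-field quote
print(escaped_row(['5" disk', "ok"]))  # "5"" disk","ok"
```

The naive output is indistinguishable from correct output until a value
happens to contain a double quote.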

So I guess the real question is whether accepting inputs with
unescaped quotes and interpreting them the way we do is really the
best interpretation. Is the user best served by a) assuming they
intended to quote part of the field and not quote part of it, b) assuming
they failed to escape the quotation mark, or c) assuming something's gone
wrong and the input is entirely untrustworthy?
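
As a rough illustration of the three options, here is a toy line
splitter in Python. It is only a sketch: split_line and its mode names
("toggle", "literal", "strict") are hypothetical, it handles a single
line with doubled-quote escaping only, and "toggle" reflects my
understanding of the behavior under discussion rather than the actual
COPY code.

```python
def split_line(line, mode="toggle", delim=",", quote='"'):
    """Split one CSV line; `mode` decides what a mid-field quote means.

    toggle  -> (a) a quote anywhere switches quoting on or off
    literal -> (b) a quote inside an unquoted field is ordinary data
    strict  -> (c) a mid-field quote is an error
    """
    fields, buf = [], []
    in_quote = False
    at_field_start = True
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quote:
            if ch == quote:
                if i + 1 < len(line) and line[i + 1] == quote:
                    buf.append(quote)  # doubled quote is a literal quote
                    i += 1
                else:
                    in_quote = False
                    # strict: a closing quote must end the field
                    if (mode == "strict" and i + 1 < len(line)
                            and line[i + 1] != delim):
                        raise ValueError(f"data after closing quote at offset {i}")
            else:
                buf.append(ch)
        elif ch == delim:
            fields.append("".join(buf))
            buf = []
            at_field_start = True
            i += 1
            continue
        elif ch == quote:
            if at_field_start or mode == "toggle":
                in_quote = True
            elif mode == "literal":
                buf.append(ch)
            else:
                raise ValueError(f"unexpected quote at offset {i}")
        else:
            buf.append(ch)
        at_field_start = False
        i += 1
    if in_quote:
        raise ValueError("unterminated quoted field")
    fields.append("".join(buf))
    return fields


line = 'a"b,c"d,e'
print(split_line(line, "toggle"))   # ['ab,cd', 'e']
print(split_line(line, "literal"))  # ['a"b', 'c"d', 'e']
# split_line(line, "strict") raises ValueError
```

All three modes agree on well-formed input such as '"a,b",c'; they only
diverge once a quote shows up in the middle of a field, which is exactly
the case where silently picking (a) can merge two columns into one.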

-- 
greg


