Re: Should CSV parsing be stricter about mid-field quotes?

From	Andrew Dunstan
Subject	Re: Should CSV parsing be stricter about mid-field quotes?
Date	13 May 2023 15:44:48
Msg-id	9f1e32aa-1267-7d8e-0472-66a04b83d2ea@dunslane.net
In reply to	Re: Should CSV parsing be stricter about mid-field quotes? ("Joel Jacobson" <joel@compiler.org>)
Replies	Re: Should CSV parsing be stricter about mid-field quotes? (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

On 2023-05-13 Sa 04:20, Joel Jacobson wrote:

On Fri, May 12, 2023, at 21:57, Andrew Dunstan wrote:

Maybe this is unexpected by you, but it's not by me. What other sane interpretation of that data could there be? And what CSV producer outputs such horrible content? As you've noted, ours certainly does not. Our rules are clear: quotes within quotes must be escaped (default escape is by doubling the quote char). Allowing partial fields to be quoted was a deliberate decision when CSV parsing was implemented, because examples have been seen in the wild.
So I don't think our behaviour is broken or needs fixing. As mentioned by Greg, this is an example of the adage about being liberal in what you accept.
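The rule described above (quoted segments may appear anywhere in a field; inside quotes, a doubled quote is a literal quote) can be modelled roughly as follows. This is a simplified Python sketch, not PostgreSQL's COPY code: the name `parse_csv_line` is hypothetical, it handles a single line, assumes the escape character equals the quote character (the default), and ignores NULL handling and embedded newlines.

```python
def parse_csv_line(line: str, delim: str = ",", quote: str = '"') -> list[str]:
    """Split one CSV line, accepting partially quoted fields.

    Hypothetical model of the lenient behaviour under discussion:
    - outside quotes, a quote char opens a quoted segment, even mid-field;
    - inside quotes, a doubled quote char yields one literal quote;
    - a single quote char inside quotes closes the segment.
    """
    fields: list[str] = []
    buf: list[str] = []
    in_quote = False
    i = 0
    while i < len(line):
        c = line[i]
        if in_quote:
            if c == quote:
                if i + 1 < len(line) and line[i + 1] == quote:
                    buf.append(quote)  # doubled quote -> literal quote
                    i += 1             # skip the second quote char
                else:
                    in_quote = False   # closing quote
            else:
                buf.append(c)          # delimiters inside quotes are data
        elif c == quote:
            in_quote = True            # opening quote, possibly mid-field
        elif c == delim:
            fields.append("".join(buf))
            buf = []
        else:
            buf.append(c)
        i += 1
    if in_quote:
        raise ValueError("unterminated quoted field")
    fields.append("".join(buf))
    return fields
```

Under this model, `1,"foo"bar,x"y"z` splits into `1`, `foobar` and `xyz`: each quoted segment is unwrapped and concatenated with its unquoted neighbours, which is the partially-quoted-field behaviour being debated.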

I understand your position, and your points are indeed in line with the traditional "Robustness Principle" (aka "Postel's Law") [1] from 1980, which suggests "be conservative in what you send, be liberal in what you accept."

However, I'd like to offer a different perspective that might be worth considering.

A 2021 IETF draft, "The Harmful Consequences of the Robustness Principle" [2], argues that the flexibility advocated by Postel's Law can lead to problems such as unclear specifications and a multitude of varying implementations. Features that initially seem helpful can unexpectedly turn into bugs, resulting in unanticipated consequences and data integrity risks.

Based on the feedback from you and others, I'd like to revise my earlier proposal. Rather than adding an option to preserve the existing behavior, I now think it's better to simply report an error in such cases. This approach offers several benefits: it simplifies the CSV parser, reduces the risk of misinterpreting data due to malformed input, and prevents the all-too-familiar situation where users blindly apply an error hint without understanding the consequences.

Finally, I acknowledge that we can't foresee the number of CSV producers that produce mid-field quoting, and this change may cause compatibility issues for some users. However, I consider this an acceptable tradeoff. Users encountering the error would receive a clear message explaining that mid-field quoting is not allowed and that they should change their CSV producer's settings to escape quotes by doubling the quote character. Importantly, this change guarantees that previously parsed data won't be misinterpreted, as it only enforces stricter parsing rules.

[1] https://datatracker.ietf.org/doc/html/rfc761#section-2.10

[2] https://www.ietf.org/archive/id/draft-iab-protocol-maintenance-05.html

I'm pretty reluctant to change something that's been working as designed for almost 20 years, and about which we have hitherto had zero complaints that I recall.

I could see an argument for a STRICT mode which would disallow partially quoted fields, although I'd like some evidence that we're dealing with a real problem here. Is there really a CSV producer that produces output like that you showed in your example? And if so has anyone objected to them about the insanity of that?
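A STRICT mode that disallows partially quoted fields might look like the following hypothetical sketch (Python, not PostgreSQL code; `parse_csv_line_strict`, `MidFieldQuoteError` and the error wording are illustrative assumptions). A quote is accepted only as the first character of a field, and after the closing quote only a delimiter or end of line may follow; anything else raises an error whose message suggests doubling embedded quotes.

```python
class MidFieldQuoteError(ValueError):
    """Hypothetical error for a quote that does not wrap a whole field."""


def parse_csv_line_strict(line: str, delim: str = ",", quote: str = '"') -> list[str]:
    """Split one CSV line, rejecting partially quoted fields (sketch)."""
    fields: list[str] = []
    i, n = 0, len(line)
    while True:
        buf: list[str] = []
        if i < n and line[i] == quote:
            # Quoted field: the quote must open the field.
            i += 1
            while True:
                if i >= n:
                    raise ValueError("unterminated quoted field")
                c = line[i]
                if c == quote:
                    if i + 1 < n and line[i + 1] == quote:
                        buf.append(quote)  # doubled quote -> literal quote
                        i += 2
                        continue
                    i += 1  # closing quote
                    break
                buf.append(c)
                i += 1
            # Only a delimiter or end of line may follow the closing quote.
            if i < n and line[i] != delim:
                raise MidFieldQuoteError(
                    f"unexpected data after closing quote at position {i}; "
                    "quote whole fields and escape quotes by doubling them"
                )
        else:
            # Unquoted field: quote characters are not allowed at all.
            while i < n and line[i] != delim:
                if line[i] == quote:
                    raise MidFieldQuoteError(
                        f"quote inside unquoted field at position {i}; "
                        "quote whole fields and escape quotes by doubling them"
                    )
                buf.append(line[i])
                i += 1
        fields.append("".join(buf))
        if i >= n:
            return fields
        i += 1  # skip the delimiter and parse the next field
```

With this sketch, `1,"a""b",c` still parses to `1`, `a"b`, `c`, while inputs such as `"foo"bar` or `x"y"z` are rejected instead of being silently concatenated.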

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

