Обсуждение: pg_read_file() and non-ascii input file

Поиск

Список

Период

Сортировка

pg_read_file() and non-ascii input file

От

Itagaki Takahiro

Дата:

28 октября 2009 г., 02:27:19

pg_read_file() takes byte-offset and length as arguments,
but we don't check the result text with pg_verify_mbstr().
Should pg_read_file() return bytea instead of text or adding
some codes to verify the input? Only superusers are allowed
to use the function, but it is still dangerous.

If we leave the result in text type and add verifier, we also need to
consider how to handle multi-byte text. Offset and length should not
split one multi-byte character. We can assume the offset as a correct
boundary if we can trust users, but no one knows correct length before
the function call.

An idea is to have binary and text versions of pg_read_file: * pg_read_binary_file(filename, offset, length) : bytea *
pg_read_text_file(filename,offset) : ROW( text, nextline_offset )     -- it returns the next line starting with
'offset'.
but such changes could bring on compatibility problems.

Comments, better ideas?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Re: pg_read_file() and non-ascii input file

От

Itagaki Takahiro

Дата:

30 ноября 2009 г., 08:37:32

Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp> wrote:

> pg_read_file() takes byte-offset and length as arguments,
> but we don't check the result text with pg_verify_mbstr().
> Should pg_read_file() return bytea instead of text or adding
> some codes to verify the input? Only superusers are allowed
> to use the function, but it is still dangerous.

Here is a patch to modify the result type of pg_read_file to bytea.
I think it is a possibly-security hole -- it might be safe because only
supersusers can use the function, but it is still dangerous.
We can still use the function to read a text file:
    SELECT convert_from(pg_read_file(...), 'encoding')

If we want to keep backward compatibility, the issue can be fixed
by adding pg_verifymbstr() to the function. We can also have the
binary version in another name, like pg_read_binary_file().

Which solution is better? Comments welcome.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Вложения

pg_read_file_as_bytea.patch

Re: pg_read_file() and non-ascii input file

От

Robert Haas

Дата:

30 декабря 2009 г., 21:15:18

On Mon, Nov 30, 2009 at 4:36 AM, Itagaki Takahiro
<itagaki.takahiro@oss.ntt.co.jp> wrote:
> If we want to keep backward compatibility, the issue can be fixed
> by adding pg_verifymbstr() to the function. We can also have the
> binary version in another name, like pg_read_binary_file().

I don't feel good about changing the return type of an existing
function, so I guess +1 from me on the approach quoted above.

...Robert

Re: pg_read_file() and non-ascii input file

От

Takahiro Itagaki

Дата:

04 января 2010 г., 01:09:21

Robert Haas <robertmhaas@gmail.com> wrote:

> > If we want to keep backward compatibility, the issue can be fixed
> > by adding pg_verifymbstr() to the function.
>
> I don't feel good about changing the return type of an existing
> function, so I guess +1 from me on the approach quoted above.

Ok, I just added pg_verifymbstr() instead of changing the result type.

I didn't add any additinal file reading functions in the patch, but
I'm willing to add them if someone want them:
  - pg_read_file_with_encoding()
  - pg_read_binary_file() RETURNS bytea
  - pg_read_text_file() RETURNS SETOF text -- returns set of lines

One thing bothering me is the HINT message on error is just pointless;
The encoding is controlled by "server_encoding" here. We will have the
same error message in server-side COPY commands. We'd better improving
the message, though it should be done by another patch.

=# SELECT pg_read_file('invalid.txt', 0, (pg_stat_file('invalid.txt')).size);
ERROR:  invalid byte sequence for encoding "UTF8": 0x93
HINT:  This error can also happen if the byte sequence does not match the
       encoding expected by the server, which is controlled by "client_encoding".

=# COPY tbl FROM 'invalid.txt'; -- server-side copy from a local file.
(the same message -- but the encoding should match with "server_encoding")

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

Вложения

pg_read_file_20100104.patch

Re: pg_read_file() and non-ascii input file

От

Robert Haas

Дата:

04 января 2010 г., 02:36:16

On Sun, Jan 3, 2010 at 9:10 PM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:
>
> Robert Haas <robertmhaas@gmail.com> wrote:
>
>> > If we want to keep backward compatibility, the issue can be fixed
>> > by adding pg_verifymbstr() to the function.
>>
>> I don't feel good about changing the return type of an existing
>> function, so I guess +1 from me on the approach quoted above.
>
> Ok, I just added pg_verifymbstr() instead of changing the result type.

Sounds fine.

> I didn't add any additinal file reading functions in the patch, but
> I'm willing to add them if someone want them:
>  - pg_read_file_with_encoding()
>  - pg_read_binary_file() RETURNS bytea
>  - pg_read_text_file() RETURNS SETOF text -- returns set of lines

OK.

> One thing bothering me is the HINT message on error is just pointless;
> The encoding is controlled by "server_encoding" here. We will have the
> same error message in server-side COPY commands. We'd better improving
> the message, though it should be done by another patch.

Interestingly, this same issue is being discussed on the thread
entitled "invalid UTF-8 via pl/perl".  I suppose whatever solution is
adopted there can be used here as well.  FWIW, repurposing the third
argument of pg_verifymbstr seems like the most sensible option to me.

...Robert

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: pg_read_file() and non-ascii input file

pg_read_file() and non-ascii input file

Re: pg_read_file() and non-ascii input file

Вложения

Re: pg_read_file() and non-ascii input file

Re: pg_read_file() and non-ascii input file

Вложения

Re: pg_read_file() and non-ascii input file