Обсуждение: pg_read_file() and non-ascii input file
pg_read_file() takes byte-offset and length as arguments, but we don't check the result text with pg_verify_mbstr(). Should pg_read_file() return bytea instead of text or adding some codes to verify the input? Only superusers are allowed to use the function, but it is still dangerous. If we leave the result in text type and add verifier, we also need to consider how to handle multi-byte text. Offset and length should not split one multi-byte character. We can assume the offset as a correct boundary if we can trust users, but no one knows correct length before the function call. An idea is to have binary and text versions of pg_read_file: * pg_read_binary_file(filename, offset, length) : bytea * pg_read_text_file(filename,offset) : ROW( text, nextline_offset ) -- it returns the next line starting with 'offset'. but such changes could bring on compatibility problems. Comments, better ideas? Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp> wrote: > pg_read_file() takes byte-offset and length as arguments, > but we don't check the result text with pg_verify_mbstr(). > Should pg_read_file() return bytea instead of text or adding > some codes to verify the input? Only superusers are allowed > to use the function, but it is still dangerous. Here is a patch to modify the result type of pg_read_file to bytea. I think it is a possibly-security hole -- it might be safe because only supersusers can use the function, but it is still dangerous. We can still use the function to read a text file: SELECT convert_from(pg_read_file(...), 'encoding') If we want to keep backward compatibility, the issue can be fixed by adding pg_verifymbstr() to the function. We can also have the binary version in another name, like pg_read_binary_file(). Which solution is better? Comments welcome. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Вложения
On Mon, Nov 30, 2009 at 4:36 AM, Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp> wrote: > If we want to keep backward compatibility, the issue can be fixed > by adding pg_verifymbstr() to the function. We can also have the > binary version in another name, like pg_read_binary_file(). I don't feel good about changing the return type of an existing function, so I guess +1 from me on the approach quoted above. ...Robert
Robert Haas <robertmhaas@gmail.com> wrote: > > If we want to keep backward compatibility, the issue can be fixed > > by adding pg_verifymbstr() to the function. > > I don't feel good about changing the return type of an existing > function, so I guess +1 from me on the approach quoted above. Ok, I just added pg_verifymbstr() instead of changing the result type. I didn't add any additinal file reading functions in the patch, but I'm willing to add them if someone want them: - pg_read_file_with_encoding() - pg_read_binary_file() RETURNS bytea - pg_read_text_file() RETURNS SETOF text -- returns set of lines One thing bothering me is the HINT message on error is just pointless; The encoding is controlled by "server_encoding" here. We will have the same error message in server-side COPY commands. We'd better improving the message, though it should be done by another patch. =# SELECT pg_read_file('invalid.txt', 0, (pg_stat_file('invalid.txt')).size); ERROR: invalid byte sequence for encoding "UTF8": 0x93 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding". =# COPY tbl FROM 'invalid.txt'; -- server-side copy from a local file. (the same message -- but the encoding should match with "server_encoding") Regards, --- Takahiro Itagaki NTT Open Source Software Center
Вложения
On Sun, Jan 3, 2010 at 9:10 PM, Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> wrote: > > Robert Haas <robertmhaas@gmail.com> wrote: > >> > If we want to keep backward compatibility, the issue can be fixed >> > by adding pg_verifymbstr() to the function. >> >> I don't feel good about changing the return type of an existing >> function, so I guess +1 from me on the approach quoted above. > > Ok, I just added pg_verifymbstr() instead of changing the result type. Sounds fine. > I didn't add any additinal file reading functions in the patch, but > I'm willing to add them if someone want them: > - pg_read_file_with_encoding() > - pg_read_binary_file() RETURNS bytea > - pg_read_text_file() RETURNS SETOF text -- returns set of lines OK. > One thing bothering me is the HINT message on error is just pointless; > The encoding is controlled by "server_encoding" here. We will have the > same error message in server-side COPY commands. We'd better improving > the message, though it should be done by another patch. Interestingly, this same issue is being discussed on the thread entitled "invalid UTF-8 via pl/perl". I suppose whatever solution is adopted there can be used here as well. FWIW, repurposing the third argument of pg_verifymbstr seems like the most sensible option to me. ...Robert