Thread: Re: BUG #6742: pg_dump doesn't convert encoding of DB object names to OS encoding
From: Alexander Law
Date:
Hello,

I would like to fix this bug, but it looks like it would not be a
one-line patch.
Looking at the pg_dump code, I see that the object names come through
the following chain:
1. pg_dump executes 'SELECT c.tableoid, c.oid, c.relname, ... ' and gets
   the object name in the encoding chosen for the db connection/dump.
2. It invokes the write_msg function or alike:
   write_msg(NULL, "finding the columns and types of table \"%s\"\n", tbinfo->dobj.name);
3. vwrite_msg localizes the text message, but not the argument(s):
   vfprintf(stderr, _(fmt), ap);
   Here gettext (_) internally translates fmt to the OS encoding (if it
   differs from UTF-8, the encoding of the localized strings).

And I can see only a few solutions to the problem:
1. To convert the object name at the back-end, i.e. to modify all the
   similar SELECTs as:
   'SELECT c.tableoid, c.oid, c.relname, convert_to(c.relname, 'OS_ENCODING') AS locrelname, ...'
   and then do
   write_msg(NULL, "finding the columns and types of table \"%s\"\n", tbinfo->dobj.local_name);
   The downside of this approach is that it requires rewriting all the
   SELECTs for all the objects. And it doesn't help us to write out any
   other text from the backend, such as a localized backend error.

2. To set up another connection to the backend with the OS encoding, and
   to get all the object names through it. It looks insane too. And we
   have the same problem with the localized backend errors coming on the
   "main" connection.

3. To make a convert_to_os_encoding(text, encoding) function for the
   frontend utilities. Unfortunately the frontend can't use the internal
   PostgreSQL conversion functions, and modifying them for use through
   libpq looks infeasible.
   So the only way to implement such a function is to use another
   encoding conversion framework (library).
   And my question is: is it possible to include libiconv (add this
   dependency) in the frontend utilities code?

4. To force users to use the OS encoding as the database encoding. Or to
   not use non-ASCII characters in db object names and to disable NLS on
   Windows completely. It doesn't look like a solution at all.

BTW, it's not the only instance of the issue. For example, when I try to
use vacuumdb, I get completely unreadable messages:
http://oi48.tinypic.com/1c8j9.jpg
(blue marks what is in Russian or English, all the other text is
gibberish).

Best regards,
Alexander


18.07.2012 12:51, Alexander Law wrote:
> Hello,
>
> The dump file itself is correct. The issue is only with the non-ASCII
> object names in pg_dump messages.
> The message text (which is non-ASCII too) is displayed consistently
> with the right encoding (i.e. with the OS encoding, thanks to
> libintl/gettext), but the encoding of db object names depends on the
> dump encoding, and thus they become unreadable when a different
> encoding is used.
> The same can be reproduced in Linux (where the console encoding is
> UTF-8) when doing a dump with Windows-1251 or Latin1 (for western
> European languages).
>
> Thanks,
> Alexander
>
>> The following bug has been logged on the website:
>>
>> Bug reference:      6742
>> Logged by:          Alexander LAW
>> Email address:      exclusion(at)gmail(dot)com
>> PostgreSQL version: 9.1.4
>> Operating system:   Windows
>> Description:
>>
>> When I try to dump database with UTF-8 encoding in Windows, I get
>> unreadable object names.
>> Please look at the screenshot (http://oi50.tinypic.com/2lw6ipf.jpg).
>> On the left window all the pg_dump messages displayed correctly
>> (except for the prompt password (bug #6510)), but the non-ASCII
>> object name is gibberish. On the right window (where dump is done
>> with the Windows 1251 encoding (OS Encoding for Russian locale))
>> everything is right.
>
> Did you check the dump file using an editor that can handle UTF-8?
>
> The Windows console is not known for properly handling that encoding.
>
> Thomas

On Wed, Jul 25, 2012 at 7:54 AM, Alexander Law <exclusion@gmail.com> wrote:
> Hello,
> I would like to fix this bug, but it looks like it would be not one-line
> patch.
> Looking at the pg_dump code I see that the object names come through the
> following chain:
> 1. pg_dump executes 'SELECT c.tableoid, c.oid, c.relname, ... ' and gets the
> object_name with the encoding chosen for db connection/dump.
> 2. it invokes write_msg function or alike:
>
> write_msg(NULL, "finding the columns and types of table \"%s\"\n",
> tbinfo->dobj.name);
> 3. vwrite_msg localizes text message, but not the argument(s):
> vfprintf(stderr, _(fmt), ap);
> Here gettext (_) internally translates fmt to OS encoding (if it's different
> from UTF-8 - encoding of a localized strings).
>
> And I can see only a few solutions of the problem:
> 1. To convert the object name at the back-end, i.e. to modify all the
> similar SELECT's as:
> 'SELECT c.tableoid, c.oid, c.relname, convert_to(c.relname, 'OS_ENCODING')
> AS locrelname, ...'
> and then do write_msg(NULL, "finding the columns and types of table
> \"%s\"\n", tbinfo->dobj.local_name);
> The downside of this approach is that it requires rewriting all the SELECT's
> for all the object. And it doesn't help us to write out any other text from
> backend, such as localized backend error.
>
> 2. To setup another connection to backend with the OS encoding, and to get
> all the object names through it. It looks insane too. And we have the same
> problem with the localized backend errors coming on "main" connection.
>
> 3. To make convert_to_os_encoding(text, encoding) function for a frontend
> utilities. Unfortunately frontend can't use internal PostgreSQL conversion
> functions, and modifying them to use through libpq looks unfeasible.
> So the only way to implement such function is to use another encoding
> conversion framework (library).
> And my question is - is it possible to include libiconv (add this
> dependency) to the frontend utilities code?
>
> 4. To force users to use OS encoding as the Database encoding. Or to not use
> non-ASCII characters in an db object names and to disable nls on Windows
> completely. It doesn't look like a solution at all.

I think if you're going to try to do something about this, #1 is
probably the best option. It does sound like a lot of work, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
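
For readers weighing option 3 against option 1, the frontend-side conversion proposed in the thread (a convert_to_os_encoding-style function built on libiconv) might look roughly like the sketch below. It is only an illustration under stated assumptions, not code from pg_dump: convert_encoding and report_table are hypothetical names, the 4x output buffer is a simplifying guess, and the accepted encoding names depend on the local iconv implementation.

```c
#include <assert.h>
#include <iconv.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of pg_dump): convert a NUL-terminated string
 * from from_enc to to_enc using iconv. Returns a malloc'd string that the
 * caller must free, or NULL if the conversion is not possible.
 */
char *
convert_encoding(const char *text, const char *from_enc, const char *to_enc)
{
    assert(text != NULL);

    iconv_t cd = iconv_open(to_enc, from_enc);
    if (cd == (iconv_t) -1)
        return NULL;            /* unknown encoding pair */

    size_t in_left = strlen(text);
    /* Assumption: 4 output bytes per input byte is enough for this sketch. */
    size_t out_size = in_left * 4 + 1;
    char *out = malloc(out_size);
    if (out == NULL)
    {
        iconv_close(cd);
        return NULL;
    }

    char *in_ptr = (char *) text;   /* iconv wants char **, input is not modified */
    char *out_ptr = out;
    size_t out_left = out_size - 1; /* keep room for the terminator */

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (size_t) -1)
    {
        free(out);              /* invalid input or buffer too small */
        return NULL;
    }
    *out_ptr = '\0';
    return out;
}

/*
 * Hypothetical stand-in for a write_msg call: convert the object name from
 * the dump encoding to the OS encoding before printing it.
 */
void
report_table(FILE *fp, const char *name, const char *dump_enc, const char *os_enc)
{
    char *local = convert_encoding(name, dump_enc, os_enc);

    /* Fall back to the raw bytes if the name cannot be converted. */
    fprintf(fp, "finding the columns and types of table \"%s\"\n",
            local ? local : name);
    free(local);
}
```

A call such as report_table(stderr, name, "WINDOWS-1251", "UTF-8") would correspond to the Linux case described in the thread. On POSIX systems the target encoding could be taken from nl_langinfo(CODESET) after setlocale(LC_ALL, ""); Windows would need a different lookup, and the extra library dependency is exactly the open question raised in the original message.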