Thread: again: Bug #943: Server-Encoding from EUC_TW to UTF-8 doesn't work
Hello,

I reported bug #943 (I found in 7.3.2) and you checked in some change against integer overflow.
Now I upgraded to 7.3.3 and I'm not happy with this.
The exact error as I described is fixed, but I found new errors in conversion UTF-8 <-> EUC_TW and BIG5:

Copy to table (DB has UTF-8 encoding) from file:
for PGCLIENTENCODING=BIG5:
WARNING: copy: line 1, LocalToUtf: could not convert (0xf9d6) BIG5 to UTF-8. Ignored
WARNING: copy: line 2, LocalToUtf: could not convert (0xf9d7) BIG5 to UTF-8. Ignored
WARNING: copy: line 3, LocalToUtf: could not convert (0xf9d8) BIG5 to UTF-8. Ignored
WARNING: copy: line 4, LocalToUtf: could not convert (0xf9db) BIG5 to UTF-8. Ignored
for EUC_TW
WARNING: copy: line 1, LocalToUtf: could not convert (0x8ea3c3b7) EUC_TW to UTF-8. Ignored
WARNING: copy: line 2, LocalToUtf: could not convert (0x8ea3cfd0) EUC_TW to UTF-8. Ignored
WARNING: copy: line 3, LocalToUtf: could not convert (0x8ea3c4ce) EUC_TW to UTF-8. Ignored
WARNING: copy: line 4, LocalToUtf: could not convert (0x8ea3bdfe) EUC_TW to UTF-8. Ignored

Copy out to file from table (UTF-8 data):
to BIG5
WARNING: UtfToLocal: could not convert UTF-8 (0xe7a281). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xe98ab9). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xe8a38f). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xe7b2a7). Ignored

to EUC_TW is ok!

Regards,
Michael
> Hello,
> I reported bug #943 (I found in 7.3.2) and you checked in some change against integer overflow.
> Now I upgraded to 7.3.3 and I'm not happy with this.
> The exact error as I described is fixed, but I found new errors in conversion UTF-8 <-> EUC_TW and BIG5:
>
> Copy to table (DB has UTF-8 encoding) from file:
> for PGCLIENTENCODING=BIG5:
> WARNING: copy: line 1, LocalToUtf: could not convert (0xf9d6) BIG5 to UTF-8. Ignored
> WARNING: copy: line 2, LocalToUtf: could not convert (0xf9d7) BIG5 to UTF-8. Ignored
> WARNING: copy: line 3, LocalToUtf: could not convert (0xf9d8) BIG5 to UTF-8. Ignored
> WARNING: copy: line 4, LocalToUtf: could not convert (0xf9db) BIG5 to UTF-8. Ignored

I see no problem here. The only standard conversion map I could find
online so far (see the URL below) does not include entries 0xf9d6 or
above.

http://www.unicode.org/Public/UNIDATA/Unihan.txt

> for EUC_TW
> WARNING: copy: line 1, LocalToUtf: could not convert (0x8ea3c3b7) EUC_TW to UTF-8. Ignored
> WARNING: copy: line 2, LocalToUtf: could not convert (0x8ea3cfd0) EUC_TW to UTF-8. Ignored
> WARNING: copy: line 3, LocalToUtf: could not convert (0x8ea3c4ce) EUC_TW to UTF-8. Ignored
> WARNING: copy: line 4, LocalToUtf: could not convert (0x8ea3bdfe) EUC_TW to UTF-8. Ignored

Hum. These seem to be CNS 11643-1993, plane 3. Currently PostgreSQL
supports only:

CNS 11643-1993, plane 0
CNS 11643-1993, plane 1
CNS 11643-1993, plane 2
CNS 11643-1993, plane 15

Would you like support for the remaining CNS 11643-1993 planes:

CNS 11643-1993, plane 3
CNS 11643-1993, plane 4
CNS 11643-1993, plane 5
CNS 11643-1993, plane 6
CNS 11643-1993, plane 7

in the upcoming 7.4?

> Copy out to file from table (UTF-8 data):
> to BIG5
> WARNING: UtfToLocal: could not convert UTF-8 (0xe7a281). Ignored
> WARNING: UtfToLocal: could not convert UTF-8 (0xe98ab9). Ignored
> WARNING: UtfToLocal: could not convert UTF-8 (0xe8a38f). Ignored
> WARNING: UtfToLocal: could not convert UTF-8 (0xe7b2a7). Ignored
>
> to EUC_TW is ok!

BIG5 and EUC_TW have different code points. So this is not very strange.
--
Tatsuo Ishii
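
The plane number read off the EUC_TW codes above can be checked by hand. The following is a minimal sketch, assuming the usual EUC-TW layout for four-byte sequences: a 0x8E single-shift byte, then a plane byte in 0xA1..0xB0, then two bytes in 0xA1..0xFE. The function name `euc_tw_plane` is made up for illustration; this is not PostgreSQL's conversion code.

```python
SS2 = 0x8E  # single-shift byte that introduces a four-byte EUC_TW sequence


def euc_tw_plane(code: int) -> int:
    """Return the CNS 11643 plane number encoded in a four-byte EUC_TW code.

    Assumes the layout SS2, plane byte (0xA1..0xB0), row byte, column byte.
    """
    lead, plane_byte, row, col = code.to_bytes(4, "big")
    if lead != SS2:
        raise ValueError("not a four-byte EUC_TW sequence (missing 0x8E)")
    if not 0xA1 <= plane_byte <= 0xB0:
        raise ValueError("plane byte out of range")
    if not (0xA1 <= row <= 0xFE and 0xA1 <= col <= 0xFE):
        raise ValueError("row/column byte out of range")
    return plane_byte - 0xA0


# The four codes from the COPY warnings quoted above.
for code in (0x8EA3C3B7, 0x8EA3CFD0, 0x8EA3C4CE, 0x8EA3BDFE):
    print(hex(code), "-> plane", euc_tw_plane(code))
```

Under that assumption, all four codes carry the plane byte 0xA3, which is consistent with the "plane 3" reading in the reply.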
> > > Copy to table (DB has UTF-8 encoding) from file:
> > > for PGCLIENTENCODING=BIG5:
> > > WARNING: copy: line 1, LocalToUtf: could not convert (0xf9d6) BIG5 to UTF-8. Ignored
> > > WARNING: copy: line 2, LocalToUtf: could not convert (0xf9d7) BIG5 to UTF-8. Ignored
> > > WARNING: copy: line 3, LocalToUtf: could not convert (0xf9d8) BIG5 to UTF-8. Ignored
> > > WARNING: copy: line 4, LocalToUtf: could not convert (0xf9db) BIG5 to UTF-8. Ignored
> >
> > I see no problem here. The only standard conversion map I could find
> > online so far (see the URL below) does not include entries 0xf9d6 or
> > above.
>
> Sorry, I do not know anything about conversion maps and CNS 11643-1993 planes.
> I only got a file in BIG5 encoding from Taiwan and found that it is not possible
> to load all of the text into PostgreSQL 7.3.3.
> But it is possible to convert it to UTF-8 with the iconv tool from glibc (Linux).
> It would be good if the next release supported today's BIG5.

I'm not looking forward to adding any conversion entries confirmed by
standards. Can someone explain to me the current status of the
conversion maps between BIG5 and Unicode? The only info I could find
so far is at www.unicode.org.
--
Tatsuo Ishii
> > > > Copy to table (DB has UTF-8 encoding) from file:
> > > > for PGCLIENTENCODING=BIG5:
> > > > WARNING: copy: line 1, LocalToUtf: could not convert (0xf9d6) BIG5 to UTF-8. Ignored
> > > > WARNING: copy: line 2, LocalToUtf: could not convert (0xf9d7) BIG5 to UTF-8. Ignored
> > > > WARNING: copy: line 3, LocalToUtf: could not convert (0xf9d8) BIG5 to UTF-8. Ignored
> > > > WARNING: copy: line 4, LocalToUtf: could not convert (0xf9db) BIG5 to UTF-8. Ignored
> > >
> > > I see no problem here. The only standard conversion map I could find
> > > online so far (see the URL below) does not include entries 0xf9d6 or
> > > above.
> >
> > Sorry, I do not know anything about conversion maps and CNS 11643-1993 planes.
> > I only got a file in BIG5 encoding from Taiwan and found that it is not possible
> > to load all of the text into PostgreSQL 7.3.3.
> > But it is possible to convert it to UTF-8 with the iconv tool from glibc (Linux).
> > It would be good if the next release supported today's BIG5.
>
> I'm not looking forward to adding any conversion entries confirmed by
> standards. Can someone explain to me the current status of the

Oops, the above should read:

I'm not looking forward to adding any conversion entries NOT confirmed by
standards.

> conversion maps between BIG5 and Unicode? The only info I could find
> so far is at www.unicode.org.
> --
> Tatsuo Ishii
> > > I reported bug #943 (I found in 7.3.2) and you checked in some change against integer overflow.
> > > Now I upgraded to 7.3.3 and I'm not happy with this.
> > > The exact error as I described is fixed, but I found new errors in conversion UTF-8 <-> EUC_TW and BIG5:
> > >
> > > Copy to table (DB has UTF-8 encoding) from file:
> > > for PGCLIENTENCODING=BIG5:
> > > WARNING: copy: line 1, LocalToUtf: could not convert (0xf9d6) BIG5 to UTF-8. Ignored
> > > WARNING: copy: line 2, LocalToUtf: could not convert (0xf9d7) BIG5 to UTF-8. Ignored
> > > WARNING: copy: line 3, LocalToUtf: could not convert (0xf9d8) BIG5 to UTF-8. Ignored
> > > WARNING: copy: line 4, LocalToUtf: could not convert (0xf9db) BIG5 to UTF-8. Ignored
> >
> > I see no problem here. The only standard conversion map I could find
> > online so far (see the URL below) does not include entries 0xf9d6 or
> > above.
> >
> > http://www.unicode.org/Public/UNIDATA/Unihan.txt
>
> I found in this file:
> U+F9D7 in line 604519
> U+F9D8 in line 219540
> U+F9D6...U+F9DB in lines 730707...730766.

No. U+F9D6 denotes a *Unicode* code point, not a BIG5 code point.

> > > for EUC_TW
> > > WARNING: copy: line 1, LocalToUtf: could not convert (0x8ea3c3b7) EUC_TW to UTF-8. Ignored
> > > WARNING: copy: line 2, LocalToUtf: could not convert (0x8ea3cfd0) EUC_TW to UTF-8. Ignored
> > > WARNING: copy: line 3, LocalToUtf: could not convert (0x8ea3c4ce) EUC_TW to UTF-8. Ignored
> > > WARNING: copy: line 4, LocalToUtf: could not convert (0x8ea3bdfe) EUC_TW to UTF-8. Ignored
> >
> > Hum. These seem to be CNS 11643-1993, plane 3. Currently PostgreSQL
> > supports only:
> >
> > CNS 11643-1993, plane 0
> > CNS 11643-1993, plane 1
> > CNS 11643-1993, plane 2
> > CNS 11643-1993, plane 15
> >
> > Would you like support for the remaining CNS 11643-1993 planes:
> >
> > CNS 11643-1993, plane 3
> > CNS 11643-1993, plane 4
> > CNS 11643-1993, plane 5
> > CNS 11643-1993, plane 6
> > CNS 11643-1993, plane 7
> >
> > in the upcoming 7.4?
> >
> > > Copy out to file from table (UTF-8 data):
> > > to BIG5
> > > WARNING: UtfToLocal: could not convert UTF-8 (0xe7a281). Ignored
> > > WARNING: UtfToLocal: could not convert UTF-8 (0xe98ab9). Ignored
> > > WARNING: UtfToLocal: could not convert UTF-8 (0xe8a38f). Ignored
> > > WARNING: UtfToLocal: could not convert UTF-8 (0xe7b2a7). Ignored
> > >
> > > to EUC_TW is ok!
> >
> > BIG5 and EUC_TW have different code points. So this is not very strange.
>
> But it is very strange that I can (for EUC_TW) copy to file without error but I can not copy from file without error.
>
> Michael
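
The distinction between a Unicode code point and a BIG5 code can be made concrete with the UTF-8 byte sequences from the COPY-out warnings. The sketch below only decodes those bytes to their code points; the helper name `utf8_hex_to_codepoint` is invented for illustration. It does not establish which BIG5 codes, if any, correspond to these characters.

```python
def utf8_hex_to_codepoint(hex_bytes: str) -> str:
    """Decode a hex string of UTF-8 bytes holding exactly one character
    and return its code point in U+XXXX notation."""
    text = bytes.fromhex(hex_bytes).decode("utf-8")
    if len(text) != 1:
        raise ValueError("expected exactly one character")
    return f"U+{ord(text):04X}"


# UTF-8 byte sequences taken from the UtfToLocal warnings in the thread.
for hex_bytes in ("e7a281", "e98ab9", "e8a38f", "e7b2a7"):
    print(hex_bytes, "->", utf8_hex_to_codepoint(hex_bytes))
```

None of the resulting code points lies in the U+F9D6...U+F9DB range that was searched for in Unihan.txt, which illustrates why a match on "U+F9D6" in that file says nothing about the BIG5 byte pair 0xf9d6.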
Tatsuo Ishii wrote:
>
> > Hello,
> > I reported bug #943 (I found in 7.3.2) and you checked in some change against integer overflow.
> > Now I upgraded to 7.3.3 and I'm not happy with this.
> > The exact error as I described is fixed, but I found new errors in conversion UTF-8 <-> EUC_TW and BIG5:
> >
> > Copy to table (DB has UTF-8 encoding) from file:
> > for PGCLIENTENCODING=BIG5:
> > WARNING: copy: line 1, LocalToUtf: could not convert (0xf9d6) BIG5 to UTF-8. Ignored
> > WARNING: copy: line 2, LocalToUtf: could not convert (0xf9d7) BIG5 to UTF-8. Ignored
> > WARNING: copy: line 3, LocalToUtf: could not convert (0xf9d8) BIG5 to UTF-8. Ignored
> > WARNING: copy: line 4, LocalToUtf: could not convert (0xf9db) BIG5 to UTF-8. Ignored
>
> I see no problem here. The only standard conversion map I could find
> online so far (see the URL below) does not include entries 0xf9d6 or
> above.

Sorry, I do not know anything about conversion maps and CNS 11643-1993 planes.
I only got a file in BIG5 encoding from Taiwan and found that it is not possible
to load all of the text into PostgreSQL 7.3.3.
But it is possible to convert it to UTF-8 with the iconv tool from glibc (Linux).
It would be good if the next release supported today's BIG5.

Michael
Tatsuo Ishii wrote:
>
> > > > I reported bug #943 (I found in 7.3.2) and you checked in some change against integer overflow.
> > > > Now I upgraded to 7.3.3 and I'm not happy with this.
> > > > The exact error as I described is fixed, but I found new errors in conversion UTF-8 <-> EUC_TW and BIG5:
> > > >
> > > > Copy to table (DB has UTF-8 encoding) from file:
> > > > for PGCLIENTENCODING=BIG5:
> > > > WARNING: copy: line 1, LocalToUtf: could not convert (0xf9d6) BIG5 to UTF-8. Ignored
> > > > WARNING: copy: line 2, LocalToUtf: could not convert (0xf9d7) BIG5 to UTF-8. Ignored
> > > > WARNING: copy: line 3, LocalToUtf: could not convert (0xf9d8) BIG5 to UTF-8. Ignored
> > > > WARNING: copy: line 4, LocalToUtf: could not convert (0xf9db) BIG5 to UTF-8. Ignored
> > >
> > > I see no problem here. The only standard conversion map I could find
> > > online so far (see the URL below) does not include entries 0xf9d6 or
> > > above.
> > >
> > > http://www.unicode.org/Public/UNIDATA/Unihan.txt
> >
> > I found in this file:
> > U+F9D7 in line 604519
> > U+F9D8 in line 219540
> > U+F9D6...U+F9DB in lines 730707...730766.
>
> No. U+F9D6 denotes a *Unicode* code point, not a BIG5 code point.

Ok.
I have looked on my Linux box and found this in /usr/share/i18n/charmaps/BIG5.gz:

% Chinese charmap for BIG5 (CP950)
% version: 0.92
% Contact: Tung-Han Hsieh <thhsieh@linux.org.tw>
%          Yuan-Chung Cheng <platin@ms31.hinet.net>
% Distribution and use is free, even for comercial purpose.
%
% This charmap is converted from:
% ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT
% ...

"My" characters are in there.

Don't you agree that it is strange that I can (for EUC_TW) copy "to" file without error
but I can not copy "from" file without error?

Michael
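
Before loading a file of this kind, it can help to find out which lines a given conversion table rejects. The following is a rough sketch using Python's built-in `big5` and `cp950` codecs as stand-ins for two different mapping tables; it says nothing about the tables PostgreSQL or glibc ship, and the helper name `undecodable_lines` and the sample bytes are illustrative only.

```python
def undecodable_lines(data: bytes, encoding: str) -> list:
    """Return the 1-based numbers of the lines in `data` that cannot be
    decoded with the named codec."""
    bad = []
    for lineno, line in enumerate(data.splitlines(), start=1):
        try:
            line.decode(encoding)
        except UnicodeDecodeError:
            bad.append(lineno)
    return bad


# Line 1 is plain ASCII, line 2 is the byte pair from the first warning
# in the thread, line 3 is a byte pair that is not a valid BIG5 sequence.
sample = b"abc\n\xf9\xd6\n\xff\xff\n"
for encoding in ("big5", "cp950"):
    print(encoding, undecodable_lines(sample, encoding))
```

Running the same check with both codec names on the real input file would show whether the rejected lines depend on which mapping table is used.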
> I have looked into my Linux box and found this in /usr/share/i18n/charmaps/BIG5.gz:
> % Chinese charmap for BIG5 (CP950)
> % version: 0.92
> % Contact: Tung-Han Hsieh <thhsieh@linux.org.tw>
> %          Yuan-Chung Cheng <platin@ms31.hinet.net>
> % Distribution and use is free, even for comercial purpose.
> %
> % This charmap is converted from:
> % ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT
> % ...
>
> There "my" characters are in.

That's M$'s definition, not a standard. I think there must be a
reason why the Unicode org. does not use it.

> Don't you agree that it is strange that I can (for EUC_TW) copy "to" file without error
> but I can not copy "from" file without error?

I'm not quite sure what you are saying. Are you complaining that (for
example) 0xe7a281 in UTF-8 does not convert to EUC_TW?

BTW, what do you think about the proposal below?

FYI, CNS 11643-1993 is the standard character set and EUC_TW is
one of its encodings. That means that, with the remaining planes
supported, your problem below will disappear.

> > > > WARNING: copy: line 2, LocalToUtf: could not convert (0x8ea3cfd0) EUC_TW to UTF-8. Ignored
> > > > WARNING: copy: line 3, LocalToUtf: could not convert (0x8ea3c4ce) EUC_TW to UTF-8. Ignored
> > > > WARNING: copy: line 4, LocalToUtf: could not convert (0x8ea3bdfe) EUC_TW to UTF-8. Ignored
> > > > Hum. These seem to be CNS 11643-1993, plane 3. Currently PostgreSQL
> > > > supports only:
> > > >
> > > > CNS 11643-1993, plane 0
> > > > CNS 11643-1993, plane 1
> > > > CNS 11643-1993, plane 2
> > > > CNS 11643-1993, plane 15
> > > >
> > > > Would you like to have support for rest of CNS 11643-1993 planes:
> > > >
> > > > CNS 11643-1993, plane 3
> > > > CNS 11643-1993, plane 4
> > > > CNS 11643-1993, plane 5
> > > > CNS 11643-1993, plane 6
> > > > CNS 11643-1993, plane 7
> > > >
> > > > support for upcoming 7.4?
--
Tatsuo Ishii
Tatsuo Ishii wrote:
>
> > I have looked into my Linux box and found this in /usr/share/i18n/charmaps/BIG5.gz:
> > % Chinese charmap for BIG5 (CP950)
> > % version: 0.92
> > % Contact: Tung-Han Hsieh <thhsieh@linux.org.tw>
> > %          Yuan-Chung Cheng <platin@ms31.hinet.net>
> > % Distribution and use is free, even for comercial purpose.
> > %
> > % This charmap is converted from:
> > % ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT
> > % ...
> >
> > There "my" characters are in.
>
> That's a M$'s definition, not a standard. I think there should be a
> reason why the Unicode org. does not use it.

Ok, I do not know the reason.
But since glibc also uses it, couldn't you use it too?
I believe the glibc developers have thought about this a lot
and came to the conclusion to use this definition. Why not PostgreSQL?

> > Don't you agree that it is strange that I can (for EUC_TW) copy "to" file without error
> > but I can not copy "from" file without error?
>
> I'm not quite sure what you are saying. Are you complaining that (for
> example) 0xe7a281 in UTF-8 does not convert to EUC_TW?

Yes, exactly, since this value comes from a "copy to" with PGCLIENTENCODING=EUC_TW.

> BTW, what do you think about below?
>
> FYI, CNS 11643-1993 is the standard character set and EUC_TW is the
> one of the encodings. That means your problem below will disappear.

Ok.

Regards,
Michael