Discussion: Re: [BUGS] Problem with 7.0.3 dump -> 7.1b4 restore
> We have a Unicode (UTF-8) database that we are trying to upgrade to 7.1b4.
> We did a pg_dumpall (yes, using the old version) and then tried a restore.
> We hit the following 3 problems:
>
> 1. Some of the text is large, about 20k characters, and is multiline. For
> almost all of the lines this was fine (postgres put a \ at the end of the
> previous line) but for some it was not. The lines I looked at all had
> non-English characters (Japanese and/or Korean) at the end of the line. When
> the restore encountered these lines it failed and, since the dump uses COPY,
> the entire table was left blank.
>
> 2. Some two-byte dash/hyphen characters DID get correctly imported into the
> database but could not be read out again via JDBC, that is, when read the
> record was truncated at the character. This _might_ be related to a long
> standing Java core bug regarding improper conversions between certain
> languages and the internal Unicode representation for hyphens.
>
> 3. One other character, a two-byte apostrophe, was not restorable,
> similarly to the hyphen problem.
>
>
> After fighting the above, I decided to try doing the dump with the -dn
> flags. This fixed problem #1 but not 2 or 3. If needed I can try to get
> details about the problem characters.

This might be related to a known bug with 7.0.x. Can you grab a patch
from ftp://ftp.sra.co.jp/pub/cmd/postgres/7.0.3/patches/copy.patch.gz
and try again?

Or even better, can you give me a minimum set of data that reproduces
your problem?
--
Tatsuo Ishii
Well, I tried the patch and the newly produced dump was identical to the bad
dump from before, so the patch had no effect. I will try to trim it down to a
reasonably small file and email it to you.

--Rainer

> -----Original Message-----
> From: pgsql-bugs-owner@postgresql.org
> [mailto:pgsql-bugs-owner@postgresql.org]On Behalf Of Tatsuo Ishii
> Sent: Friday, February 23, 2001 10:32 AM
> To: rmager@vgkk.com
> Cc: pgsql-bugs@postgresql.org; pgsql-hackers@postgresql.org
> Subject: Re: [BUGS] Problem with 7.0.3 dump -> 7.1b4 restore
>
> This might be related to a known bug with 7.0.x. Can you grab a patch
> from ftp://ftp.sra.co.jp/pub/cmd/postgres/7.0.3/patches/copy.patch.gz
> and try again?
>
> Or even better, can you give me a minimum set of data that reproduces
> your problem?
> --
> Tatsuo Ishii
> Attached is a single INSERT that shows the problem. The character after the
> word "Fiber" truncates the text when using JDBC. NOTE, the text IS in the
> database, that is, the dump/restore seems ok, the problem is when trying to
> read the text later. The database is UTF8 and I just tested with beta 5.
>
> Oh, BTW, if I try to set (INSERT) this same character via JDBC and then
> retrieve it again then everything is fine.

Thanks. I'll dig into it.
--
Tatsuo Ishii
> Attached is a single INSERT that shows the problem. The character after the
> word "Fiber" truncates the text when using JDBC. NOTE, the text IS in the
> database, that is, the dump/restore seems ok, the problem is when trying to
> read the text later. The database is UTF8 and I just tested with beta 5.
>
> Oh, BTW, if I try to set (INSERT) this same character via JDBC and then
> retrieve it again then everything is fine.

I have tested your data using psql:

unicode=# create table pr_prop_info(i1 int, i2 int, i3 int, t text);
CREATE
unicode=# \encoding LATIN1
unicode=# \i example.sql
INSERT 2378114 1
unicode=# select * from pr_prop_info;

The character after the word "Fiber" looks like "Optic Cable". So as
long as the server/client encoding is set correctly, it looks ok. I guess
we have some problems with the JDBC driver. Unfortunately I am not a Java
guru at all. Can anyone look into our JDBC driver regarding this
problem?
--
Tatsuo Ishii
Attached is a single INSERT that shows the problem. The character after the
word "Fiber" truncates the text when using JDBC. NOTE, the text IS in the
database, that is, the dump/restore seems ok, the problem is when trying to
read the text later. The database is UTF8 and I just tested with beta 5.

Oh, BTW, if I try to set (INSERT) this same character via JDBC and then
retrieve it again then everything is fine.

--Rainer

> -----Original Message-----
> From: pgsql-bugs-owner@postgresql.org
> [mailto:pgsql-bugs-owner@postgresql.org]On Behalf Of Tatsuo Ishii
> Sent: Friday, February 23, 2001 10:32 AM
>
> Or even better, can you give me a minimum set of data that reproduces
> your problem?
> --
> Tatsuo Ishii
I'm trying to run the latest CVS code's regression tests and have a problem.
They fail at initdb with this:

Running with noclean mode on. Mistakes will not be cleaned up.
/opt/home/rmager/devel/External/pgsql/src/test/regress/./tmp_check/install//usr/local/pgsql/bin/pg_encoding: error while loading shared libraries: /opt/home/rmager/devel/External/pgsql/src/test/regress/./tmp_check/install//usr/local/pgsql/bin/pg_encoding: undefined symbol: pg_char_to_encoding
initdb: pg_encoding failed
Perhaps you did not configure PostgreSQL for multibyte support or
the program was not successfully installed.

I ran configure with this:

./configure --enable-multibyte --enable-syslog --with-java

Any ideas?

--Rainer
Hi all,

We're using PG 7.0 and 7.1beta and are having deadlock problems. The docs
say that Postgres detects deadlocks and automatically rolls back one
transaction to recover, but this is not our experience. Are the docs
incorrect or is this more serious?

Thanks,

--Rainer
"Rainer Mager" <rmager@vgkk.com> writes:
> We're using PG 7.0 and 7.1beta and are having deadlock problems. The docs
> say that Postgres detects deadlocks and automatically rolls back one
> transaction to recover, but this is not our experience. Are the docs
> incorrect or is this more serious?

Which beta release? There are some known undetected-deadlock cases
in 7.0, which were repaired in late January --- that would have been
beta4 or possibly beta5, I forget now. If you still see this behavior
with 7.1RC1 then I would like details.

regards, tom lane
I just tested a bug I originally found in 7.1b4 with the new 7.1RC3 and it
still exists. I would consider this a major bug because I know of no
workaround.

Basically what happens is that a dump of an existing Unicode database (from
7.0.3) has a double-byte hyphen character that becomes \255 in the dump.
When the data is imported into the new 7.1 database it seems to appear
correctly (verified via psql) BUT when reading this record via JDBC the data
is truncated at this character.

I communicated briefly with Ishii-san regarding this a while back but I never
followed up. Considering RC3 is now out I thought I should revisit the issue.
It should be easy to test by editing a postgres Unicode database dump and
putting \255 somewhere in a string. I'm not sure if it matters but the dump
was done with the "-dn" flags.

Thanks,

--Rainer

> -----Original Message-----
> From: Tatsuo Ishii [mailto:t-ishii@sra.co.jp]
> Sent: Wednesday, February 28, 2001 11:02 AM
> To: rmager@vgkk.com
> Cc: pgsql-bugs@postgresql.org; pgsql-hackers@postgresql.org
> Subject: RE: [BUGS] Problem with 7.0.3 dump -> 7.1b4 restore
>
> > Attached is a single INSERT that shows the problem. The character after the
> > word "Fiber" truncates the text when using JDBC. NOTE, the text IS in the
> > database, that is, the dump/restore seems ok, the problem is when trying to
> > read the text later. The database is UTF8 and I just tested with beta 5.
> >
> > Oh, BTW, if I try to set (INSERT) this same character via JDBC and then
> > retrieve it again then everything is fine.
>
> I have tested your data using psql:
>
> unicode=# create table pr_prop_info(i1 int, i2 int, i3 int, t text);
> CREATE
> unicode=# \encoding LATIN1
> unicode=# \i example.sql
> INSERT 2378114 1
> unicode=# select * from pr_prop_info;
>
> The character after the word "Fiber" looks like "Optic Cable". So as
> long as the server/client encoding is set correctly, it looks ok. I guess
> we have some problems with the JDBC driver. Unfortunately I am not a Java
> guru at all. Can anyone look into our JDBC driver regarding this
> problem?
> --
> Tatsuo Ishii
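To make the "\255 in the dump" detail concrete: in COPY's text format, a backslash followed by octal digits stands for the single byte with that value, so \255 is the byte 0xAD. The minimal Python sketch below (an illustration only, not PostgreSQL or JDBC code) shows how that one byte reads as Latin-1 versus UTF-8. It only explains why a lone 0xAD is suspect in a UTF-8 database; it does not by itself explain the JDBC truncation.

```python
import unicodedata

# COPY text format treats backslash + octal digits as one byte, so the
# "\255" seen in the dump stands for the byte with value 0o255.
stray = bytes([int("255", 8)])  # 0o255 == 0xAD == 173

# Interpreted as Latin-1 (ISO 8859-1), 0xAD is U+00AD SOFT HYPHEN.
as_latin1 = stray.decode("latin-1")
print(f"U+{ord(as_latin1):04X}", unicodedata.name(as_latin1))

# The same character encoded as UTF-8 takes two bytes: C2 AD.
print(as_latin1.encode("utf-8").hex(" "))

# A lone 0xAD has the bit pattern 10xxxxxx, i.e. a UTF-8 continuation
# byte, so a strict UTF-8 decoder refuses it on its own.
try:
    stray.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8 on its own:", exc.reason)
```

So a correctly converted soft hyphen should appear in a UTF-8 dump as the pair C2 AD, never as a bare AD.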
I noticed that 7.1 has officially been released. Does anyone know the status
of the bug I reported regarding encoding problems when dumping a 7.0 db and
restoring on 7.1?

Thanks,

--Rainer
Hi,

I'm trying to see if I can patch this bug myself because we are under some
time constraints. Can anyone give me a tip regarding where in the postgres
source the internal UTF-8 code is converted during a dump?

I believe that the character 0xAD is a Latin-1 (not ASCII) character, the
soft hyphen, that looks like a dash. According to the UTF-8 spec, any
character above 0x7F is encoded as a multibyte sequence, and a byte in the
range 0x80-0xBF (such as 0xAD) may only appear as a continuation byte inside
such a sequence. I think this means that you should never see the 0xAD byte
by itself in a postgres dump, but I am seeing this. So, I'm guessing that
some piece of the UTF-8 conversion routine is a bit off. Any tips on where
to start?

I would try to hack a fix by searching for the offending character in the
dump and replacing it with a normal dash, but unfortunately 0xAD is a valid
byte when paired with other bytes and these also exist in our dump.

--Rainer

> -----Original Message-----
> From: pgsql-bugs-owner@postgresql.org
> [mailto:pgsql-bugs-owner@postgresql.org]On Behalf Of Rainer Mager
> Sent: Monday, April 16, 2001 12:15 PM
> To: pgsql-bugs@postgresql.org; pgsql-hackers@postgresql.org
> Subject: RE: [BUGS] Problem with 7.0.3 dump -> 7.1b4 restore
>
> I noticed that 7.1 has officially been released. Does anyone know the status
> of the bug I reported regarding encoding problems when dumping a 7.0 db and
> restoring on 7.1?
>
> Thanks,
>
> --Rainer
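The workaround Rainer describes, replacing only the stray 0xAD bytes while leaving alone the 0xAD bytes that belong to valid multibyte characters, can be sketched in Python by letting a strict UTF-8 decoder locate the invalid bytes. This is a hypothetical offline clean-up helper, not anything in the PostgreSQL source; the function names and the choice of ASCII "-" as the replacement are assumptions. It also assumes the dump holds the raw byte; if the dump holds the four-character escape \255 instead, that text never occurs inside a valid multibyte character, so a plain text search would suffice.

```python
def find_invalid_utf8(data: bytes) -> list[int]:
    """Return offsets of bytes that are not part of any valid UTF-8 sequence."""
    bad: list[int] = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")  # strict mode raises at the first bad byte
            break
        except UnicodeDecodeError as exc:
            # exc.start/exc.end are relative to the slice that was decoded.
            bad.extend(range(pos + exc.start, pos + exc.end))
            pos += exc.end
    return bad


def replace_stray_bytes(data: bytes, target: int = 0xAD,
                        replacement: bytes = b"-") -> bytes:
    """Replace `target` only where it is invalid UTF-8 on its own.

    A 0xAD inside a valid character (e.g. C3 AD for U+00ED) is kept as-is.
    """
    bad = set(find_invalid_utf8(data))
    out = bytearray()
    for i, byte in enumerate(data):
        if i in bad and byte == target:
            out += replacement
        else:
            out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    # Hypothetical sample: a stray 0xAD, then a legitimate C3 AD pair.
    sample = b"Fiber\xadOptic " + "\u00ed".encode("utf-8")
    print(find_invalid_utf8(sample))                   # [5]
    print(replace_stray_bytes(sample).decode("utf-8"))  # Fiber-Optic í
```

To use it on a real dump, read the file in binary mode and review any reported offsets whose byte is not 0xAD before restoring, since those point to other damage.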