Thread: utf8 errors

utf8 errors

From
Jiří Pavlovský
Date:
Hello,

I have a win32 application. It uses gtk for GUI and postgres. Recently I
upgraded to newer gtk and postgres 9.2. I'm now getting utf8 errors from
postgres.
The thing I don't understand is that the queries which postgres complains
about seem to be perfectly valid.

For example
 LOG:  statement: INSERT INTO recipients (DealID,
Contactid)                               VALUES (29009, 9387)
 ERROR:  invalid byte sequence for encoding "UTF8": 0x9c


But the query is clean ascii and it doesn't even contain the mentioned
character.

My database is in UNICODE, client encoding is utf8.

 So I'm stuck not knowing where to look for a problem.

Thank you,
Jiri

--
Jiří Pavlovský



Re: utf8 errors

From
Adrian Klaver
Date:
On 06/25/2013 12:18 PM, Jiří Pavlovský wrote:
> Hello,
>
> I have a win32 application. It uses gtk for GUI and postgres. Recently I
> upgraded to newer gtk and postgres 9.2. I'm now getting utf8 errors from
> postgres.
> The thing I don't understand that the queries, which postgres complains
> about, seem to be perfectly valid.
>
> For example
>   LOG:  statement: INSERT INTO recipients (DealID,
> Contactid)                               VALUES (29009, 9387)
>   ERROR:  invalid byte sequence for encoding "UTF8": 0x9c
>
>
> But the query is clean ascii and it doesn't even contain the mentioned
> character.
>
> My database is in UNICODE, client encoding is utf8.

At a guess your client encoding is actually some form of WINXXXX, most
likely WIN1252.
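
To make that guess concrete: byte 0x9c is a printable letter in Windows-1252 but only a bare continuation byte in UTF-8, so text produced in a WIN1252 environment and sent unconverted to a connection declared as UTF8 would trigger this kind of error. A minimal Python sketch of the difference (the helper name describe_byte is made up for this illustration and is not part of any PostgreSQL tooling):

```python
def describe_byte(value):
    """Report how a single byte decodes under Windows-1252 and under UTF-8."""
    raw = bytes([value])
    result = {"hex": "0x%02x" % value}
    for label, codec in (("cp1252", "cp1252"), ("utf8", "utf-8")):
        try:
            result[label] = raw.decode(codec)
        except UnicodeDecodeError:
            # The byte is not a complete, valid character in this encoding.
            result[label] = None
    return result


info = describe_byte(0x9C)
print(info)                 # a letter under cp1252, None under utf8
print("œ".encode("utf-8"))  # the same letter encoded as proper UTF-8
```

This only shows why 0x9c would be rejected; it does not establish where the byte came from.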

>
>   So I'm stuck not knowing where to look for a problem.
>
> Thank you,
> Jiri
>


--
Adrian Klaver
adrian.klaver@gmail.com


Re: utf8 errors

From
Pavel Stehule
Date:
Hello

This mailing list does not get much traffic;

please try asking on the PostgreSQL general mailing list

http://www.postgresql.org/list/pgsql-general/

or

Czech google groups https://groups.google.com/forum/?hl=cs#!forum/postgresql-cz

2013/6/25 Jiří Pavlovský <jiri@pavlovsky.eu>:
> Hello,
>
> I have a win32 application. It uses gtk for GUI and postgres. Recently I
> upgraded to newer gtk and postgres 9.2. I'm now getting utf8 errors from
> postgres.
> The thing I don't understand that the queries, which postgres complains
> about, seem to be perfectly valid.
>
> For example
>  LOG:  statement: INSERT INTO recipients (DealID,
> Contactid)                               VALUES (29009, 9387)
>  ERROR:  invalid byte sequence for encoding "UTF8": 0x9c
>

This message is strange - I would expect the *ID columns to be numeric. Can you
show the table definition?

Regards

Pavel Stehule

>
> But the query is clean ascii and it doesn't even contain the mentioned
> character.
>
> My database is in UNICODE, client encoding is utf8.
>
>  So I'm stuck not knowing where to look for a problem.
>
> Thank you,
> Jiri
>
> --
> Jiří Pavlovský
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


Re: utf8 errors

From
Jiří Pavlovský
Date:
On 25.6.2013 21:39, Pavel Stehule wrote:
> Hello
>
> in this mailing list is not high traffic,
>
> please try to ask on postgresql general mailing list
>
> http://www.postgresql.org/list/pgsql-general/
>
> or
>
> Czech google groups https://groups.google.com/forum/?hl=cs#!forum/postgresql-cz
>
> 2013/6/25 Jiří Pavlovský <jiri@pavlovsky.eu>:
>> Hello,
>>
>> I have a win32 application. It uses gtk for GUI and postgres. Recently I
>> upgraded to newer gtk and postgres 9.2. I'm now getting utf8 errors from
>> postgres.
>> The thing I don't understand that the queries, which postgres complains
>> about, seem to be perfectly valid.
>>
>> For example
>>  LOG:  statement: INSERT INTO recipients (DealID,
>> Contactid)                               VALUES (29009, 9387)
>>  ERROR:  invalid byte sequence for encoding "UTF8": 0x9c
>>
> This message is strange - I expect so *ID columns are numeric. Can you
> show a table definition?

Yes, sure:

CREATE TABLE recipients
(
  contactid integer,
  dealid integer,
  CONSTRAINT "$1" FOREIGN KEY (contactid)
      REFERENCES contacts (contactid) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE,
  CONSTRAINT recipients_dealid_fk FOREIGN KEY (dealid)
      REFERENCES subscription (dealid) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE,
  CONSTRAINT recipients_dealid_key UNIQUE (dealid, contactid)
)

--
Jiří Pavlovský



Re: utf8 errors

From
Pavel Stehule
Date:
2013/6/25 Jiří Pavlovský <jiri@pavlovsky.eu>:
> On 25.6.2013 21:39, Pavel Stehule wrote:
>> Hello
>>
>> in this mailing list is not high traffic,
>>
>> please try to ask on postgresql general mailing list
>>
>> http://www.postgresql.org/list/pgsql-general/
>>
>> or
>>
>> Czech google groups https://groups.google.com/forum/?hl=cs#!forum/postgresql-cz
>>
>> 2013/6/25 Jiří Pavlovský <jiri@pavlovsky.eu>:
>>> Hello,
>>>
>>> I have a win32 application. It uses gtk for GUI and postgres. Recently I
>>> upgraded to newer gtk and postgres 9.2. I'm now getting utf8 errors from
>>> postgres.
>>> The thing I don't understand that the queries, which postgres complains
>>> about, seem to be perfectly valid.
>>>
>>> For example
>>>  LOG:  statement: INSERT INTO recipients (DealID,
>>> Contactid)                               VALUES (29009, 9387)
>>>  ERROR:  invalid byte sequence for encoding "UTF8": 0x9c
>>>
>> This message is strange - I expect so *ID columns are numeric. Can you
>> show a table definition?
>
> Yes, sure:
>
> CREATE TABLE recipients
> (
>   contactid integer,
>   dealid integer,
>   CONSTRAINT "$1" FOREIGN KEY (contactid)
>       REFERENCES contacts (contactid) MATCH SIMPLE
>       ON UPDATE CASCADE ON DELETE CASCADE,
>   CONSTRAINT recipients_dealid_fk FOREIGN KEY (dealid)
>       REFERENCES subscription (dealid) MATCH SIMPLE
>       ON UPDATE CASCADE ON DELETE CASCADE,
>   CONSTRAINT recipients_dealid_key UNIQUE (dealid, contactid)
> )
>

Something is wrong here - pg raises this message when it detects an error
in the conversion between client encoding and server encoding. But you
say that your client encoding and database encoding are the same, and
in that situation this should not be possible. So you should recheck your
client encoding and the real input.

Old versions of Postgres did not check UTF-8 well, so it was possible to
store some strange characters, but that is not possible now.

Pavel Stehule

> --
> Jiří Pavlovský
>


Re: utf8 errors

From
Albe Laurenz
Date:
Jirí Pavlovský wrote:
> I have a win32 application.

>  LOG:  statement: INSERT INTO recipients (DealID,
> Contactid)                               VALUES (29009, 9387)
>  ERROR:  invalid byte sequence for encoding "UTF8": 0x9c
> 
> 
> But the query is clean ascii and it doesn't even contain the mentioned
> character.
> 
> My database is in UNICODE, client encoding is utf8.

Could you run the log message through "od -c" on a UNIX
machine and post the result?  Maybe there are some weird
invisible bytes in there.
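
Where "od -c" is not at hand, roughly the same check can be sketched in Python: list every byte that is not plain printable ASCII, with its offset. The function name and the sample strings are illustrative only; the second sample has a 0x9c byte appended on purpose.

```python
def suspicious_bytes(data):
    """Return (offset, hex) pairs for bytes that are not plain printable ASCII."""
    harmless_controls = {0x09, 0x0A, 0x0D}  # tab, line feed, carriage return
    found = []
    for offset, byte in enumerate(data):
        is_control = byte < 0x20 and byte not in harmless_controls
        is_high = byte >= 0x7F
        if is_control or is_high:
            found.append((offset, "0x%02x" % byte))
    return found


clean = b"INSERT INTO recipients (DealID, Contactid) VALUES (29009, 9387)\n"
dirty = clean[:-1] + b"\x9c\n"  # deliberately damaged copy

print(suspicious_bytes(clean))
print(suspicious_bytes(dirty))
```

For a real log excerpt, the bytes would be read with open(path, "rb").read() so that nothing is decoded or replaced on the way in.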

Yours,
Laurenz Albe

Re: utf8 errors

From
Jiří Pavlovský
Date:
On 26.6.2013 10:58, Albe Laurenz wrote:
> Jirí Pavlovský wrote:
>> I have a win32 application.
>>  LOG:  statement: INSERT INTO recipients (DealID,
>> Contactid)                               VALUES (29009, 9387)
>>  ERROR:  invalid byte sequence for encoding "UTF8": 0x9c
>>
>>
>> But the query is clean ascii and it doesn't even contain the mentioned
>> character.
>>
>> My database is in UNICODE, client encoding is utf8.
> Could you run the log message through "od -c" on a UNIX
> machine and post the result?  Maybe there are some weird
> invisible bytes in there.


Hi,

I've already tried that before posting. See below for results. Is the
message in the log the same as the message that postgres receives?


0000000   I   N   S   E   R   T       I   N   T   O       r   e   c   i
0000020   p   i   e   n   t   s       (   D   e   a   l   I   D   ,
0000040   C   o   n   t   a   c   t   i   d   )
0000060
0000100                                       V   A   L   U   E   S
0000120   (   2   9   0   0   9   ,       9   3   8   7   )  \n
0000136


Re: utf8 errors

From
Jiří Pavlovský
Date:
On 26.6.2013 10:58, Albe Laurenz wrote:
> Jirí Pavlovský wrote:
>> I have a win32 application.
>>  LOG:  statement: INSERT INTO recipients (DealID,
>> Contactid)                               VALUES (29009, 9387)
>>  ERROR:  invalid byte sequence for encoding "UTF8": 0x9c
>>
>>
>> But the query is clean ascii and it doesn't even contain the mentioned
>> character.
>>
>> My database is in UNICODE, client encoding is utf8.
>

This is also strange:

 LOG:  statement: INSERT INTO recipients (DealID,
Contactid)                               VALUES (29010, 14340)
 ERROR:  invalid byte sequence for encoding "UTF8": 0xf2 0x2f 0x04 0xa0

The next query, with only the first value incremented, produces a different error:

 LOG:  statement: INSERT INTO recipients (DealID,
Contactid)                               VALUES (29011, 14340)
 ERROR:  invalid byte sequence for encoding "UTF8": 0x88
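
A possible reason the two errors differ in length: as far as I understand the server's reporting, it prints as many bytes as the first offending byte announces for its sequence. 0xf2 is the lead byte of a four-byte UTF-8 sequence, so four bytes are shown, while 0x88 and 0x9c are bare continuation bytes and are shown alone. The Python sketch below imitates that reporting. It is an approximation written for illustration, not the server's code, and the function names are made up.

```python
def utf8_expected_length(lead):
    """Length of the sequence that a UTF-8 lead byte announces (1 if it announces none)."""
    if lead & 0x80 == 0x00:
        return 1
    if lead & 0xE0 == 0xC0:
        return 2
    if lead & 0xF0 == 0xE0:
        return 3
    if lead & 0xF8 == 0xF0:
        return 4
    return 1  # continuation byte or otherwise invalid lead byte


def first_invalid_sequence(data):
    """Return the first invalid byte sequence as hex text, or None if data is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        start = err.start
        length = utf8_expected_length(data[start])
        return " ".join("0x%02x" % b for b in data[start:start + length])
    return None


print(first_invalid_sequence(b"VALUES (29010, 14340)\xf2\x2f\x04\xa0"))
print(first_invalid_sequence(b"VALUES (29011, 14340)\x88"))
```

The varying bytes from one statement to the next are what one would expect from arbitrary memory contents rather than from mis-encoded text.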




Re: utf8 errors

From
Alban Hertroys
Date:
On 26 June 2013 11:17, Jiří Pavlovský <jiri@pavlovsky.eu> wrote:
On 26.6.2013 10:58, Albe Laurenz wrote:
> Jirí Pavlovský wrote:
>> I have a win32 application.
>>  LOG:  statement: INSERT INTO recipients (DealID,
>> Contactid)                               VALUES (29009, 9387)
>>  ERROR:  invalid byte sequence for encoding "UTF8": 0x9c
>>
>>
>> But the query is clean ascii and it doesn't even contain the mentioned
>> character.

Can you show a \d+ of the recipients table? I suspect there is a trigger attached to inserts on the table or some other side-effect that's causing the issue.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

Re: utf8 errors

From
Jiří Pavlovský
Date:
On 26.6.2013 12:19, Alban Hertroys wrote:
On 26 June 2013 11:17, Jiří Pavlovský <jiri@pavlovsky.eu> wrote:
On 26.6.2013 10:58, Albe Laurenz wrote:
> Jirí Pavlovský wrote:
>> I have a win32 application.
>>  LOG:  statement: INSERT INTO recipients (DealID,
>> Contactid)                               VALUES (29009, 9387)
>>  ERROR:  invalid byte sequence for encoding "UTF8": 0x9c
>>
>>
>> But the query is clean ascii and it doesn't even contain the mentioned
>> character.

Can you show a \d+ of the recipients table? I suspect there is a trigger attached to inserts on the table or some other side-effect that's causing the issue.


Here you go. But I don't think that is the cause. I'm getting these errors on tables as well. Actually, when I copy and paste the offending queries from the log into pgAdmin, they run without an error.

                       Table "public.recipients"
  Column   |  Type   | Modifiers | Storage | Stats target | Description
-----------+---------+-----------+---------+--------------+-------------
 contactid | integer |           | plain   |              |
 dealid    | integer |           | plain   |              |
Indexes:
    "recipients_dealid_key" UNIQUE CONSTRAINT, btree (dealid, contactid)
    "fki_recipients_contactid" btree (contactid)
Foreign-key constraints:
    "$1" FOREIGN KEY (contactid) REFERENCES contacts(contactid) ON UPDATE CASCADE ON DELETE CASCADE
    "recipients_dealid_fk" FOREIGN KEY (dealid) REFERENCES subscription(dealid) ON UPDATE CASCADE ON DELETE CASCADE
Has OIDs: yes

Re: utf8 errors

From
Alban Hertroys
Date:
On 26 June 2013 12:39, Jiří Pavlovský <jiri@pavlovsky.eu> wrote:
On 26.6.2013 12:19, Alban Hertroys wrote:
On 26 June 2013 11:17, Jiří Pavlovský <jiri@pavlovsky.eu> wrote:
On 26.6.2013 10:58, Albe Laurenz wrote:
> Jirí Pavlovský wrote:
>> I have a win32 application.
>>  LOG:  statement: INSERT INTO recipients (DealID,
>> Contactid)                               VALUES (29009, 9387)
>>  ERROR:  invalid byte sequence for encoding "UTF8": 0x9c
>>
>>
>> But the query is clean ascii and it doesn't even contain the mentioned
>> character.

Can you show a \d+ of the recipients table? I suspect there is a trigger attached to inserts on the table or some other side-effect that's causing the issue.


Here you go. But I don't think that is the cause. I'm getting these errors on tables as well. Actually when I copy and paste the offending queries from log into pgAdmin it runs without an error.

I suppose that contacts.contactid and subscription.dealid are integers as well and not, for example, text fields?
 
So the queries work from pgadmin; what application/environment are they NOT working in? Something is obviously different. You say it's a Win32 application, what database libraries and programming languages are involved?

Does the application perhaps send trailing garbage after the query or something similar? Something like that might happen if there's a memory allocation bug in the application.
I'm assuming here that, if the query string cannot be converted from utf-8 due to garbage characters, the transcoding error triggers before the query parser notices a syntax error.
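
The trailing-garbage scenario can be imitated without a database. In C, a query buffer that lacks its terminating NUL makes string functions read on into whatever memory follows. The Python sketch below fakes that with a byte string standing in for process memory; the leftover bytes and helper names are invented for the illustration.

```python
def read_c_string(memory, start=0):
    """Return the bytes from start up to the first NUL, the way C string functions see them."""
    end = memory.find(b"\x00", start)
    return memory[start:] if end == -1 else memory[start:end]


def is_valid_utf8(data):
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


query = b"INSERT INTO recipients (DealID, Contactid) VALUES (29009, 9387)"
leftover = b"\x9c\x12stale heap contents\x00"  # hypothetical bytes left in memory

good_memory = query + b"\x00" + leftover  # terminator written: the string stops at the query
bad_memory = query + leftover             # terminator missing: the string runs on

sent_ok = read_c_string(good_memory)
sent_bad = read_c_string(bad_memory)

print(is_valid_utf8(sent_ok), is_valid_utf8(sent_bad))
```

In the buggy case the visible part of the statement is still clean ASCII, which would match a query that looks fine when printed yet fails encoding validation.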


                       Table "public.recipients"
  Column   |  Type   | Modifiers | Storage | Stats target | Description
-----------+---------+-----------+---------+--------------+-------------
 contactid | integer |           | plain   |              |
 dealid    | integer |           | plain   |              |
Indexes:
    "recipients_dealid_key" UNIQUE CONSTRAINT, btree (dealid, contactid)
    "fki_recipients_contactid" btree (contactid)
Foreign-key constraints:
    "$1" FOREIGN KEY (contactid) REFERENCES contacts(contactid) ON UPDATE CASCADE ON DELETE CASCADE
    "recipients_dealid_fk" FOREIGN KEY (dealid) REFERENCES subscription(dealid) ON UPDATE CASCADE ON DELETE CASCADE
Has OIDs: yes




--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

Re: utf8 errors

From
Jiří Pavlovský
Date:
On 26.6.2013 13:32, Alban Hertroys wrote:
On 26 June 2013 12:39, Jiří Pavlovský <jiri@pavlovsky.eu> wrote:
On 26.6.2013 12:19, Alban Hertroys wrote:
On 26 June 2013 11:17, Jiří Pavlovský <jiri@pavlovsky.eu> wrote:
On 26.6.2013 10:58, Albe Laurenz wrote:
> Jirí Pavlovský wrote:
>> I have a win32 application.
>>  LOG:  statement: INSERT INTO recipients (DealID,
>> Contactid)                               VALUES (29009, 9387)
>>  ERROR:  invalid byte sequence for encoding "UTF8": 0x9c
>>
>>
>> But the query is clean ascii and it doesn't even contain the mentioned
>> character.

Can you show a \d+ of the recipients table? I suspect there is a trigger attached to inserts on the table or some other side-effect that's causing the issue.


Here you go. But I don't think that is the cause. I'm getting these errors on tables as well. Actually when I copy and paste the offending queries from log into pgAdmin it runs without an error.

I suppose that contacts.contactid and subscription.dealid are integers as well and not, for example, text fields?
Yes integers.
 
So the queries work from pgadmin; what application/environment are they NOT working in? Something is obviously different. You say it's a Win32 application, what database libraries and programming languages are involved?
I'm using plain C and libpq from 9.2.2, with gtk as the GUI. The compiler is mingw (gcc for Windows).

Does the application perhaps send trailing garbage after the query or something similar? Something like that might happen if there's a memory allocation bug in the application.
I'm assuming here that, if the query string cannot be converted from utf-8 due to garbage characters, the transcoding error triggers before the query parser notices a syntax error.

Could be. But when I look at the query string in gdb, before it is sent, I don't see anything problematic there.
I guess I'll have to try to write some test cases to try to locate the problem.

Re: utf8 errors

From
Albe Laurenz
Date:
Jirí Pavlovský wrote:
>>> I'm getting these errors on tables as
>>> well. Actually when I copy and paste the offending queries from log into pgAdmin it runs without an
>>> error.

>> So the queries work from pgadmin; what application/environment are they NOT working in?
>> Something is obviously different. You say it's a Win32 application, what database libraries and
>> programming languages are involved?

> I'm using  plain c and libpq from 9.2.2. And gtk as a GUI. Compiler is mingw (gcc for windows).

>> Does the application perhaps send trailing garbage after the query or something similar?
>> Something like that might happen if there's a memory allocation bug in the application.
>> I'm assuming here that, if the query string cannot be converted from utf-8 due to garbage
>> characters, the transcoding error triggers before the query parser notices a syntax error.

> Could be. But when I look at the query string in gdb, before it is send, I don't see there anything
> problematic.
> I guess I'll have to try to wite some test cases to try to locate the problem.

Once you can reproduce the problem, try a network trace on the communication
between client and server.  Maybe that helps to solve the problem.
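
When reading such a trace, it helps to know that a simple query travels as a 'Q' message: one type byte, a four-byte big-endian length that counts itself plus the payload, then the NUL-terminated query text. A Python sketch for checking one captured message follows. The sample messages are constructed here rather than taken from real traffic, and the function names are made up.

```python
import struct


def build_query_message(sql):
    """Assemble a simple Query message the way it appears on the wire."""
    payload = sql + b"\x00"
    return b"Q" + struct.pack("!i", 4 + len(payload)) + payload


def inspect_query_message(message):
    """Extract the query text from a Query message and test it as UTF-8."""
    if message[:1] != b"Q":
        raise ValueError("not a simple Query message")
    (length,) = struct.unpack("!i", message[1:5])
    payload = message[5:1 + length]     # the length field excludes the type byte
    sql = payload.split(b"\x00", 1)[0]  # the text ends at the first NUL
    try:
        sql.decode("utf-8")
        valid = True
    except UnicodeDecodeError:
        valid = False
    return {"length": length, "sql": sql, "valid_utf8": valid}


good = inspect_query_message(build_query_message(b"SELECT 1"))
bad = inspect_query_message(build_query_message(b"SELECT '\x9c'"))
print(good)
print(bad)
```

A trace has the advantage of showing every message the client sends, including any that never make it into the server log.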

Yours,
Laurenz Albe

Re: utf8 errors

From
Tom Lane
Date:
Albe Laurenz <laurenz.albe@wien.gv.at> writes:
> Once you can reproduce the problem, try a network trace on the communication
> between cleint and server.  Maybe that helps to solve the problem.

Actually, if you can reproduce the problem on demand, try attaching to
the backend process with gdb and setting a breakpoint at errfinish().
The stack trace from there would probably be pretty informative about
where/why the failing conversion is being attempted.

            regards, tom lane


Re: utf8 errors

From
Alban Hertroys
Date:
On 26 June 2013 11:03, Jiří Pavlovský <jira33@gmail.com> wrote:
On 26.6.2013 10:58, Albe Laurenz wrote:
> Jirí Pavlovský wrote:
>> I have a win32 application.
>>  LOG:  statement: INSERT INTO recipients (DealID,
>> Contactid)                               VALUES (29009, 9387)
>>  ERROR:  invalid byte sequence for encoding "UTF8": 0x9c
>>
>>
>> But the query is clean ascii and it doesn't even contain the mentioned
>> character.
>>
>> My database is in UNICODE, client encoding is utf8.
> Could you run the log message through "od -c" on a UNIX
> machine and post the result?  Maybe there are some weird
> invisible bytes in there.
>
>
Hi,

I've already tried that before posting. See below for results. Is the
message in the log the same as the message that postgres receives?


0000000   I   N   S   E   R   T       I   N   T   O       r   e   c   i
0000020   p   i   e   n   t   s       (   D   e   a   l   I   D   ,
0000040   C   o   n   t   a   c   t   i   d   )
0000060
0000100                                       V   A   L   U   E   S


What bytes are in the above between the closing brace and VALUES? Is that really white-space? Did you perhaps intentionally put white-space in between there?

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

Re: utf8 errors

From
Vincent Veyron
Date:
Hi,

FYI, I had the exact same problem earlier this week, while building a
new Debian Stable (Wheezy) server where postgresql version is 9.1.9-1
for a database containing accented characters.

Steps were:
pg_dump of a database encoded in LATIN9 on the old machine which uses
the fr_FR@euro locale
use iconv to convert the dump file to utf-8 on the new machine where
locale is fr_FR.UTF-8
edit dump file, change :
SET client_encoding = 'LATIN9';
to:
SET client_encoding = 'UTF-8';
recreate db on the new machine with the dump file
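
For reference, the iconv step amounts to decoding the dump as ISO-8859-15 (which PostgreSQL calls LATIN9) and re-encoding it as UTF-8. A small Python sketch of that conversion, with invented sample text, also shows one common way such data ends up garbled: running the conversion a second time on text that is already UTF-8. Whether that happened here is not known.

```python
def latin9_to_utf8(data):
    """Re-encode ISO-8859-15 (LATIN9) bytes as UTF-8."""
    return data.decode("iso8859_15").encode("utf-8")


original_text = "café 10 €"  # made-up sample with accented and euro characters
dump_latin9 = original_text.encode("iso8859_15")

converted_once = latin9_to_utf8(dump_latin9)
converted_twice = latin9_to_utf8(converted_once)  # mistake: input was already UTF-8

print(converted_once.decode("utf-8"))
print(converted_twice.decode("utf-8"))
```

Checking that the converted file decodes cleanly as UTF-8 before editing the client_encoding line is a cheap safeguard.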

The database is used in a mod_perl application accessed via a browser,
similar to the one in my sig. While accented characters coming from the
perl code were fine, all those out of the database would appear garbled
(like : @Å ) and update queries were impossible, generating the same
error message as the OP (ERROR:  invalid byte sequence for encoding
"UTF8": 0x9c)

When using ssh, I had to manually change my client encoding to UTF-8 (my
workstation uses LATIN9) for the data to appear correctly on the screen.

The machine had to go into production, so I finally gave up on UTF-8 and
used LATIN9 as the locale.

I tried reproducing the problem with 9.1 on a stock Debian Squeeze
machine using backports. On this machine, accented characters would
appear garbled, but update queries were possible.


--
Salutations, Vincent Veyron
http://marica.fr/
Gestion des contrats, des contentieux juridiques et des sinistres
d'assurance



Re: utf8 errors

From
Pavel Stehule
Date:
Hello

2013/6/28 Vincent Veyron <vv.lists@wanadoo.fr>:
> Hi,
>
> FYI, I had the exact same problem earlier this week, while building a
> new Debian Stable (Wheezy) server where postgresql version is 9.1.9-1
> for a database containing accented characters.
>
> Steps where :
> pg_dump of a database encoded in LATIN9 on the old machine which uses
> the fr_FR@euro locale
> use iconv to convert the dump file to utf-8 on the new machine where
> locale is fr_FR.UTF-8
> edit dump file, change :
> SET client_encoding = 'LATIN9';
> to:
> SET client_encoding = 'UTF-8';
> recreate db on the new machine with the dump file
>
> The database is used in a mod_perl application accessed via a navigator,
> similar to the one in my sig. While accented characters coming from the
> perl code were fine, all those out of the database would appear garbled
> (like : @Å ) and update queries were impossible, generating the same
> error message as the OP (ERROR:  invalid byte sequence for encoding
> "UTF8": 0x9c)
>
> When using ssh, I had to manually change my client encoding to UTF-8 (my
> workstation uses LATIN9) for the data to appear correctly on the screen.
>
> The machine had to go into production, so I finally gave up on UTF-8 and
> used LATIN9 as the locale.

There are similar issues in the Perl DBI driver with UTF8 strings - it does
some "artificial intelligence" and tries to do some UTF transformations.

Pavel

>
> I tried reproducing the problem with 9.1 on a stock Debian Squeeze
> machine using backports. On this machine, accented characters would
> appear garbled, but update queries were possible.
>
>
> --
> Salutations, Vincent Veyron
> http://marica.fr/
> Gestion des contrats, des contentieux juridiques et des sinistres
> d'assurance
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


Re: utf8 errors

From
Vincent Veyron
Date:
I forgot to mention that the machines use an amd64 processor.






Re: utf8 errors

From
Alban Hertroys
Date:
On Jun 28, 2013, at 8:10, Vincent Veyron <vv.lists@wanadoo.fr> wrote:

> Hi,
>
> FYI, I had the exact same problem earlier this week, while building a
> new Debian Stable (Wheezy) server where postgresql version is 9.1.9-1
> for a database containing accented characters.


You probably had a rather different problem, as you are actually dealing with accented characters in your data.

The OP was dealing with integers, which tend to not have accented characters in them.

I suggest that you create a separate thread for your issue, as they're probably not related.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



Re: utf8 errors

From
Alban Hertroys
Date:
On Jun 26, 2013, at 16:58, Alban Hertroys <haramrae@gmail.com> wrote:

> On 26 June 2013 11:03, Jiří Pavlovský <jira33@gmail.com> wrote:
> On 26.6.2013 10:58, Albe Laurenz wrote:
> > Jirí Pavlovský wrote:
> >> I have a win32 application.
> >>  LOG:  statement: INSERT INTO recipients (DealID,
> >> Contactid)                               VALUES (29009, 9387)
> >>  ERROR:  invalid byte sequence for encoding "UTF8": 0x9c
> >>
> >>
> >> But the query is clean ascii and it doesn't even contain the mentioned
> >> character.
> >>
> >> My database is in UNICODE, client encoding is utf8.
> > Could you run the log message through "od -c" on a UNIX
> > machine and post the result?  Maybe there are some weird
> > invisible bytes in there.
> >
> >
> Hi,
>
> I've already tried that before posting. See below for results. Is the
> message in the log the same as the message that postgres receives?
>
>
> 0000000   I   N   S   E   R   T       I   N   T   O       r   e   c   i
> 0000020   p   i   e   n   t   s       (   D   e   a   l   I   D   ,
> 0000040   C   o   n   t   a   c   t   i   d   )
> 0000060
> 0000100                                       V   A   L   U   E   S
>
>
> What bytes are in the above between the closing brace and VALUES? Is that really white-space? Did you perhaps
> intentionally put white-space in between there?

I just tested my theory that there may be garbage characters in your query string tripping the encoding error before a
parse error:

postgres=> \i /usr/bin/at
psql:/usr/bin/at:15: ERROR:  invalid byte sequence for encoding "UTF8": 0x80

(/usr/bin/at is a UNIX command executable, for this case it works as binary data)

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



Re: utf8 errors

From
Jiří Pavlovský
Date:
On 28.6.2013 9:09, Alban Hertroys wrote:
On Jun 26, 2013, at 16:58, Alban Hertroys <haramrae@gmail.com> wrote:

On 26 June 2013 11:03, Jiří Pavlovský <jira33@gmail.com> wrote:
On 26.6.2013 10:58, Albe Laurenz wrote:
Jirí Pavlovský wrote:
I have a win32 application.
 LOG:  statement: INSERT INTO recipients (DealID,
Contactid)                               VALUES (29009, 9387)
 ERROR:  invalid byte sequence for encoding "UTF8": 0x9c


But the query is clean ascii and it doesn't even contain the mentioned
character.

My database is in UNICODE, client encoding is utf8.
Could you run the log message through "od -c" on a UNIX
machine and post the result?  Maybe there are some weird
invisible bytes in there.


Hi,

I've already tried that before posting. See below for results. Is the
message in the log the same as the message that postgres receives?


0000000   I   N   S   E   R   T       I   N   T   O       r   e   c   i
0000020   p   i   e   n   t   s       (   D   e   a   l   I   D   ,
0000040   C   o   n   t   a   c   t   i   d   )
0000060
0000100                                       V   A   L   U   E   S


What bytes are in the above between the closing brace and VALUES? Is that really white-space? Did you perhaps intentionally put white-space in between there?
I just tested my theory that there may be garbage characters in your query string tripping the encoding error before a parse error:

postgres=> \i /usr/bin/at
psql:/usr/bin/at:15: ERROR:  invalid byte sequence for encoding "UTF8": 0x80

(/usr/bin/at is a UNIX command executable, for this case it works as binary data)

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.


Hi, I've already found the problem - as could have been expected it was due to a bug in my code. And the offending query was not the one above. It was the next one, which did not get logged.
So, actually, you are right.

Thanks,
-- 
Jiří Pavlovský

Re: utf8 errors

From
Vincent Veyron
Date:
Le vendredi 28 juin 2013 à 08:15 +0200, Pavel Stehule a écrit :

> there is a same issues in perl dbi driver with UTF8 strings - it does
> some artificial intelligence and try to do some utf transformations.
>

Hi Pavel,

I glanced over it, but dismissed it as the problem also appeared in my
ssh sessions. I'll look again and open another thread if needed, as
Alban suggested.

Thank you.

--
Salutations, Vincent Veyron
http://marica.fr/
Gestion des contrats, des contentieux juridiques et des sinistres
d'assurance