Re: invalid UTF-8 byte sequence detected

Поиск
Список
Период
Сортировка
От Bruce Momjian
Тема Re: invalid UTF-8 byte sequence detected
Дата
Msg-id 200511151846.jAFIk2F19608@candle.pha.pa.us
обсуждение исходный текст
Ответ на invalid UTF-8 byte sequence detected  ("Markus Wollny" <Markus.Wollny@computec.de>)
Список pgsql-general
I am also confused how invalid UTF8 sequences got into your database.
It shouldn't have been possible.

---------------------------------------------------------------------------

Markus Wollny wrote:
> Hello!
>
> I am currently testdriving migration of our PostgreSQL 8.0 databases to 8.1; in this process I have stumbled a couple
oftimes over certain errors in text-fields that lead to error-messages during import of the dump like these: 
>
> <2005-11-09 14:57:34 CET - 9354: [local]@community_unicode>ERROR:  invalid UTF-8 byte sequence detected near byte
0xb4
> <2005-11-09 14:57:34 CET - 9354: [local]@community_unicode>CONTEXT:  COPY board_message, line 1125662, column text:
"HI
>
>         Besteht ein gewisser Nachteil, wenn ich nur eins von den beiden kaufe, da in beiden Addon?s viel..."
> <2005-11-09 14:57:34 CET - 9354: [local]@community_unicode>STATEMENT:  COPY board_message (board_id, thread_id,
father_id,message_id, user_id, title, signat 
> ure, follow_up, count_reply, last_reply, created, article_id, logged_ip, state_id, user_login, user_status,
user_rank,user_rank_description, user_rank_picture, user_rights, text, deleted_user_id, deleted_date, deleted_login,
user_created,poll_id, idxfti) FROM stdin; 
>
> <2005-11-09 14:57:49 CET - 9354: [local]@community_unicode>ERROR:  invalid UTF-8 byte sequence detected near byte
0x98
> <2005-11-09 14:57:49 CET - 9354: [local]@community_unicode>CONTEXT:  COPY kidszone_tournament2005_user, line 427,
columnphone: "02302?74" 
> <2005-11-09 14:57:49 CET - 9354: [local]@community_unicode>STATEMENT:  COPY kidszone_tournament2005_user (id,
first_name,last_name, adress, birthday, phone, 
>  email, place, permission, ude, ude_box, invited) FROM stdin;
>
> There are not too many occurrences of the same type - five altogether in a 1.8GB compressed dumpfile, but still it
hasme worried and leaves me with some questions: 
>
> 1.) How could I have prevented insertion of these invalid byte-sequences in the first place? We're using UTF-8
encodeddatabases, data is mostly inserted by users via browser applications, our websites are UTF-8 encoded, too, but
stillwe cannot really make 100% sure that all clients behave as expected; on the other hand, it would be extremely
inconvenientif we had to check each and every text input for character set conformance in the application, so is there
away to ascertain "sane" data via some database-setting? pg_restore does throw this error and indeed terminates after
that(I used custom dump format for pg_dump), psql on the other hand just continues with the import (using a
pgdumpall-outputthat generates a standard SQL-script), although it too throws the error. 
>
> 2.) How does this really affect the value of the database-dumps? psql continues with import after the error, but the
tablewhere this error occurred remains empty, as the affected COPY-statement has failed altogether due to this error.
Soa plain no-worries import in my case would present me a result with five tables empty - one of them quite large... Is
theresome kind of magic, maybe involving some perl or whatever, that could help to clean up the dump before the import,
soI can accomplish a full restore? 
>
> Kind regards,
>
>    Markus
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

В списке pgsql-general по дате отправления:

Предыдущее
От: Jerry LeVan
Дата:
Сообщение: Cursors or Offset, Limit?
Следующее
От:
Дата:
Сообщение: Customizing the Windows installer