Обсуждение: Need magic that checks if a collumn contains values that CAN NOT be converted to Latin1
Hi, I've got an 8.4.3 Unicode DB that accidentally holds a few records with characters that can't be converted to Latin1 or 9 for output to CSV. I'd just need a way to check if a collumn contains values that CAN NOT be converted from Utf8 to Latin1 to select all those affected records. I tried: Select convert_to (my_column::text, 'LATIN1') from my_table; It raises an error that says translated: ERROR: character 0xe28093 in encoding »UTF8« has no equivalent in »LATIN1« Regrettably it doesn't explain where it found this sign. Select '\xe28093' complains that this weren't a valid UTF8 code at all. So how was it accepted and stored in the first place? When I know which record has faulty content I can correct it. regards Andreas :)
On 29/04/10 11:02, Andreas wrote: > Hi, > > I've got an 8.4.3 Unicode DB that accidentally holds a few records with > characters that can't be converted to Latin1 or 9 for output to CSV. > > I'd just need a way to check if a collumn contains values that CAN NOT > be converted from Utf8 to Latin1 to select all those affected records. > > I tried: > Select convert_to (my_column::text, 'LATIN1') from my_table; > > It raises an error that says translated: > ERROR: character 0xe28093 in encoding »UTF8« has no equivalent in »LATIN1« > Regrettably it doesn't explain where it found this sign. I'd use a PL/PgSQL procedure to step through the result of a FOR IN SELECT, running each test in an exception handling block. > Select '\xe28093' > complains that this weren't a valid UTF8 code at all. > So how was it accepted and stored in the first place? Because that's the escape E'\xe2" followed by the literal characters "8093". "\xe2" followed by those characters isn't legal utf-8. I'm pretty sure the error actually refers to: select E'\xe2\x80\x93'; which is the character "–" (U+2013 EN DASH). Yes, it'd be nice if PostgreSQL's error message was in syntax that PostgreSQL understood, not pseudo-C-style hex literal form. -- Craig Ringer
Re: Need magic that checks if a collumn contains values that CAN NOT be converted to Latin1
От
Magnus Hagander
Дата:
On Thu, Apr 29, 2010 at 05:02, Andreas <maps.on@gmx.net> wrote: > Hi, > > I've got an 8.4.3 Unicode DB that accidentally holds a few records with > characters that can't be converted to Latin1 or 9 for output to CSV. > > I'd just need a way to check if a collumn contains values that CAN NOT be > converted from Utf8 to Latin1 to select all those affected records. > > I tried: > Select convert_to (my_column::text, 'LATIN1') from my_table; > > It raises an error that says translated: > ERROR: character 0xe28093 in encoding »UTF8« has no equivalent in »LATIN1« > Regrettably it doesn't explain where it found this sign. > > Select '\xe28093' > complains that this weren't a valid UTF8 code at all. > So how was it accepted and stored in the first place? > > When I know which record has faulty content I can correct it. Wrap your check in a simple function: CREATE OR REPLACE FUNCTION public.is_valid_encoding(vtext text, encname text) RETURNS boolean LANGUAGE plpgsql AS $$ BEGIN BEGIN PERFORM convert_to(vtext, encname); EXCEPTION WHEN untranslatable_character THEN RETURN 'f'; END; RETURN 't'; END; $$ And execute SELECT * FROM my_table WHERE NOT is_valid_encoding(my_column, 'LATIN1') -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/