So the insert into ... values(chr(226)||chr(128)||chr(166)) was actually stored in the LATIN1 database as a sequence of
single bytes, but when I query select * from testutf8, it gets converted to a UTF8 three-byte sequence first?
jamet=# select chr(226)||chr(128)||chr(166);
?column?
----------
...
(1 row)
jamet=# select * from testutf8;
test
--------------------------------------------------------------------------------
...
jamet=# select encode(test::bytea,'hex') from testutf8;
encode
--------------------------------------------------------------------------------------------------------------------------------------------------------------
e280a6
Thanks,
James
-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Thursday, August 17, 2023 9:33 AM
To: James Pang (chaolpan) <chaolpan@cisco.com>
Cc: pgsql-admin@lists.postgresql.org
Subject: Re: inserts bypass encoding conversion
"James Pang (chaolpan)" <chaolpan@cisco.com> writes:
> In this case, the real value stored in database is UTF8 byte sequence
> instead of LATIN1 encoding text, right?
Not if you have server_encoding = LATIN1, as you stated earlier.
In that case, the data in the database is in LATIN1, and chr() interprets its argument as a LATIN1 code value --- which
happens to look enough like a Unicode code point to be possibly confusing, until you try to use code points that aren't
within LATIN1.
regards, tom lane
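
The byte-level point above can be sketched outside the database. This is only an illustration in Python, not PostgreSQL itself: it assumes that each chr() call in a LATIN1 database yields one byte, as described above, and that a UTF-8 terminal receiving those bytes unconverted renders them as one character. The names stored, as_latin1 and as_utf8 are hypothetical.

```python
# Three single-byte LATIN1 characters, as chr(226)||chr(128)||chr(166)
# would produce when server_encoding is LATIN1 (assumption from the thread).
stored = bytes([226, 128, 166])
print(stored.hex())  # matches the encode(test::bytea,'hex') output: e280a6

# Read as LATIN1, these bytes are three separate characters.
as_latin1 = stored.decode("latin-1")
print(len(as_latin1), [hex(ord(c)) for c in as_latin1])

# Read as UTF-8 (for example by a UTF-8 terminal), the same three bytes
# form one character, U+2026 HORIZONTAL ELLIPSIS.
as_utf8 = stored.decode("utf-8")
print(len(as_utf8), hex(ord(as_utf8)))

# U+2026 itself is not within LATIN1, so it cannot be encoded there.
try:
    "\u2026".encode("latin-1")
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)
```

If this reading is right, nothing is converted to UTF8 on select: the same three LATIN1 bytes are stored and returned, and they only look like a single character because whatever displays them decodes them as UTF-8.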