Discussion: How to display complicated Chinese character: Biang.
Inspired by this thread: https://www.postgresql.org/message-id/011f01d8757e%24f5d69700%24e183c500%24%40ndensan.co.jp
I am trying to display some special Chinese characters in PostgreSQL. For now I am using PostgreSQL 15 beta1. The OS is Ubuntu 20.
localhost:5433 admin@test=# show LC_COLLATE;
+------------+
| lc_collate |
+------------+
| C.UTF-8    |
+------------+
localhost:5433 admin@test=# select icu_unicode_version();
+---------------------+
| icu_unicode_version |
+---------------------+
| 13.0                |
+---------------------+
icu_unicode_version() is a function provided by an extension.
Wiki about character Biang: https://en.wikipedia.org/wiki/Biangbiang_noodles
Quote:
The character's traditional and simplified forms were added to Unicode version 13.0 in March 2020 in the CJK Unified Ideographs Extension G block of the newly allocated Tertiary Ideographic Plane. The corresponding Unicode characters are:
Unicode character info: https://www.compart.com/en/unicode/U+30EDD
Query:
with strings(s) as (
  values (U&'\+0030EDD')
)
select s,
  octet_length(s),
  char_length(s),
  (select count(*) from icu_character_boundaries(s,'en')) as graphemes
from strings;
Result:
+-----+--------------+-------------+-----------+
|  s  | octet_length | char_length | graphemes |
+-----+--------------+-------------+-----------+
| ロD |            4 |           2 |         2 |
+-----+--------------+-------------+-----------+
This does not seem right: shouldn't graphemes be 1?
I am also not sure whether values (U&'\+0030EDD') is the same as 𰻝.
--
I recommend David Deutsch's <<The Beginning of Infinity>>
Jian
On Thu, 2022-06-02 at 12:45 +0530, jian he wrote:
> Trying to display some special Chinese characters in Postgresql.
>
> localhost:5433 admin@test=# show LC_COLLATE;
> +------------+
> | lc_collate |
> +------------+
> | C.UTF-8    |
> +------------+
>
> with strings(s) as (
>   values (U&'\+0030EDD')
> )
> select s,
>   octet_length(s),
>   char_length(s),
>   (select count(*) from icu_character_boundaries(s,'en')) as graphemes from strings;
>
> +-----+--------------+-------------+-----------+
> |  s  | octet_length | char_length | graphemes |
> +-----+--------------+-------------+-----------+
> | ロD |            4 |           2 |         2 |
> +-----+--------------+-------------+-----------+
>
> Seems not right. graphemes should be 1?

You have an extra "0" there; "\+" unicode escapes have exactly 6 digits:

WITH strings(s) AS (
   VALUES (U&'\+030EDD')
)
select s,
       octet_length(s),
       char_length(s)
from strings;

 s  │ octet_length │ char_length
════╪══════════════╪═════════════
 𰻝 │            4 │           1
(1 row)

PostgreSQL doesn't have a function "icu_character_boundaries".

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
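
The effect of the extra "0" can be modeled outside the database. The following is a minimal Python sketch, not PostgreSQL's actual lexer: the helper name decode_pg_unicode_escapes is hypothetical, and it assumes the default escape character "\" and handles only the 4-digit "\XXXX" and 6-digit "\+XXXXXX" forms (no UESCAPE clause, no surrogate pairs). Under those assumptions, '\+0030EDD' is read as U+0030ED (ロ) followed by a literal "D", which is consistent with the 2 characters and 4 bytes reported in the thread.

```python
import re

# Simplified model of the escapes inside a PostgreSQL U&'...' literal.
# Assumption: default escape character "\"; UESCAPE and surrogate pairs
# are deliberately not handled here.
_ESCAPE = re.compile(r"\\\+([0-9A-Fa-f]{6})|\\([0-9A-Fa-f]{4})|\\\\")


def decode_pg_unicode_escapes(body: str) -> str:
    """Decode the text between the quotes of a U&'...' literal."""

    def replace(match: re.Match) -> str:
        six, four = match.group(1), match.group(2)
        if six is not None:   # "\+" followed by exactly 6 hex digits
            return chr(int(six, 16))
        if four is not None:  # "\" followed by exactly 4 hex digits
            return chr(int(four, 16))
        return "\\"           # doubled backslash is a literal backslash

    return _ESCAPE.sub(replace, body)


# Seven digits after "\+": only the first six form the code point,
# so the trailing "D" stays an ordinary character.
wrong = decode_pg_unicode_escapes(r"\+0030EDD")
# Six digits: the intended code point U+30EDD.
right = decode_pg_unicode_escapes(r"\+030EDD")

print(len(wrong), len(wrong.encode("utf-8")))  # 2 4
print(len(right), len(right.encode("utf-8")))  # 1 4
```

Both strings occupy 4 bytes in UTF-8, which is why octet_length alone does not reveal the mistake; only the character count differs.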