Re: BUG #13440: unaccent does not remove all diacritics

From	Thomas Munro
Subject	Re: BUG #13440: unaccent does not remove all diacritics
Date	June 15, 2015, 23:07:31
Msg-id	CAEepm=0YVseDdN3Odjg2AZ2QvEPshqwJf=4zZbea5cwMQEP1Bw@mail.gmail.com
In response to	Re: BUG #13440: unaccent does not remove all diacritics (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: BUG #13440: unaccent does not remove all diacritics (Thomas Munro <thomas.munro@enterprisedb.com>); Re: BUG #13440: unaccent does not remove all diacritics (Michael Gradek <mike@busbud.com>)
List	pgsql-bugs

On Tue, Jun 16, 2015 at 12:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> My terminal shows these characters to be different.  One is
>> http://graphemica.com/%C8%9B
>>       latin small letter t with comma below (U+021B)
>
>> The other is
>> http://graphemica.com/%C5%A3
>>       latin small letter t with cedilla (U+0163)
>
> Ah-hah -- I did not look closely enough.  So the immediate answer for
> Michael is to add another entry to his unaccent.rules file.
>
> Should we add the missing character to the standard unaccent.rules file?

It looks like Romanian also has s with comma.  Perhaps we should have
all these characters:

$ curl -s http://unicode.org/Public/7.0.0/ucd/UnicodeData.txt | \
    egrep ';LATIN (SMALL|CAPITAL) LETTER [A-Z] WITH ' | wc -l
     702
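
For readers without network access, roughly the same count can be reproduced offline. This is a minimal sketch, assuming Python 3 and its standard `unicodedata` module; the function name `accented_latin_letters` is made up for illustration. The result depends on the Unicode version bundled with the interpreter, so it will not necessarily equal the 702 reported for Unicode 7.0.0.

```python
import re
import sys
import unicodedata

# Same pattern as the egrep call, anchored at the start of the character name.
PATTERN = re.compile(r"LATIN (SMALL|CAPITAL) LETTER [A-Z] WITH ")


def accented_latin_letters():
    """Return all characters whose Unicode name matches PATTERN."""
    found = []
    for cp in range(sys.maxunicode + 1):
        # Unnamed code points (unassigned, surrogates, ...) yield "".
        name = unicodedata.name(chr(cp), "")
        if PATTERN.match(name):
            found.append(chr(cp))
    return found


letters = accented_latin_letters()
print(len(letters), "matches in Unicode", unicodedata.unidata_version)
```

Both characters from the bug report, U+021B (t with comma below) and U+0163 (t with cedilla), appear in the resulting list.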

That's quite a lot more than the 187 we currently have.  Of those, I
think only the following ligature characters don't fit the above
pattern: Æ, æ, Ĳ, ĳ, Œ, œ, ß.  Incidentally, I don't believe that the
way we "unaccent" ligatures is correct anyway.  Maybe they should be
expanded to AE, ae, IJ, ij, OE, oe, ss, respectively, not A, a, I, i,
O, o, S as we have it, but I guess it depends what the purpose of
unaccent is...
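
One way such rules could be derived, rather than maintained by hand, is sketched below. It assumes Python 3 and the standard `unicodedata` module; `LIGATURES`, `unaccent_char` and `rule_line` are hypothetical names, and this is not how the shipped unaccent.rules file was produced. Accented letters are reduced by canonical decomposition (NFD) followed by dropping combining marks, while the ligatures listed above are expanded from an explicit table, since canonical decomposition does not split them. Letters with no canonical decomposition, such as ø, pass through unchanged and would still need manual entries.

```python
import unicodedata

# Proposed multi-letter expansions for the ligature characters listed above.
LIGATURES = {
    "Æ": "AE", "æ": "ae",
    "Ĳ": "IJ", "ĳ": "ij",
    "Œ": "OE", "œ": "oe",
    "ß": "ss",
}


def unaccent_char(ch):
    """Return ch with diacritics removed, or its ligature expansion."""
    if ch in LIGATURES:
        return LIGATURES[ch]
    # NFD splits e.g. U+021B into "t" + U+0326 (combining comma below).
    decomposed = unicodedata.normalize("NFD", ch)
    # Keep only non-combining characters, i.e. the base letter(s).
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def rule_line(ch):
    """Format one line in the two-column layout of an unaccent rules file."""
    return f"{ch}\t{unaccent_char(ch)}"


for ch in "țţșÆß":
    print(rule_line(ch))
```

With this approach both t with comma below (U+021B) and t with cedilla (U+0163) map to a plain t, and the Romanian s with comma (U+0219) maps to s.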

-- 
Thomas Munro
http://www.enterprisedb.com
