Re: Overcoming Initcap Function limitations?

Поиск
Список
Период
Сортировка
От Steve Midgley
Тема Re: Overcoming Initcap Function limitations?
Дата
Msg-id CAJexoSLZKvxVBoueN0_YjTkSu4-G1A1wFbPCqw_9S4b+cs9NXA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Overcoming Initcap Function limitations?  (Bo Guo <bo.guo@gisticinc.com>)
Ответы Re: Overcoming Initcap Function limitations?  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-sql


On Mon, Dec 4, 2023, 5:39 PM Bo Guo <bo.guo@gisticinc.com> wrote:
Thank you, Steve and Greg!  

Your suggestions open up new potentials for me to explore.  At this moment, I lean towards normalizing the database column values in upper case, thereby out-sourcing the case-changing responsibility to the front end.  I would love to hear from your thoughts on this pattern.

Cheers,

Bo 

On Mon, Dec 4, 2023 at 11:39 AM Steve Midgley <science@misuse.org> wrote:


On Mon, Dec 4, 2023 at 10:09 AM Greg Sabino Mullane <htamfids@gmail.com> wrote:
It's not clear exactly what you are trying to achieve, but you can use Postgres' built-in text searching system to exclude stopwords. For example:

CREATE FUNCTION initcap_realword(myword TEXT)
  returns TEXT language SQL AS 
$$
SELECT CASE WHEN length(to_tsvector(myword)) < 1
  THEN myword ELSE initcap(myword) END;
$$;

You could extend that to multi-word strings with a little effort. However, knowing that macdonald should be MacDonald requires a lot more intelligence than is provided by any Postgres built-in system or extension that I know of. What you are looking at is the field of science known as Natural Language Processing, which can get very complex very quickly. But for a Postgres answer, you might combine plpython3u with spacy (https://spacy.io/usage/spacy-101).

Cheers,
Greg

I've been having some pretty good experiences with "hard" text transformations such as correct capitalization of names like MacDonald using GPT 3.5 Turbo API which is pretty cheap for the volume of data I've been working with.. Seems like Spacy might do similar things, and if it can be run locally, might be much cheaper than a rental API..

Steve

It really depends on the use case. If your users are happy with all uppercase, that seems like a great solution: fast, cheap, and reliable! 

Note that this list asks posters to add their reply comments at the bottom of each message, rather than the type which is more typical.. Welcome! 

Steve 

В списке pgsql-sql по дате отправления:

Предыдущее
От: Bo Guo
Дата:
Сообщение: Re: Overcoming Initcap Function limitations?
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Overcoming Initcap Function limitations?