Re: Counting the number of repeated phrases in a column

Поиск
Список
Период
Сортировка
От Ivan E. Panchenko
Тема Re: Counting the number of repeated phrases in a column
Дата
Msg-id 80903e46-1996-89a1-c937-831e44fcfcb7@postgrespro.ru
обсуждение исходный текст
Ответ на Re: Counting the number of repeated phrases in a column  (Shaozhong SHI <shishaozhong@gmail.com>)
Список pgsql-general


On 26.01.2022 11:11, Shaozhong SHI wrote:


On Tue, 25 Jan 2022 at 21:33, Ivan Panchenko <i.panchenko@postgrespro.ru> wrote:

On 26.01.2022 00:21, benj.dev@laposte.net wrote:
> Le 25/01/2022 à 18:10, Shaozhong SHI a écrit :
>> There is a short of a function in the standard Postgres to do the
>> following:
>>
>> It is easy to count the number of occurrence of words, but it is
>> rather difficult to count the number of occurrence of phrases.
>>
>> For instance:
>>
>> A cell of value:  'Hello World' means 1 occurrence a phrase.
>>
>> A cell of value: 'Hello World World Hello' means no occurrence of any
>> repeated phrase.
>>
>> But, A cell of value: 'Hello World World Hello Hello World' means 2
>> occurrences of 'Hello World'.
>>
>> 'The City of London, London' also has no occurrences of any repeated
>> phrase.
>>
>> Anyone has got such a function to check out the number of occurrence
>> of any repeated phrases?
>>
>> Regards,
>>
>> David
>
> Don't know if it's exactly what you want, but you can replace all
> occurence of the phrase in the text by empty string and compute the
> diff between the initial and the result and next divide by the length
> of your phrase.
>
> Example :
> WITH x AS (SELECT 'toto like tata and toto like titi and toto like
> tutu' , 'toto like' phrase)
> SELECT (char_length(texte) - char_length(replace(texte, phrase, '')))
> / char_length(phrase) AS nb_occurence
> FROM x
>
This works if the user knows the phrase. As far as I understood, the
phrase is not known, and user wants to count number of repeats of any
phrases.
Of course this can be done with recursive CTE. Split into words,
generate all phrases (AFAIK requires recursion), then group and count.

But probably in PL/Perl this could be done  more effectively.

Is there an example of using recursive CTE to split a text string into words?

Recursion is not needed for splitting into words. This can be done by regexp_split_to_table function.

But generation of all possible phrases from the given list of words probably requires recursion. On the first step the list of words becomes a list of a single-worded phrases. On each iteration then, you add the next word to each existing phrase, if it is possible (i.e. until the last word is reached).


Regards,

David 
Regards,
Ivan

В списке pgsql-general по дате отправления:

Предыдущее
От: benj.dev@laposte.net
Дата:
Сообщение: Re: Counting the number of repeated phrases in a column
Следующее
От: Matthias Apitz
Дата:
Сообщение: SELECT with LIKE clause makes full table scan