Thread: Regular Expression For Duplicate Words

Regular Expression For Duplicate Words

From

Shaozhong SHI

Date:

2 February 2022, 11:00:00

This link is interesting.

regex - Regular Expression For Duplicate Words - Stack Overflow

Is there any example in Postgres?

Regards,

David

Re: Regular Expression For Duplicate Words

From

"David G. Johnston"

Date:

2 February 2022, 18:22:46

On Wed, Feb 2, 2022 at 1:00 AM Shaozhong SHI <shishaozhong@gmail.com> wrote:

> This link is interesting.
>
> regex - Regular Expression For Duplicate Words - Stack Overflow
>
> Is there any example in Postgres?

Not that I'm immediately aware of, and I'm not going to search the internet for you.

The regex capabilities in PostgreSQL are pretty full-featured so a solution should be possible. You should try translating the SO post concepts into PostgreSQL yourself and ask specific questions if you get stuck.

David J.

Re: Regular Expression For Duplicate Words

From

Jian He

Date:

2 February 2022, 21:21:33

It's an interesting question, but I don't know how to do it in PostgreSQL either.

I did figure out alternative solutions outside the database. Both match only the literal word "hello" repeated:

GNU Grep: grep -E '(hello)[[:blank:]]+\1' <<<'one hello hello world'

ripgrep: rg '(hello)[[:blank:]]+\1' --pcre2 <<<'one hello hello world'

Re: Regular Expression For Duplicate Words

From

"Peter J. Holzer"

Date:

3 February 2022, 22:48:00

On 2022-02-02 08:00:00 +0000, Shaozhong SHI wrote:
> regex - Regular Expression For Duplicate Words - Stack Overflow
>
> Is there any example in Postgres?

It's pretty much the same as with other regexp dialects: Use word
boundaries and a word character class to match any word and then use a
backreference to match a duplicate word. All the building blocks are
described on
https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP
and except for [[:<:]] and [[:>:]] for the word boundaries, they are
also pretty standard.

So

[[:<:]]        start of word
([[:alpha:]]+) one or more alphabetic characters in a capturing group
[[:>:]]        end of word
\W+            one or more non-word characters
[[:<:]]        start of word
\1             the content of the first (and only) capturing group
[[:>:]]        end of word

All together:

select * from t where t ~ '[[:<:]]([[:alpha:]]+)[[:>:]]\W+[[:<:]]\1[[:>:]]';
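
To see how the pattern described above behaves, here is a small sketch that runs it against a few sample strings. The sample strings and the column name txt are made up for illustration and are not from the original question. The pattern uses \W+ as in the breakdown, so more than one separator character between the two words is allowed. It also assumes regexp_match is available (PostgreSQL 10 or later) to pull out the repeated word.

```sql
-- Sample strings and the column name "txt" are hypothetical.
-- Assumes standard_conforming_strings = on (the default), so backslashes
-- in the pattern reach the regex engine unchanged.
with samples(txt) as (
    values ('one hello hello world'),
           ('this is  is a test'),   -- two spaces between the repeated words
           ('the theme'),            -- "the" is only a prefix of "theme"
           ('no repeats here')
)
select txt,
       -- true if some whole word is immediately followed by itself
       txt ~ '[[:<:]]([[:alpha:]]+)[[:>:]]\W+[[:<:]]\1[[:>:]]' as has_duplicate,
       -- content of the capturing group for the first match, NULL if no match
       (regexp_match(txt, '[[:<:]]([[:alpha:]]+)[[:>:]]\W+[[:<:]]\1[[:>:]]'))[1] as repeated_word
from samples;
```

The first two rows should be reported as containing a duplicate and the last two should not: the word-boundary constraints keep "the" from matching inside "theme".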

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Re: Regular Expression For Duplicate Words

From

Shaozhong SHI

Date:

4 February 2022, 00:09:03

Hi Peter, interesting.

Give a good example if you can.

Regards,

David
