Thread: Regular Expression For Duplicate Words

Regular Expression For Duplicate Words

From
Shaozhong SHI
Date:
This link is interesting.


Is there any example in Postgres?

Regards,

David

Re: Regular Expression For Duplicate Words

From
"David G. Johnston"
Date:
On Wed, Feb 2, 2022 at 1:00 AM Shaozhong SHI <shishaozhong@gmail.com> wrote:
> This link is interesting.
>
> Is there any example in Postgres?


Not that I'm immediately aware of, and I'm not going to search the internet for you.

The regex capabilities in PostgreSQL are pretty full-featured so a solution should be possible.  You should try translating the SO post concepts into PostgreSQL yourself and ask specific questions if you get stuck.

David J.

Re: Regular Expression For Duplicate Words

From
Jian He
Date:

It's an interesting question. I don't know how to do it in PostgreSQL either,
but I found alternative solutions using other tools:

GNU Grep:    grep -E '(hello)[[:blank:]]+\1' <<<'one hello hello world'
ripgrep:     rg  '(hello)[[:blank:]]+\1' --pcre2  <<<'one hello hello world'
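
The two commands above only match the literal word "hello". A minimal sketch that generalizes the GNU grep one to any immediately repeated word might look like this. It assumes GNU grep, whose -E mode accepts back-references and the \< and \> word-boundary escapes; the function name find_repeated_words is hypothetical and used only for illustration.

```shell
# Sketch only: assumes GNU grep (back-references and \< \> in -E mode).
# find_repeated_words is a hypothetical helper name, not an existing tool.
find_repeated_words() {
  # \<([[:alpha:]]+)\>  a whole alphabetic word, captured as group 1
  # [[:blank:]]+        one or more spaces or tabs
  # \<\1\>              the same word again, as a whole word
  # -o prints only the matching "word word" part of each input line
  grep -Eo '\<([[:alpha:]]+)\>[[:blank:]]+\<\1\>'
}

# Prints: hello hello
printf '%s\n' 'one hello hello world' | find_repeated_words

# Prints: is is   ("the theme" is not a repeat, because of the word boundaries)
printf '%s\n' 'the theme is is fine' | find_repeated_words
```

The word boundaries matter here: without them, "the theme" would be reported as a duplicate because "the" is a prefix of "theme".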

Re: Regular Expression For Duplicate Words

From
"Peter J. Holzer"
Date:
On 2022-02-02 08:00:00 +0000, Shaozhong SHI wrote:
> regex - Regular Expression For Duplicate Words - Stack Overflow
>
> Is there any example in Postgres?

It's pretty much the same as with other regexp dialects: Use word
boundaries and a word character class to match any word and then use a
backreference to match a duplicate word. All the building blocks are
described on
https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP
and except for [[:<:]] and [[:>:]] for the word boundaries, they are
also pretty standard.

So

[[:<:]]        start of word
([[:alpha:]]+) one or more alphabetic characters in a capturing group
[[:>:]]        end of word
\W+            one or more non-word characters
[[:<:]]        start of word
\1             the content of the first (and only) capturing group
[[:>:]]        end of word

All together:

select * from t where t ~ '[[:<:]]([[:alpha:]]+)[[:>:]]\W+[[:<:]]\1[[:>:]]';

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Re: Regular Expression For Duplicate Words

From
Shaozhong SHI
Date:
Hi Peter, interesting.

Give a good example if you can.

Regards,

David