Discussion: Regular Expression For Duplicate Words
This link is interesting.
Is there any example in Postgres?
Regards,
David
On Wed, Feb 2, 2022 at 1:00 AM Shaozhong SHI <shishaozhong@gmail.com> wrote:
> This link is interesting. Is there any example in Postgres?
Not that I'm immediately aware of, and I'm not going to search the internet for you.
The regex capabilities in PostgreSQL are pretty full-featured so a solution should be possible. You should try translating the SO post concepts into PostgreSQL yourself and ask specific questions if you get stuck.
David J.
It's an interesting question, but I also don't know how to do it in PostgreSQL.
I did find alternative solutions using command-line tools, though:
GNU Grep: grep -E '(hello)[[:blank:]]+\1' <<<'one hello hello world'
ripgrep: rg '(hello)[[:blank:]]+\1' --pcre2 <<<'one hello hello world'
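Both commands above only catch the literal word hello. A minimal sketch of a more general version follows. It assumes GNU grep (backreferences under -E, the \< and \> word boundaries, and \W are GNU extensions), and find_dupes is a hypothetical helper name that does not come from this thread:

```shell
#!/bin/sh
# find_dupes: hypothetical helper (name invented for illustration).
# Reads text on stdin and prints each immediately repeated word pair.
# Assumes GNU grep: \< \> (word boundaries), \W (non-word character)
# and backreferences under -E are GNU extensions.
find_dupes() {
    # \<([[:alpha:]]+)\>  a whole alphabetic word, captured as group 1
    # \W+                 one or more non-word characters
    # \<\1\>              the same word again, as a whole word
    grep -E -o '\<([[:alpha:]]+)\>\W+\<\1\>'
}

printf 'one hello hello world\n' | find_dupes   # prints: hello hello
```

The word boundaries keep partial repeats such as "the then" from being reported. Matching is case-sensitive as written.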
On 2022-02-02 08:00:00 +0000, Shaozhong SHI wrote:
> regex - Regular Expression For Duplicate Words - Stack Overflow
>
> Is there any example in Postgres?

It's pretty much the same as with other regexp dialects: Use word
boundaries and a word character class to match any word, and then use a
backreference to match a duplicate word. All the building blocks are
described on
https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP
and except for [[:<:]] and [[:>:]] for the word boundaries, they are
also pretty standard.

So

[[:<:]]          start of word
([[:alpha:]]+)   one or more alphabetic characters in a capturing group
[[:>:]]          end of word
\W+              one or more non-word characters
[[:<:]]          start of word
\1               the content of the first (and only) capturing group
[[:>:]]          end of word

All together:

select * from t where t ~ '[[:<:]]([[:alpha:]]+)[[:>:]]\W+[[:<:]]\1[[:>:]]';

hp

-- 
Peter J. Holzer | hjp@hjp.at | http://www.hjp.at/
"Story must make more sense than reality." -- Charles Stross, "Creative writing challenge!"
Hi, Peter, Interesting.
On Thu, 3 Feb 2022 at 19:48, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2022-02-02 08:00:00 +0000, Shaozhong SHI wrote:
> regex - Regular Expression For Duplicate Words - Stack Overflow
>
> Is there any example in Postgres?
It's pretty much the same as with other regexp dialects: Use word
boundaries and a word character class to match any word and then use a
backreference to match a duplicate word. All the building blocks are
described on
https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP
and except for [[:<:]] and [[:>:]] for the word boundaries, they are
also pretty standard.
So
[[:<:]] start of word
([[:alpha:]]+) one or more alphabetic characters in a capturing group
[[:>:]] end of word
\W+ one or more non-word characters
[[:<:]] start of word
\1 the content of the first (and only) capturing group
[[:>:]] end of word
All together:
select * from t where t ~ '[[:<:]]([[:alpha:]]+)[[:>:]]\W+[[:<:]]\1[[:>:]]';
Give a good example if you can.
Regards,
David