Обсуждение: Out of the box, full text search feature suggestion for postgresql

Поиск
Список
Период
Сортировка

Out of the box, full text search feature suggestion for postgresql

От
aa
Дата:
Hello Postgres Team!

First of all, a big THANK YOU for the great work you folks are doing!

The reason I am writing to you is to suggest a feature in future Postgres versions, a feature that is partially there but is not quite where it should be in my opinion: the full text search functionality. This functionality in my opinion, should be available out of the box, for any possible language available, including east Asia character based languages. You would probably say that this will require a huge amount of work, and I would say, a postgres extension which does exactly this, already exists, and it is called : pgroonga (https://pgroonga.github.io/

  This tool is very good at doing full text search, with minimum effort from the user side (you just have to create some pgroonga indexes and slightly alter your select queries ),   but it is a bit unstable, I believe mainly because it is not fully integrated in postgres code.

The reason I am asking for this functionality to be part of the postgres future release is that currently, to set up postgres to do some decent full text search operations, will require a lot of NL knowledge and DB configurations, which the vast majority of postgres users don't have. So providing something out of the box will be much handier for the majority, while the experts could tweak their DBs using the existing tools.

Thanks, and I hope you will at least discuss this suggestion!
A


Re: Out of the box, full text search feature suggestion for postgresql

От
Bruce Momjian
Дата:
On Thu, Dec 28, 2023 at 10:15:07AM -0500, aa wrote:
> Hello Postgres Team!
> 
> First of all, a big THANK YOU for the great work you folks are doing!
> 
> The reason I am writing to you is to suggest a feature in future Postgres
> versions, a feature that is partially there but is not quite where it should be
> in my opinion: the full text search functionality. This functionality in my
> opinion, should be available out of the box, for any possible language
> available, including east Asia character based languages. You would probably
> say that this will require a huge amount of work, and I would say, a postgres
> extension which does exactly this, already exists, and it is called : pgroonga
> (https://pgroonga.github.io/) 

Please explain how this is different from what we already have:

    https://www.postgresql.org/docs/current/textsearch.html

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.



Re: Out of the box, full text search feature suggestion for postgresql

От
aa
Дата:
Hi Bruce,

As I have mentioned in the initial email, what is currently available in postgresql in regards to full text search, is tailored for NL experts IMO. As an average user you will have to dig a lot to configure your DB and your queries to get some decent speech text results.  And you will have to do these configurations, for every language you will want to support.

I personally spent a couple of weeks trying to set up full text search on a postgres instance, for a couple of different languages, just to realize that is very time consuming and hard to maintain it. So in the end I found proonga, which out of the box, offered me fast full text search support for any language I wanted, including Chinese, Korean and Japanese. I only needed to change my select queries to use their full text search operators  (although pgroonnga support out of the box the use of like and ilike operators)  plus I had to create some proonga indexes on the text fields I wanted to query on. That was it.
They are still missing fuzzy search support and still have to work a bit on stability (especially on huge DBs), but all these are already on their todo list.

So in essence, it took me a couple of weeks of tests and a lot of failures to realize that the postgres existing full text search functionality is too complicated to setup and use, if you want some decent results. With pgroonga I got decent full text search results in a matter of hours, without much changes on my queries.

Thanks,
A

 





On Thu, Dec 28, 2023 at 11:46 AM Bruce Momjian <bruce@momjian.us> wrote:
On Thu, Dec 28, 2023 at 10:15:07AM -0500, aa wrote:
> Hello Postgres Team!
>
> First of all, a big THANK YOU for the great work you folks are doing!
>
> The reason I am writing to you is to suggest a feature in future Postgres
> versions, a feature that is partially there but is not quite where it should be
> in my opinion: the full text search functionality. This functionality in my
> opinion, should be available out of the box, for any possible language
> available, including east Asia character based languages. You would probably
> say that this will require a huge amount of work, and I would say, a postgres
> extension which does exactly this, already exists, and it is called : pgroonga
> (https://pgroonga.github.io/

Please explain how this is different from what we already have:

        https://www.postgresql.org/docs/current/textsearch.html

--
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

Re: Out of the box, full text search feature suggestion for postgresql

От
Artur Zakirov
Дата:
On Thu, 28 Dec 2023 at 17:46, Bruce Momjian <bruce@momjian.us> wrote:
>
> On Thu, Dec 28, 2023 at 10:15:07AM -0500, aa wrote:
> > Hello Postgres Team!
> >
> > First of all, a big THANK YOU for the great work you folks are doing!
> >
> > The reason I am writing to you is to suggest a feature in future Postgres
> > versions, a feature that is partially there but is not quite where it should be
> > in my opinion: the full text search functionality. This functionality in my
> > opinion, should be available out of the box, for any possible language
> > available, including east Asia character based languages. You would probably
> > say that this will require a huge amount of work, and I would say, a postgres
> > extension which does exactly this, already exists, and it is called : pgroonga
> > (https://pgroonga.github.io/)
>
> Please explain how this is different from what we already have:
>
>         https://www.postgresql.org/docs/current/textsearch.html

I'm not familiar with pgroonga, but the main issue with built-in text
search is that it cannot tokenize asian and many other languages
properly.

Here default parser cannot tokenize Japanese text:

=# select * from ts_parse('default', 'これはペンです');
 tokid |     token
-------+----------------
     2 | これはペンです

Unlike Latin:

=# select * from ts_parse('default', 'this is a pen');
 tokid | token
-------+-------
     1 | this
    12 |
     1 | is
    12 |
     1 | a
    12 |
     1 | pen

To add support for Japanese (and other languages) it is necessary to
write a new parser or fix the existing default parser.

On the other hand pgroonga's source code looks complex, and I doubt
that there are pgsql-hackers who know it and target languages well and
who will be able to port it to Postgres core.

--
Artur



Pgroonga project seems to have solved that problem as it supports any language, out of the box.
As for the "pgsql-hackers" you are looking for, I would say Kou (the main developer in pgroonga) will be the right candidate for that. 

I guess it will be just a matter for you guys to convince him to join forces.

IMO integrating pgroonga logic into postgres will be a huge benefit for the whole postgres community, in regards to full text search functionality.


 
 

On Tue, Jan 2, 2024 at 12:21 PM Artur Zakirov <zaartur@gmail.com> wrote:
On Thu, 28 Dec 2023 at 17:46, Bruce Momjian <bruce@momjian.us> wrote:
>
> On Thu, Dec 28, 2023 at 10:15:07AM -0500, aa wrote:
> > Hello Postgres Team!
> >
> > First of all, a big THANK YOU for the great work you folks are doing!
> >
> > The reason I am writing to you is to suggest a feature in future Postgres
> > versions, a feature that is partially there but is not quite where it should be
> > in my opinion: the full text search functionality. This functionality in my
> > opinion, should be available out of the box, for any possible language
> > available, including east Asia character based languages. You would probably
> > say that this will require a huge amount of work, and I would say, a postgres
> > extension which does exactly this, already exists, and it is called : pgroonga
> > (https://pgroonga.github.io/)
>
> Please explain how this is different from what we already have:
>
>         https://www.postgresql.org/docs/current/textsearch.html

I'm not familiar with pgroonga, but the main issue with built-in text
search is that it cannot tokenize asian and many other languages
properly.

Here default parser cannot tokenize Japanese text:

=# select * from ts_parse('default', 'これはペンです');
 tokid |     token
-------+----------------
     2 | これはペンです

Unlike Latin:

=# select * from ts_parse('default', 'this is a pen');
 tokid | token
-------+-------
     1 | this
    12 |
     1 | is
    12 |
     1 | a
    12 |
     1 | pen

To add support for Japanese (and other languages) it is necessary to
write a new parser or fix the existing default parser.

On the other hand pgroonga's source code looks complex, and I doubt
that there are pgsql-hackers who know it and target languages well and
who will be able to port it to Postgres core.

--
Artur