Обсуждение: Re: websearch_to_tsquery() returns queries that don't match to_tsvector()

Поиск
Список
Период
Сортировка

Re: websearch_to_tsquery() returns queries that don't match to_tsvector()

От
Alexander Korotkov
Дата:
Hi!

On Mon, Apr 19, 2021 at 9:57 AM Valentin Gatien-Baron
<valentin.gatienbaron@gmail.com> wrote:
> Looking at the tsvector and tsquery, we can see that the problem is
> that the ":" counts as one position for the ts_query but not the
> ts_vector:
>
> select to_tsvector('english', 'aaa: bbb'), websearch_to_tsquery('english', '"aaa: bbb"');
>    to_tsvector   | websearch_to_tsquery
> -----------------+----------------------
>  'aaa':1 'bbb':2 | 'aaa' <2> 'bbb'
> (1 row)

It seems there is another bug with phrase search and query parsing.
It seems to me that since 0c4f355c6a websearch_to_tsquery() should
just parse text in quotes as a single token.  Besides fixing this bug,
it simplifies the code.

Trying to fix this bug before 0c4f355c6a doesn't seem to worth the efforts.

I propose to push the attached patch to v14.  Objections?

------
Regards,
Alexander Korotkov

Вложения

Re: websearch_to_tsquery() returns queries that don't match to_tsvector()

От
Tom Lane
Дата:
Alexander Korotkov <aekorotkov@gmail.com> writes:
> It seems there is another bug with phrase search and query parsing.
> It seems to me that since 0c4f355c6a websearch_to_tsquery() should
> just parse text in quotes as a single token.  Besides fixing this bug,
> it simplifies the code.

OK ...

> Trying to fix this bug before 0c4f355c6a doesn't seem to worth the efforts.

Agreed, plus it doesn't sound like the sort of behavior change that
we want to push out in minor releases.

> I propose to push the attached patch to v14.  Objections?

This patch seems to include some unrelated fooling around in GiST?

            regards, tom lane



Re: websearch_to_tsquery() returns queries that don't match to_tsvector()

От
Alexander Korotkov
Дата:
On Sun, May 2, 2021 at 8:52 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <aekorotkov@gmail.com> writes:
> > It seems there is another bug with phrase search and query parsing.
> > It seems to me that since 0c4f355c6a websearch_to_tsquery() should
> > just parse text in quotes as a single token.  Besides fixing this bug,
> > it simplifies the code.
>
> OK ...
>
> > Trying to fix this bug before 0c4f355c6a doesn't seem to worth the efforts.
>
> Agreed, plus it doesn't sound like the sort of behavior change that
> we want to push out in minor releases.

+1

> > I propose to push the attached patch to v14.  Objections?
>
> This patch seems to include some unrelated fooling around in GiST?

Ooops, I've included this by oversight.  The next revision is attached.

Anything besides that?

------
Regards,
Alexander Korotkov

Вложения

Re: websearch_to_tsquery() returns queries that don't match to_tsvector()

От
Tom Lane
Дата:
Alexander Korotkov <aekorotkov@gmail.com> writes:
> Ooops, I've included this by oversight.  The next revision is attached.
> Anything besides that?

Some quick eyeball review:

+                    /* Everything is quotes is processed as a single token */

Should read "Everything in quotes ..."

-                    /* or else gettoken_tsvector() will raise an error */
+                    /* or else ƒtsvector() will raise an error */

Looks like an unintentional change?

@@ -846,7 +812,6 @@ parse_tsquery(char *buf,
     state.buffer = buf;
     state.buf = buf;
     state.count = 0;
-    state.in_quotes = false;
     state.state = WAITFIRSTOPERAND;
     state.polstr = NIL;

This change seems wrong/unsafe too.

            regards, tom lane



Re: websearch_to_tsquery() returns queries that don't match to_tsvector()

От
Zhihong Yu
Дата:


On Sun, May 2, 2021 at 10:57 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Sun, May 2, 2021 at 8:52 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <aekorotkov@gmail.com> writes:
> > It seems there is another bug with phrase search and query parsing.
> > It seems to me that since 0c4f355c6a websearch_to_tsquery() should
> > just parse text in quotes as a single token.  Besides fixing this bug,
> > it simplifies the code.
>
> OK ...
>
> > Trying to fix this bug before 0c4f355c6a doesn't seem to worth the efforts.
>
> Agreed, plus it doesn't sound like the sort of behavior change that
> we want to push out in minor releases.

+1

> > I propose to push the attached patch to v14.  Objections?
>
> This patch seems to include some unrelated fooling around in GiST?

Ooops, I've included this by oversight.  The next revision is attached.

Anything besides that?

------
Regards,
Alexander Korotkov

Hi,
+                   /* Everything is quotes is processed as a single token */

is quotes -> in quotes 

+                   /* iterate to the closing quotes or end of the string*/

closing quotes -> closing quote

+                   /* or else ƒtsvector() will raise an error */

The character before tsvector() seems to be special.

Cheers

Re: websearch_to_tsquery() returns queries that don't match to_tsvector()

От
Alexander Korotkov
Дата:
On Sun, May 2, 2021 at 9:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <aekorotkov@gmail.com> writes:
> > Ooops, I've included this by oversight.  The next revision is attached.
> > Anything besides that?
>
> Some quick eyeball review:
>
> +                    /* Everything is quotes is processed as a single token */
>
> Should read "Everything in quotes ..."
>
> -                    /* or else gettoken_tsvector() will raise an error */
> +                    /* or else ƒtsvector() will raise an error */
>
> Looks like an unintentional change?

Thank you for catching this!

> @@ -846,7 +812,6 @@ parse_tsquery(char *buf,
>         state.buffer = buf;
>         state.buf = buf;
>         state.count = 0;
> -       state.in_quotes = false;
>         state.state = WAITFIRSTOPERAND;
>         state.polstr = NIL;
>
> This change seems wrong/unsafe too.

It seems OK, because this patch removes in_quotes field altogether.
We don't have to know whether we in quotes in the state, since we
process everything in quotes as a single token.

------
Regards,
Alexander Korotkov

Вложения

Re: websearch_to_tsquery() returns queries that don't match to_tsvector()

От
Alexander Korotkov
Дата:
On Sun, May 2, 2021 at 9:06 PM Zhihong Yu <zyu@yugabyte.com> wrote:
> +                   /* Everything is quotes is processed as a single token */
>
> is quotes -> in quotes
>
> +                   /* iterate to the closing quotes or end of the string*/
>
> closing quotes -> closing quote
>
> +                   /* or else ƒtsvector() will raise an error */
>
> The character before tsvector() seems to be special.

Thank you for catching.  Fixed in v3.

------
Regards,
Alexander Korotkov



Re: websearch_to_tsquery() returns queries that don't match to_tsvector()

От
Alexander Korotkov
Дата:
On Sun, May 2, 2021 at 9:17 PM Zhihong Yu <zyu@yugabyte.com> wrote:
> One minor comment:
> +                   /* iterate to the closing quotes or end of the string*/
>
> closing quotes -> closing quote

Yep, I've missed the third place to change from plural to single form :)

------
Regards,
Alexander Korotkov

Вложения

Re: websearch_to_tsquery() returns queries that don't match to_tsvector()

От
Zhihong Yu
Дата:


On Sun, May 2, 2021 at 11:12 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Sun, May 2, 2021 at 9:06 PM Zhihong Yu <zyu@yugabyte.com> wrote:
> +                   /* Everything is quotes is processed as a single token */
>
> is quotes -> in quotes
>
> +                   /* iterate to the closing quotes or end of the string*/
>
> closing quotes -> closing quote
>
> +                   /* or else ƒtsvector() will raise an error */
>
> The character before tsvector() seems to be special.

Thank you for catching.  Fixed in v3.

------
Regards,
Alexander Korotkov

Hi,
One minor comment:
+                   /* iterate to the closing quotes or end of the string*/

closing quotes -> closing quote

Cheers 

Re: websearch_to_tsquery() returns queries that don't match to_tsvector()

От
Tom Lane
Дата:
Alexander Korotkov <aekorotkov@gmail.com> writes:
> On Sun, May 2, 2021 at 9:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> -       state.in_quotes = false;
>> 
>> This change seems wrong/unsafe too.

> It seems OK, because this patch removes in_quotes field altogether.

Oh, sorry, I misread the patch --- I thought that earlier hunk
was removing a local variable.  Agreed, if you can do without this
state field altogether, that's fine.

            regards, tom lane



Re: websearch_to_tsquery() returns queries that don't match to_tsvector()

От
Alexander Korotkov
Дата:
On Sun, May 2, 2021 at 9:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <aekorotkov@gmail.com> writes:
> > On Sun, May 2, 2021 at 9:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> -       state.in_quotes = false;
> >>
> >> This change seems wrong/unsafe too.
>
> > It seems OK, because this patch removes in_quotes field altogether.
>
> Oh, sorry, I misread the patch --- I thought that earlier hunk
> was removing a local variable.  Agreed, if you can do without this
> state field altogether, that's fine.

OK, thank you for review!

------
Regards,
Alexander Korotkov