Discussion: websearch_to_tsquery() and apostrophe inside double quotes
Hi all,
I am a little confused about what is being generated by websearch_to_tsquery() in the case of an apostrophe inside double quotes.
Here is an example of searching for a name containing an apostrophe.
The following works as expected:
select to_tsvector('peter o''toole') @@ websearch_to_tsquery('peter o''toole');
?column?
----------
t
(1 row)
When the name is in double quotes, the search fails:
select to_tsvector('peter o''toole') @@ websearch_to_tsquery('"peter o''toole"');
?column?
----------
f
(1 row)
In the first case, websearch_to_tsquery() returns:
select websearch_to_tsquery('peter o''toole');
websearch_to_tsquery
------------------------
'peter' & 'o' & 'tool'
(1 row)
which makes sense to me.
In the second case websearch_to_tsquery() returns something that I can't quite understand:
select websearch_to_tsquery('"peter o''toole"');
websearch_to_tsquery
------------------------------
'peter' <-> ( 'o' & 'tool' )
(1 row)
I am not quite sure what text this will actually match?
Best regards,
Alastair
Alastair McKinley <a.mckinley@analyticsengines.com> writes:
> I am a little confused about what is being generated by websearch_to_tsquery() in the case of an apostrophe inside double quotes.
> ...
> select websearch_to_tsquery('"peter o''toole"');
> websearch_to_tsquery
> ------------------------------
> 'peter' <-> ( 'o' & 'tool' )
> (1 row)
> I am not quite sure what text this will actually match?
I believe it's impossible for that to match anything :-(.
It would require 'o' and 'tool' to match the same lexeme
(one immediately after a 'peter') which of course is impossible.
The underlying tsvector type seems to treat the apostrophe the
same as whitespace; it separates 'o' and 'toole' into
distinct words:
# select to_tsvector('peter o''toole');
to_tsvector
--------------------------
'o':2 'peter':1 'tool':3
(1 row)
So it seems to me that this is a bug: websearch_to_tsquery
should also treat "'" like whitespace. There's certainly
not anything in its documentation that suggests it should
treat "'" specially. If it didn't, you'd get
# select websearch_to_tsquery('"peter o toole"');
websearch_to_tsquery
----------------------------
'peter' <-> 'o' <-> 'tool'
(1 row)
which would match this tsvector.
regards, tom lane
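As a stopgap on servers where websearch_to_tsquery() still produces the unmatchable query, one option is to replace the apostrophe with a space before parsing, which gives the same input Tom uses in his last example. This is only a sketch and not something proposed in the thread: the replace() call is a hypothetical workaround, and it assumes the default text search configuration behaves like the one in the examples above.

```sql
-- Workaround sketch (an assumption, not from the thread): turn the
-- apostrophe into a space so the quoted phrase is parsed as three words.
select websearch_to_tsquery(replace('"peter o''toole"', '''', ' '));

-- Apply the rewritten query to the original document text.
select to_tsvector('peter o''toole')
       @@ websearch_to_tsquery(replace('"peter o''toole"', '''', ' '));
```

The first statement passes the string "peter o toole" to websearch_to_tsquery(), so it should produce the same phrase query Tom shows, which lines up with positions 1, 2 and 3 in the tsvector. A blanket replace() also rewrites every other apostrophe in the search string, so it is a blunt instrument rather than a fix.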
Hi Tom,
Thank you for looking at this. You are right, I couldn't find anything in the docs that would explain this.
I can't think of any rationale for producing a query like this, so it does look like a bug.
Best regards,
Alastair
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: 10 October 2019 14:35
To: Alastair McKinley <a.mckinley@analyticsengines.com>
Cc: pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>; teodor@sigaev.ru <teodor@sigaev.ru>
Subject: Re: websearch_to_tsquery() and apostrophe inside double quotes
Alastair McKinley <a.mckinley@analyticsengines.com> writes:
> I am a little confused about what is being generated by websearch_to_tsquery() in the case of an apostrophe inside double quotes.
> ...
> select websearch_to_tsquery('"peter o''toole"');
> websearch_to_tsquery
> ------------------------------
> 'peter' <-> ( 'o' & 'tool' )
> (1 row)
> I am not quite sure what text this will actually match?
I believe it's impossible for that to match anything :-(.
It would require 'o' and 'tool' to match the same lexeme
(one immediately after a 'peter') which of course is impossible.
The underlying tsvector type seems to treat the apostrophe the
same as whitespace; it separates 'o' and 'toole' into
distinct words:
# select to_tsvector('peter o''toole');
to_tsvector
--------------------------
'o':2 'peter':1 'tool':3
(1 row)
So it seems to me that this is a bug: websearch_to_tsquery
should also treat "'" like whitespace. There's certainly
not anything in its documentation that suggests it should
treat "'" specially. If it didn't, you'd get
# select websearch_to_tsquery('"peter o toole"');
websearch_to_tsquery
----------------------------
'peter' <-> 'o' <-> 'tool'
(1 row)
which would match this tsvector.
regards, tom lane