Thread: text search synonym dictionary anomaly with numbers

text search synonym dictionary anomaly with numbers

From
Richard Greenwood
Date:
I am working with street address data in which 'first st' has been
entered as '1 st' and so on. So I have created a text search
dictionary with entries:
     first  1
     1st  1
And initially it seems to be working properly:

SELECT ts_lexize('rwg_synonym','first');
 ts_lexize
-----------
 {1}


SELECT ts_lexize('rwg_synonym','1st');
 ts_lexize
-----------
 {1}

But my queries on '1st' are not returning the expected results:

 SELECT count(*) FROM parcel_attrib WHERE txtsrch @@ to_tsquery('1');
 count
-------
   403  <- this is what I want

SELECT count(*) FROM parcel_attrib WHERE txtsrch @@ to_tsquery('first');
 count
-------
   403  <- this is also good

 SELECT count(*) FROM parcel_attrib WHERE txtsrch @@ to_tsquery('1st');
 count
-------
     4  <- this is not good. There are 4 records that do have '1st',
but why am I not getting 403 records?

Thanks for reading,
Rich

--
Richard Greenwood
richard.greenwood@gmail.com
www.greenwoodmap.com

Re: text search synonym dictionary anomaly with numbers

From
Oleg Bartunov
Date:
Richard,

you should check your mapping - '1st' belongs to 'numword' and may be processed
in a different way than 'first' or '1'.
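
One way to check how the parser classifies each token is ts_debug, which reports the token type and the dictionaries consulted for it. A minimal sketch, assuming the stock english configuration discussed in this thread:

```sql
-- Show the token type (alias), the dictionary chain mapped to it,
-- and the resulting lexemes for each of the three spellings.
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('english', 'first 1st 1');
```

In the alias column, '1st' should appear as numword, so only the dictionaries mapped to numword are consulted for it.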

Oleg
On Sat, 26 Nov 2011, Richard Greenwood wrote:

> SELECT count(*) FROM parcel_attrib WHERE txtsrch @@ to_tsquery('1st');
> count
> -------
>     4  <- this is not good. There are 4 records that do have '1st',
> but why am I not getting 403 records?

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: text search synonym dictionary anomaly with numbers

From
Richard Greenwood
Date:
Oleg,

Thank you. I am sure that you have identified my problem.

 \dF+ english (output below) lists my dictionary, which is named
'rwg_synonym', before numword, so I would have thought that my
dictionary would have normalized '1st' to '1' before the numword
dictionary was reached. Maybe this question belongs in a new thread,
but I do thank you for helping me to look in the correct place.

Best regards,
Rich

fremontwy=# \dF+ english
Text search configuration "pg_catalog.english"
Parser: "pg_catalog.default"
      Token      |       Dictionaries
-----------------+--------------------------
 asciihword      | english_stem
 asciiword       | rwg_synonym,english_stem
 email           | simple
 file            | simple
 float           | simple
 host            | simple
 hword           | english_stem
 hword_asciipart | english_stem
 hword_numpart   | simple
 hword_part      | english_stem
 int             | simple
 numhword        | simple
 numword         | simple
 sfloat          | simple
 uint            | simple
 url             | simple
 url_path        | simple
 version         | simple
 word            | english_stem
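
Each row of this listing is an independent chain: a token is passed only to the dictionaries listed for its own token type, in the order shown. As a rough sketch, the same information can be read from the system catalogs, here restricted to the two token types that matter in this thread (the configuration and parser names are taken from the listing above):

```sql
-- List the dictionary chain for asciiword and numword in pg_catalog.english.
-- mapseqno is the order in which the dictionaries are consulted.
SELECT t.alias    AS token_type,
       m.mapseqno AS seq,
       d.dictname AS dictionary
FROM pg_ts_config_map AS m
JOIN pg_ts_dict AS d ON d.oid = m.mapdict
JOIN ts_token_type('default') AS t ON t.tokid = m.maptokentype
WHERE m.mapcfg = 'pg_catalog.english'::regconfig
  AND t.alias IN ('asciiword', 'numword')
ORDER BY t.alias, m.mapseqno;
```

With the mapping shown above, rwg_synonym would be listed only under asciiword, which is why a numword token never reaches it.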



On Sun, Nov 27, 2011 at 7:29 AM, Oleg Bartunov <oleg@sai.msu.su> wrote:
> Richard,
>
> you should check your mapping - '1st' belongs to 'numword' and may be
> processed in a different way than 'first' or '1'.




Re: text search synonym dictionary anomaly with numbers

From
Richard Greenwood
Date:
To answer my own question: my synonym dictionary was not being applied
to '1st' because '1st' is a numword, not an asciiword, and my synonym
dictionary was not mapped to numword. To map a dictionary to a token class:

ALTER TEXT SEARCH CONFIGURATION english
   ALTER MAPPING FOR numword WITH my_synonym_dictionary, simple;

The dictionary must already have been created with CREATE TEXT SEARCH
DICTIONARY.
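
Putting the pieces together, the whole setup might look like the sketch below. It assumes a synonym file named rwg_synonym.syn has been placed in the tsearch_data directory of the installation, and it uses a hypothetical source column named address for rebuilding txtsrch; neither name comes from this thread. The CREATE step can be skipped if the dictionary already exists.

```sql
-- Create the dictionary from the built-in synonym template.
-- Assumption: $SHAREDIR/tsearch_data/rwg_synonym.syn holds lines such as "1st 1".
CREATE TEXT SEARCH DICTIONARY rwg_synonym (
    TEMPLATE = synonym,
    SYNONYMS = rwg_synonym
);

-- Consult the synonym dictionary first for numword tokens, then fall back to simple.
ALTER TEXT SEARCH CONFIGURATION english
    ALTER MAPPING FOR numword WITH rwg_synonym, simple;

-- Check the result: the query term should now pass through the synonym dictionary.
SELECT to_tsquery('english', '1st');

-- Stored tsvectors were built under the old mapping, so they need regenerating.
-- "address" is a hypothetical name for the column txtsrch is derived from.
UPDATE parcel_attrib
SET txtsrch = to_tsvector('english', address);
```

If txtsrch is maintained by a trigger or some other mechanism, it would need to be refreshed by whatever built it rather than by the UPDATE shown here.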

Rich

On Sun, Nov 27, 2011 at 9:57 AM, Richard Greenwood
<richard.greenwood@gmail.com> wrote:
> Oleg,
>
> Thank you. I am sure that you have identified my problem.
>
>  \dF+ english (output below) lists my dictionary which is named
> 'rwg_synonym' before numword so I would have thought that my
> dictionary would have normalized '1st' to '1' before the numword
> dictionary was reached. Maybe this question belongs in a new thread,
> but I do thank you for helping me to look in the correct place.


