Обсуждение: Full Text Search 101?

Поиск
Список
Период
Сортировка

Full Text Search 101?

От
Jonathan
Дата:
Hi everyone,

I posted to this forum once before and was able to receive help.
Thanks again!

I'm trying to implement full text search capabilities.  Basically, I
have a very simple "data catalog" type of website (http://
gis.drcog.org/datacatalog), where the user can type in a word or words
to search for records that match the criteria.  I have a table with a
few fields, but I want to be able to search/index two fields -- the
"name" of the dataset and the "description."  (or more, if I can index
fields from other tables too, that would be great).  I'd like to be
able to use a full text search to rank the results in terms of
relevance.

What is the best way to handle this?  I've read through the PostgreSQL
documentation and don't quite understand it although I'm trying to
understand and am 'playing around' with this on a development server.
Is the default text search configuration enough? I've tried setting my
own configuration but get errors about not finding .dict dictionary
files?  I have a default install of PostgreSQL 8.4.

Also, I've created a tsvector column and created indexes, but it
didn't seem to delete stop words from the indexes.  Should the stop
words be indexed?

Also, I don't quite understand how to create indexes but rank certain
words as being more important than others, for instance, maybe having
words that come from the "name" column carrying more importance than
words coming from the "description" column.  Finally, I'd like
"watersheds" to come up when someone searches for "water" so I don't
think I have this configured properly, because this record is not
returned.

Is there a good tutorial or maybe documentation that is a bit easier
to understand?  I know my database is far from complicated but I can't
seem to find a configuration that works well.  When I try ranking my
results, most of the results end up with the same rank, so something
must be wrong?

Can someone point me in the right direction?

Thanks for the help.  I appreciate it.

Re: Full Text Search 101?

От
Michael Nacos
Дата:
there's a lot of material on this site: http://www.sai.msu.su/~megera/wiki/tsearch2slides

how do you create your tsvectors? If all your results have the same score (assuming you are using two or more tokens in your tsquery) it might be that your tsvector lacks positional information ('a b'::tsvector vs 'a:1,5 b:3'::tsvector) and/or weights. I have experienced similar behaviour when doing searches against such tsvectors. Using rank_cd instead of rank might help you even then, I think. My understanding is that if you want to use weights in your tsvectors you have to set them yourself (you may concatenate tsvectors from different locations if you want, after using set_weight on some of them).

hope this helps a bit, Michael

2009/11/12 Jonathan <jharahush@gmail.com>
Hi everyone,

I posted to this forum once before and was able to receive help.
Thanks again!

I'm trying to implement full text search capabilities.  Basically, I
have a very simple "data catalog" type of website (http://
gis.drcog.org/datacatalog), where the user can type in a word or words
to search for records that match the criteria.  I have a table with a
few fields, but I want to be able to search/index two fields -- the
"name" of the dataset and the "description."  (or more, if I can index
fields from other tables too, that would be great).  I'd like to be
able to use a full text search to rank the results in terms of
relevance.

What is the best way to handle this?  I've read through the PostgreSQL
documentation and don't quite understand it although I'm trying to
understand and am 'playing around' with this on a development server.
Is the default text search configuration enough? I've tried setting my
own configuration but get errors about not finding .dict dictionary
files?  I have a default install of PostgreSQL 8.4.

Also, I've created a tsvector column and created indexes, but it
didn't seem to delete stop words from the indexes.  Should the stop
words be indexed?

Also, I don't quite understand how to create indexes but rank certain
words as being more important than others, for instance, maybe having
words that come from the "name" column carrying more importance than
words coming from the "description" column.  Finally, I'd like
"watersheds" to come up when someone searches for "water" so I don't
think I have this configured properly, because this record is not
returned.

Is there a good tutorial or maybe documentation that is a bit easier
to understand?  I know my database is far from complicated but I can't
seem to find a configuration that works well.  When I try ranking my
results, most of the results end up with the same rank, so something
must be wrong?

Can someone point me in the right direction?

Thanks for the help.  I appreciate it.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Full Text Search 101?

От
Aurynn Shaw
Дата:
Hi Jonathan;

 >
 > I posted to this forum once before and was able to receive help.
 > Thanks again!
 >
 > I'm trying to implement full text search capabilities.  Basically, I
 > have a very simple "data catalog" type of website (http://
 > gis.drcog.org/datacatalog), where the user can type in a word or words
 > to search for records that match the criteria.  I have a table with a
 > few fields, but I want to be able to search/index two fields -- the
 > "name" of the dataset and the "description."  (or more, if I can index
 > fields from other tables too, that would be great).  I'd like to be
 > able to use a full text search to rank the results in terms of
 > relevance.

You can set up an index per table, but I've found that having a single
vector table with bridge tables to the data I want to search works well.
This would be in the style of:

create table vectors (
  id serial primary key,
  vector tsvector not null
);

create table interesting_data (
  id serial primary key,
  textual text not null
);

CREATE TABLE vector_to_interesting (
  v_id int not null references vectors(id),
  i_id int not null references interesting_data(id)
);

And then

SELECT id.*
   FROM interesting_data itd,
        vectors v,
        vector_to_interesting itv
  WHERE v.id = itv.v_id
    AND itv.i_id = itd.id
    AND v.vector @@ to_tsquery('searchpattern');

Allowing for some procedures around that that return multiple row types,
to the client software.

 >
 > What is the best way to handle this?  I've read through the PostgreSQL
 > documentation and don't quite understand it although I'm trying to
 > understand and am 'playing around' with this on a development server.
 > Is the default text search configuration enough? I've tried setting my
 > own configuration but get errors about not finding .dict dictionary
 > files?  I have a default install of PostgreSQL 8.4.
 >
 > Also, I've created a tsvector column and created indexes, but it
 > didn't seem to delete stop words from the indexes.  Should the stop
 > words be indexed?

The default "english" configuration in PostgreSQL should have done this.
Building your own configuration tends towards being a bit more advanced,
and "english" should suit most needs.

 >
 > Also, I don't quite understand how to create indexes but rank certain
 > words as being more important than others, for instance, maybe having
 > words that come from the "name" column carrying more importance than
 > words coming from the "description" column.

For part of this, there's ts_rank(tsvector, tsquery), as well as the
relative weighting system in the textsearch modules. Unfortunately, I
don't have much experience with the relative weighting, but, ts_rank()
should get you partly there.

I also found an interesting page on relative weighting @
http://www.postgresonline.com/special_feature.php?sf_name=postgresql83tsearch_cheatsheet&outputformat=html

that might be useful for you.

 > Finally, I'd like
 > "watersheds" to come up when someone searches for "water" so I don't
 > think I have this configured properly, because this record is not
 > returned.

Well, there's two ways to achieve this.
The first is to manually the stopwords, which is getting into the custom
configuration and a bit on the more advanced side.
The other is that in PG8.4, you can do

to_tsquery('water:*')

which will tell the search parser to do a partial match, which would
return "watershed", in this instance.
My testing has shown it to be a bit slower (30ms, vs 15ms for a
non-partial search), but not egregiously slow.

 >
 > Is there a good tutorial or maybe documentation that is a bit easier
 > to understand?  I know my database is far from complicated but I can't
 > seem to find a configuration that works well.  When I try ranking my
 > results, most of the results end up with the same rank, so something
 > must be wrong?
 >
 > Can someone point me in the right direction?
 >
 > Thanks for the help.  I appreciate it.
 >

Hope this helps. :)

Regards,
--
Aurynn Shaw

The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564 ext 103
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

ashaw@commandprompt.com

Re: Full Text Search 101?

От
Jonathan
Дата:
Just wanted to thank everyone for the input so far.  I do appreciate
it!
I'm going to read through some of this tonight and see what I can do!

Thanks!

Jonathan

Re: Full Text Search 101?

От
Jonathan
Дата:
Hi!

Thanks again for the help.  Just wanted to let you all know that it
seems like we had an issue with our install of Postgres 8.3 on our
development server.  I installed 8.4 on my own local machine and the
indexing works as it should.

Jonathan