Обсуждение: Text Search

Поиск
Список
Период
Сортировка

Text Search

От
Greg Elisara
Дата:
Hi folks. I have a product table (music cds) that stores a review in one
of the columns (datatype - text). The copy for the review can hold
anywhere between 100-1500 words. The table holds a little over 200, 000
rows.

I want to set up a way to search through the table using keyword
searches on the reviews. What is the best way to approach this? Any help
would be welcome.

Regards. Greg.


Re: Text Search

От
"D. Duccini"
Дата:
There a several ways to implement this:

1. you can just use the built in regex operator "~*" to hit the table.
this is one way to handle mixed case searches, the other is is use some of
the functions such as "lower()" and substring()

but this is a somewhat nasty approach requiring big iron since you have to
touch every record to find a single match

2. you can do post/processing on your table and build a seperate index of
just the "important" words -- i say "important" because its up to you to
build a "stop-list" of words (usually things like a/and/the/for/with etc)
and make insertions into a two-column table of word and "id" that refers
to the record with the info

this will be faster than approach #1 at the cost of disk space (disposable
these days) -- (the postgresql INDEXES at last look were not handling
HIGH-VOLUME DUPLICATES very elegantly, but thats okay, it will eventually)

3. you can treat your text as a "blob" (binary large object) problem, and
store your text in either seperate files, or seperate file "system"

this allows you the freedom to let the reviews grow as large as you'd like
-- and store multiple formats very easily -- example, maintain a URL to
the text information in your table, and then use some other web searching
"engine" to index the text data -- like HTDIG.  then all you need to do is
bury META data into the pages or self-referential URI that feeds back into
your database .cgi stuff

just some random thoughts to get you thinking :)

-duck


On Sat, 11 Aug 2001, Greg Elisara wrote:

> Hi folks. I have a product table (music cds) that stores a review in one
> of the columns (datatype - text). The copy for the review can hold
> anywhere between 100-1500 words. The table holds a little over 200, 000
> rows.
>
> I want to set up a way to search through the table using keyword
> searches on the reviews. What is the best way to approach this? Any help
> would be welcome.
>
> Regards. Greg.
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>


-----------------------------------------------------------------------------
david@backpack.com            BackPack Software, Inc.        www.backpack.com
+1 651.645.7550 voice       "Life is an Adventure.
+1 651.645.9798 fax            Don't forget your BackPack!"
-----------------------------------------------------------------------------


Re: Text Search

От
Paul McGarry
Дата:
Hi Greg,

> I want to set up a way to search through the table using keyword
> searches on the reviews. What is the best way to approach this? Any help
> would be welcome.

Take a look at /contrib/fulltextindex/README.fti in the source package.
I use that fulltextindex add on to do something similar to what
you are doing and it works quite nicely.

--
Paul McGarry            mailto:paulm@opentec.com.au
Systems Integrator      http://www.opentec.com.au
Opentec Pty Ltd         http://www.iebusiness.com.au
6 Lyon Park Road        Phone: (02) 9870 4718
North Ryde NSW 2113     Fax:   (02) 9878 1755
--------------------------------------------------------------------
This document and any attachments are intended solely for
the named addressee(s), are confidential, and may be subject to
legal professional privilege. Please notify us (on +61-2 9878 1744)
as soon as possible if you have received this document in error.
Any confidentiality or privilege is not waived or lost because this
email has been sent to you by mistake. This document and any
attachments are subject to copyright.  No part of them should be
reproduced or distributed by any means whatsoever without the
prior consent of the copyright owner.  Opentec does not warrant
that this email and any attachments are error or virus free.
--------------------------------------------------------------------