Обсуждение: Text Databases

Поиск
Список
Период
Сортировка

Text Databases

От
Nicholas Humfrey
Дата:
Hi,

Here is a very general question:

How good is PostgreSQL at hosting nearly pure text based databases ? Is it
possible to search mutilple cols for the same keyword ? How well would it
handle 40,000+ rows with say 15 columns each with about 20 words in each
cell ?

Nicholas



Re: [GENERAL] Text Databases

От
Dustin Sallings
Дата:
On Wed, 2 Dec 1998, Nicholas Humfrey wrote:

# How good is PostgreSQL at hosting nearly pure text based databases ? Is
# it possible to search mutilple cols for the same keyword ? How well
# would it handle 40,000+ rows with say 15 columns each with about 20
# words in each cell ?

    Get a book on relational databases, it sounds like you're looking
at the wrong tool for a particular job.

--
SA, beyond.com           My girlfriend asked me which one I like better.
pub  1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
|    Key fingerprint =  87 02 57 08 02 D0 DA D6  C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________


Re: [GENERAL] Text Databases

От
The Hermit Hacker
Дата:
On Wed, 2 Dec 1998, Nicholas Humfrey wrote:

> Hi,
>
> Here is a very general question:
>
> How good is PostgreSQL at hosting nearly pure text based databases ? Is it
> possible to search mutilple cols for the same keyword ? How well would it
> handle 40,000+ rows with say 15 columns each with about 20 words in each
> cell ?

    See the 'fulltextindex' directory under the contrib directory.
Part of the README follows...

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org


An attempt at some sort of Full Text Indexing for PostgreSQL.

The included software is an attempt to add some sort of Full Text Indexing
support to PostgreSQL. I mean by this that we can ask questions like:

        Give me all rows that have 'still' and 'nash' in the 'artist'
field.

Ofcourse we can write this as:

        select * from cds where artist ~* 'stills' and artist ~* 'nash';

But this does not use any indices, and therefore, if your database
gets very large, it will not have very high performance (the above query
requires at least one sequential scan, it probably takes 2 due to the
self-join).

The approach used by this add-on is to define a trigger on the table and
column you want to do this queries on. On every insert in the table, it
takes the value in the specified column, breaks the text in this column
up into pieces, and stores all sub-strings into another table, together
with a reference to the row in the original table that contained this
sub-string (it uses the oid of that row).

By now creating an index over the 'fti-table', we can search for
substrings that occur in the original table. By making a join between
the fti-table and the orig-table, we can get the actual rows we want
(this can also be done by using subselects, and maybe there're other
ways too).



Re: [GENERAL] Text Databases

От
Nicholas Humfrey
Дата:
On Wed, 2 Dec 1998, Dustin Sallings wrote:
>it sounds like you're looking at the wrong tool for a particular job.

I am trying to get the best of both worlds. A good relational database
system as well as being able to do text searches in mutiple fields.

On Thu, 3 Dec 1998, Marc G. Fournier wrote:
>See the 'fulltextindex' directory under the contrib directory.

How fast is PostgreSQL at searching for text. Should I, as Dustin Sallings
wrote look for a different relational database system ? I like PostgreSQL's
features especially free source code, so I can run it on different
platforms (Sparc,PPC,i486 etc.) Is there a more suitable free system
available ?

Nicholas



Re: [GENERAL] Text Databases

От
Gregory Maxwell
Дата:
Well, I've found isearch to be very useful for massive fulltext indexes..
Perhaps postgres could borrow it's methods..

On Thu, 3 Dec 1998, Nicholas Humfrey wrote:

>
> On Wed, 2 Dec 1998, Dustin Sallings wrote:
> >it sounds like you're looking at the wrong tool for a particular job.
>
> I am trying to get the best of both worlds. A good relational database
> system as well as being able to do text searches in mutiple fields.
>
> On Thu, 3 Dec 1998, Marc G. Fournier wrote:
> >See the 'fulltextindex' directory under the contrib directory.
>
> How fast is PostgreSQL at searching for text. Should I, as Dustin Sallings
> wrote look for a different relational database system ? I like PostgreSQL's
> features especially free source code, so I can run it on different
> platforms (Sparc,PPC,i486 etc.) Is there a more suitable free system
> available ?
>
> Nicholas
>
>
>


Re: [GENERAL] Text Databases

От
"Gene Selkov Jr."
Дата:
> On Wed, 2 Dec 1998, Dustin Sallings wrote:
> >it sounds like you're looking at the wrong tool for a particular job.
>
> I am trying to get the best of both worlds. A good relational database
> system as well as being able to do text searches in mutiple fields.
>

Why wouldn't you just build a consolidated index from the data in your
multiple fields, then load it into postgres or any other database
engine?

Dustin made a good point: things you are contemplating to do are not
exactly the job of a database backend. Its all yours: the database you
are building is just as good as the index you design and implement in
the form of a relation or a set of relations, which postgres will take
care of.

--Gene


Re: [GENERAL] Text Databases

От
dustin sallings
Дата:
On Thu, 3 Dec 1998, Nicholas Humfrey wrote:

// How fast is PostgreSQL at searching for text. Should I, as Dustin
// Sallings wrote look for a different relational database system ? I
// like PostgreSQL's features especially free source code, so I can run
// it on different platforms (Sparc,PPC,i486 etc.) Is there a more
// suitable free system available ?

    Well, Postgres is just about as good as it gets, but if you're
trying to do arbitrary text searches, grep on text files will always be
faster. The search system mentioned above is good if you're doing exact
matches out of your index, but it still doesn't make sense if you're
trying to match over multiple columns, etc...  Relational databases make
the most sense when you can look up an exact match in a column.

--
SA, beyond.com                            The world is watching America,
pub  1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
|    Key fingerprint =  87 02 57 08 02 D0 DA D6  C8 0F 3E 65 51 98 D8 BE
L________________________________________ and America is watching TV. __


Re: [GENERAL] Text Databases

От
The Hermit Hacker
Дата:
On Thu, 3 Dec 1998, Nicholas Humfrey wrote:

>
> On Wed, 2 Dec 1998, Dustin Sallings wrote:
> >it sounds like you're looking at the wrong tool for a particular job.
>
> I am trying to get the best of both worlds. A good relational database
> system as well as being able to do text searches in mutiple fields.
>
> On Thu, 3 Dec 1998, Marc G. Fournier wrote:
> >See the 'fulltextindex' directory under the contrib directory.
>
> How fast is PostgreSQL at searching for text. Should I, as Dustin Sallings
> wrote look for a different relational database system ? I like PostgreSQL's
> features especially free source code, so I can run it on different
> platforms (Sparc,PPC,i486 etc.) Is there a more suitable free system
> available ?

    From reading the README file in the fulltextindex directory, it
looks like the way they implemented it should be as fast as any other of
the indices....check the README file and see what you think...

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org


Re: [GENERAL] Text Databases

От
Bruce Momjian
Дата:
> On Thu, 3 Dec 1998, Nicholas Humfrey wrote:
>
> >
> > On Wed, 2 Dec 1998, Dustin Sallings wrote:
> > >it sounds like you're looking at the wrong tool for a particular job.
> >
> > I am trying to get the best of both worlds. A good relational database
> > system as well as being able to do text searches in mutiple fields.
> >
> > On Thu, 3 Dec 1998, Marc G. Fournier wrote:
> > >See the 'fulltextindex' directory under the contrib directory.
> >
> > How fast is PostgreSQL at searching for text. Should I, as Dustin Sallings
> > wrote look for a different relational database system ? I like PostgreSQL's
> > features especially free source code, so I can run it on different
> > platforms (Sparc,PPC,i486 etc.) Is there a more suitable free system
> > available ?
>
>     From reading the README file in the fulltextindex directory, it
> looks like the way they implemented it should be as fast as any other of
> the indices....check the README file and see what you think...

Yes, it should be.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026