How to create dictionaries for tsearch

From
Paulo Jan
Date:
Hi all:

    I have read the documentation for the tsearch module, specifically the
part about creating custom dictionaries for different languages using
the "makedict.pl" script. What I don't understand, though, is where I
get the lists of stopwords and endings for each language. Do I have to
write them myself? Is there a reference website where I can get that
kind of information for a given language?




                        Paulo Jan.
                        DDnet.

Re: How to create dictionaries for tsearch

From
Oleg Bartunov
Date:
On Thu, 3 Oct 2002, Paulo Jan wrote:

> Hi all:
>
>     I have read the documentation for the tsearch module, specifically the
> part about creating custom dictionaries for different languages using
> the "makedict.pl" script. What I don't understand, though, is where do I
> get the lists of stopwords and endings for each language. Do I have to

Which languages?

> write them myself? Is there some reference website where I can get that
> kind of information for a given language?
>
Google is your friend.

I'd recommend using OpenFTS (openfts.sourceforge.net) for full-text searching.
It supports ispell dictionaries and Snowball stemmers,
which support Spanish.

>
>
>
>                         Paulo Jan.
>                         DDnet.
>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: How to create dictionaries for tsearch

From
Paulo Jan
Date:
Oleg Bartunov wrote:
>
> On Thu, 3 Oct 2002, Paulo Jan wrote:
>
> > Hi all:
> >
> >       I have read the documentation for the tsearch module, specifically the
> > part about creating custom dictionaries for different languages using
> > the "makedict.pl" script. What I don't understand, though, is where do I
> > get the lists of stopwords and endings for each language. Do I have to
>
> which languages ?
>


    Spanish.


> > write them myself? Is there some reference website where I can get that
> > kind of information for a given language?
> >
> Google is your friend.
>

    Oh, okay. And now that I've paid more attention to the OpenFTS
site, I have also seen the link to the Snowball stemmers,
including the Spanish one. However...



> I'd recommend to use OpenFTS (openfts.sourceforge.net) for full text searching
> which has support for ispell dictionaries and snowball stemmers,
> which have support for spanish.
>

    Can I use OpenFTS to index and search databases that are not "pure
text", but only have some text fields? From what I see, I have the
impression that OpenFTS is designed to store and search text documents
(newspaper articles, papers, etc.) using a Postgres backend. In my
case, I'm storing information (photographs and data associated with them)
that has some text fields that need to be indexed and other "normal"
fields (numeric, etc.) that don't, and I need to search by
both of them. In other words, I need to do something like "SELECT * FROM
photos WHERE captionidx @@ 'angelina' AND resolution='high' AND
photodate > '01-01-2002'". Can I use OpenFTS for this kind of mixed
search? From what I have read, I have the impression that it's a bit
cumbersome to do so.
    Alternatively, can the Snowball stemmer be used with tsearch alone,
without installing OpenFTS?



                        Paulo Jan.
                        DDnet.

Re: How to create dictionaries for tsearch

From
Oleg Bartunov
Date:
On Thu, 3 Oct 2002, Paulo Jan wrote:

>
>     Can I use OpenFTS to index and search databases that are not "pure
> text", but only have some text fields? From what I see, I have the
> impression that OpenFTS is designed to store and search text documents
> (newspaper articles, papers, etc.) using a Postgres backend, while in my
> case, I'm storing information (photographs and data associated to them)
> that has some text fields that need to be indexed and other "normal"
> fields (numeric, etc.) that don't need to be, and I need to search by
> both of them; in other words, I need to do something like "SELECT * FROM
> photos WHERE captionidx @@ 'angelina' AND resolution='high' AND
> photodate > '01-01-2002'". Can I use OpenFTS for this kind of mixed
> searches? From what I have read, I have the impression that it's a bit
> cumbersome to do so.

OpenFTS is an *engine*, designed specifically to be embedded
into applications. It has several methods that can be used to
construct the kind of queries you need. For example, here is get_sql,
from perldoc Search::OpenFTS:
       get_sql( \@ARRAY_WORD );
       get_sql( $STRING );
       get_sql( \$STRING );
       get_sql( *, %opt );
           %opt - as in the constructor (see above), plus a key
           dict_opt => {}, transmitted to dictionaries

           Returns parts of SQL:

           ($out, $condition, $order)

           Here is how they can be combined in an SQL statement:

            SELECT
               $opt{txttid}$out
            FROM
               table
            WHERE
               $condition
            $order;

As a bonus you'll get relevance ranking, dictionaries support and
more control.
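
A minimal sketch of how the three returned parts might be combined with ordinary column filters for the mixed search asked about above. It is written in Python purely for illustration (OpenFTS itself is a Perl module), and the fragment strings passed in are made-up stand-ins, not actual get_sql output. The helper name build_query, the photos.id column, and the %s placeholder style are assumptions too.

```python
def build_query(out, condition, order, table, id_column, extra_filters):
    """Assemble a SELECT from get_sql-style fragments plus plain column filters.

    out, condition and order play the role of the ($out, $condition, $order)
    parts described in the perldoc excerpt. extra_filters is a list of
    (clause, value) pairs for the non-text columns.
    """
    where = [f"({condition})"]  # parenthesised so the ANDs below cannot change its meaning
    params = []
    for clause, value in extra_filters:
        where.append(clause)
        params.append(value)    # values travel separately, as driver parameters
    sql = (
        f"SELECT {id_column}{out} "
        f"FROM {table} "
        f"WHERE {' AND '.join(where)} "
        f"{order}"
    ).strip()
    return sql, params


# Stand-in fragments (hypothetical); a real application would take them from get_sql.
sql, params = build_query(
    out=", 0 AS relevance",
    condition="captionidx @@ 'angelina'",
    order="ORDER BY relevance DESC",
    table="photos",
    id_column="photos.id",
    extra_filters=[("resolution = %s", "high"), ("photodate > %s", "01-01-2002")],
)
print(sql)
print(params)
```

The full-text condition stays a single parenthesised term, so any number of ordinary filters can be appended with AND without touching what the engine generated.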


>     Alternatively, can you use the snowball stemmer only with tsearch,
> without installing OpenFTS?
>

Not at the moment. It would be easy to implement, but we're very busy.



>
>
>                         Paulo Jan.
>                         DDnet.
>

    Regards,
        Oleg