Re: Tsearch2 Dutch snowball stemmer in PG8.1
From | Oleg Bartunov |
---|---|
Subject | Re: Tsearch2 Dutch snowball stemmer in PG8.1 |
Date | |
Msg-id | Pine.LNX.4.64.0710031630410.3304@sn.sai.msu.ru |
In reply to | Tsearch2 Dutch snowball stemmer in PG8.1 (Alban Hertroys <a.hertroys@magproductions.nl>) |
Responses | Re: Tsearch2 Dutch snowball stemmer in PG8.1 (Alban Hertroys <a.hertroys@magproductions.nl>) |
List | pgsql-general |
Alban,

the documentation you're referring to is for the upcoming 8.3 release.
For 8.1 and 8.2 you need to set up all the machinery by hand. It's not
difficult, for example:

-- sample tsearch2 configuration for search.postgresql.org
-- Creates configuration 'pg' - default, should match server's locale !!!
-- Change 'ru_RU.UTF-8'

begin;

-- create special (default) configuration 'pg'
update pg_ts_cfg set locale=NULL where locale = 'ru_RU.UTF-8';
insert into pg_ts_cfg values('pg','default','ru_RU.UTF8');

-- register 'pg_dict' dictionary using synonym template
-- postgres     pg
-- pgsql        pg
-- postgresql   pg
insert into pg_ts_dict
    (select 'pg_dict', dict_init,
            '/usr/local/pgsql-dev/share/contrib/pg_dict.txt',
            dict_lexize, 'pg-specific dictionary'
     from pg_ts_dict
     where dict_name='synonym'
    );

-- register ispell dictionary, check paths and stop words
-- I used iconv for english files, since there are some cyrillic stuff
insert into pg_ts_dict
    (SELECT 'en_ispell', dict_init,
            'DictFile="/usr/local/share/dicts/ispell/utf8/english-utf8.dict",'
            'AffFile="/usr/local/share/dicts/ispell/utf8/english-utf8.aff",'
            'StopFile="/usr/local/share/dicts/ispell/utf8/english-utf8.stop"',
            dict_lexize
     FROM pg_ts_dict
     WHERE dict_name = 'ispell_template'
    );

-- use the same stop-word list as 'en_ispell' dictionary
UPDATE pg_ts_dict set dict_initoption='/usr/local/share/dicts/english.stop'
where dict_name='en_stem';

-- default token<->dicts mappings
insert into pg_ts_cfgmap
    select 'pg', tok_alias, dict_name
    from public.pg_ts_cfgmap
    where ts_name='default';

-- modify mappings for latin words for configuration 'pg'
update pg_ts_cfgmap set dict_name = '{pg_dict,en_ispell,en_stem}'
where tok_alias in ( 'lword', 'lhword', 'lpart_hword' )
and ts_name = 'pg';

-- we won't index/search some tokens
update pg_ts_cfgmap set dict_name = NULL
--where tok_alias in ('email', 'url', 'sfloat', 'uri', 'float','word')
where tok_alias in ('email', 'url', 'sfloat', 'uri', 'float')
and ts_name = 'pg';

end;

-- testing
select * from ts_debug('
PostgreSQL, the highly scalable, SQL compliant, open source
object-relational database management system, is now undergoing
beta testing of the next version of our software: PostgreSQL 8.2.
');

Oleg

On Wed, 3 Oct 2007, Alban Hertroys wrote:

> Hello,
>
> I'm trying to get a Dutch snowball stemmer in Postgres 8.1, but I can't
> find how to do that.
>
> I found CREATE FULLTEXT DICTIONARY commands in the tsearch2 docs on
> http://www.sai.msu.su/~megera/postgres/fts/doc/index.html, but these
> commands are apparently not available on PG8.1.
>
> I also found the tables pg_ts_(cfg|cfgmap|dict|parser), but I have no
> idea how to add a Dutch stemmer to those.
>
> I did find some references to stem.[ch] files that were suggested to
> compile into the postgres sources, but I cannot believe that's the right
> way to do this (besides that I don't have sufficient privileges to
> install such a version).
>
> So... How do I do this?
>
> The system involved is some version of Debian Linux (2.6 kernel); are
> there any packages for a Dutch stemmer maybe?
>
> I'm in a bit of a hurry too, as we're on a tight deadline :(
>
> Regards,
>

        Regards,
                Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
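
A few extra checks can be run after the script in the message above. This is only a sketch: it assumes the tsearch2 contrib module is installed in the current database and that the 'pg' configuration and the 'en_stem' dictionary exist as in the script. The sample words and the sample sentence are made up for illustration.

```sql
-- Assumption: tsearch2 (contrib) is installed in this database and the
-- configuration script from the message has been committed.

-- The locale stored in pg_ts_cfg should match the server's locale.
SHOW lc_ctype;
SELECT ts_name, prs_name, locale FROM pg_ts_cfg;

-- Ask a single dictionary what it returns for one word.
SELECT lexize('en_stem', 'databases');

-- Run the whole 'pg' configuration over a sample string.
SELECT to_tsvector('pg', 'PostgreSQL is now undergoing beta testing');
```

If the locale reported by SHOW lc_ctype differs from the value in pg_ts_cfg, the configuration name can still be passed explicitly, as in the to_tsvector call here.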