Discussion: Naming of the prefab snowball stemmer dictionaries
I notice that the existing tsearch documentation that we've imported fairly consistently refers to Snowball dictionaries with names like "en_stem", "ru_stem", etc. However, CVS HEAD is set up to create them with names "english", "russian", etc. As I've been absorbing more of the docs I'm starting to wonder whether this is a good idea. ISTM that these names encourage a novice to think that the one dictionary is all you could need for a given language; and there are enough examples of more-complex setups in the docs to make it clear that in fact Snowball is not the be-all and end-all of dictionaries.

I'm thinking that going back to the old naming convention (or something like it --- maybe "english_stem", "russian_stem", etc) would be better. It'd help to give the right impression, namely that these dictionaries are a component of a solution but not necessarily all you need.

Thoughts?

regards, tom lane
On Aug 22, 2007, at 11:10, Tom Lane wrote:
> I notice that the existing tsearch documentation that we've imported
> fairly consistently refers to Snowball dictionaries with names like
> "en_stem", "ru_stem", etc. However, CVS HEAD is set up to create them
> with names "english", "russian", etc. As I've been absorbing more of
> the docs I'm starting to wonder whether this is a good idea. ISTM
> that these names encourage a novice to think that the one dictionary
> is all you could need for a given language; and there are enough
> examples of more-complex setups in the docs to make it clear that
> in fact Snowball is not the be-all and end-all of dictionaries.
>
> I'm thinking that going back to the old naming convention (or something
> like it --- maybe "english_stem", "russian_stem", etc) would be better.
> It'd help to give the right impression, namely that these dictionaries
> are a component of a solution but not necessarily all you need.

Please use ISO 639 codes plus any qualifiers to reduce confusion.

http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes

-M
Sounds reasonable, but why exactly did we spell out "english" instead of "en"?
Seems the abbrev is much easier to extract from LANG or browser prefs ...

Andreas

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tom Lane
Sent: Wednesday, August 22, 2007 17:11
To: Oleg Bartunov; Teodor Sigaev
Cc: pgsql-hackers@postgreSQL.org
Subject: [HACKERS] Naming of the prefab snowball stemmer dictionaries [bayes][heur]
Importance: Low

I notice that the existing tsearch documentation that we've imported fairly consistently refers to Snowball dictionaries with names like "en_stem", "ru_stem", etc. However, CVS HEAD is set up to create them with names "english", "russian", etc. As I've been absorbing more of the docs I'm starting to wonder whether this is a good idea. ISTM that these names encourage a novice to think that the one dictionary is all you could need for a given language; and there are enough examples of more-complex setups in the docs to make it clear that in fact Snowball is not the be-all and end-all of dictionaries.

I'm thinking that going back to the old naming convention (or something like it --- maybe "english_stem", "russian_stem", etc) would be better. It'd help to give the right impression, namely that these dictionaries are a component of a solution but not necessarily all you need.

Thoughts?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate
"Zeugswetter Andreas ADI SD" <Andreas.Zeugswetter@s-itsolutions.at> writes: > Sounds reasonable, but why exactly did we spell out "english" instead of "en" ? > Seems the abbrev is much easier to extract from LANG or browser prefs ... Mainly because we're following the upstream snowball project on the naming. I don't think that LANG is relevant to this. If you had an application that wanted to make a selection based on that, what it'd be trying to set is a configuration name, not a dictionary name. regards, tom lane
On Wed, 22 Aug 2007, Tom Lane wrote:
> I notice that the existing tsearch documentation that we've imported
> fairly consistently refers to Snowball dictionaries with names like
> "en_stem", "ru_stem", etc. However, CVS HEAD is set up to create them
> with names "english", "russian", etc. As I've been absorbing more of
> the docs I'm starting to wonder whether this is a good idea. ISTM
> that these names encourage a novice to think that the one dictionary
> is all you could need for a given language; and there are enough
> examples of more-complex setups in the docs to make it clear that
> in fact Snowball is not the be-all and end-all of dictionaries.
>
> I'm thinking that going back to the old naming convention (or something
> like it --- maybe "english_stem", "russian_stem", etc) would be better.
> It'd help to give the right impression, namely that these dictionaries
> are a component of a solution but not necessarily all you need.
>
> Thoughts?

I agree with you, old naming was more informative.

> regards, tom lane

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83