Обсуждение: Lexicographic index ?
Hello, I've been searching since last week how I could implement what looks basically like a dictionnary with postgres : I have a big table (million of rows) which contains a varchar column with words like this for example : Table Twords: words ----------------- guitar saxophon flute And what I want to built a query that tells me if a word given in argument can be found at least partially in the database. Something like select * from twords where words||'%' like 'saxophones'; works but uses a sequential scan on the table... Is it possible to do what I want with postgres or not ? How ? :-) Thanks for your help !
----- Original Message ----- From: <arnaud.mlist1@free.fr> > > And what I want to built a query that tells me if a word given in argument can > be found at least partially in the database. Something like > > select * from twords where words||'%' like 'saxophones'; > > works but uses a sequential scan on the table... > try http://techdocs.postgresql.org/techdocs/fulltextindexing.php hth, Marin ---- "...what you brought from your past, is of no use in your present. When you must choose a new path, do not bring old experiences with you. Those who strike out afresh, but who attempt to retain a little of the old life, end up torn apart by their own memories. "
> > select * from twords where words||'%' like 'saxophones'; > > > > works but uses a sequential scan on the table... > > > try http://techdocs.postgresql.org/techdocs/fulltextindexing.php Hmmm... I don't think this has any chance to work : what I need is that a part of the work I search (beginning from its start) can be found in the table (such as 'saxophones' would match 'saxophone' or 'sax' in the table, but not 'saxophonesandsomethingbehind', and not that the word can be found in a bigger word in the table (it's already easy to find 'saxophones' from 'saxophone' using like and 'saxophone%' : it uses my index correctly. Or didn't I understand something ? Arnaud
An alternative, which may work well for you: create an index on a substring (say, the first five or six characters) of each word, and match against that. To index on a substring, you will need to create a function, something like: CREATE FUNCTION word_substring(text) RETURNS text AS' SELECT substr($1, 1, 5); 'LANGUAGE SQL WITH (iscachable); CREATE INDEX word_substring_index ON word_table (word_substring(word)); --- Marin Dimitrov <marin.dimitrov@sirma.bg> wrote: > > ----- Original Message ----- > From: <arnaud.mlist1@free.fr> > > > > > > And what I want to built a query that tells me if > a word given in argument > can > > be found at least partially in the database. > Something like > > > > select * from twords where words||'%' like > 'saxophones'; > > > > works but uses a sequential scan on the table... > > > > > try > http://techdocs.postgresql.org/techdocs/fulltextindexing.php > > hth, > > Marin > > ---- > "...what you brought from your past, is of no use in > your present. When > you must choose a new path, do not bring old > experiences with you. > Those who strike out afresh, but who attempt to > retain a little of the > old life, end up torn apart by their own memories. " > > > > ---------------------------(end of > broadcast)--------------------------- > TIP 2: you can get off all lists at once with the > unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) __________________________________________________ Do You Yahoo!? LAUNCH - Your Yahoo! Music Experience http://launch.yahoo.com
----- Original Message ----- From: <arnaud.mlist1@free.fr> To: <pgsql-general@postgresql.org> > select * from twords where words||'%' like 'saxophones'; > > works but uses a sequential scan on the table... The only method I have been able to find that will use the index is to provide both upper and lower limits on the key. For example: select * from twords where words <= 'saxophones' and words >= 's' and position(words in 'saxophones') = 1; This uses the index in my test, whereas it doesn't if you leave out the second condition, even if you add an 'order by' clause. Using position is slightly faster on my system than using likes (but I am only using the standard /usr/dict/words for testing, so I only have 45402 rows. -- Peter Gibbs EmKel Systems
> The only method I have been able to find that will use the index is to > provide both upper and lower limits on the key. > > select * from twords > where words <= 'saxophones' > and words >= 's' > and position(words in 'saxophones') = 1; That's an interesting progress, but it still doesn't satisfy me (still too slow in most cases). I've been looking to R-TREEs and GiST indexes but I can't figure how I could use them yet... In fact, thet kind of index I would need is exactly what a GiST provides since it references things between two limits. This allows to find fastly if boxes overlap and things like this. In my case, I would have to define an operator that would return true if left operand is exactly the beginning of the right one and false otherwise, and then find a way to use a GiST or R-TREE with that relation... Maybe I'm far away from the solution but I don't understand yet the implementations of GiSTs and R-TREE and how to use them for my special problem. Do I have any chance to solve my problem using that kind of indexes ? Thanks for your replies though ! Arnaud