Обсуждение: workaround for expensive KNN?
hello all ... given oleg's posting before i also wanted to fire up some KNN related question. let us consider a simple example. i got some million lines and i want all rows matching a tsquery sorted by price. i did some tests: test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german', title)@@ to_tsquery('german', 'iphone') ORDER BY int_price <-> 0 LIMIT 10; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- --------------------------------------Limit (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590 rows=10loops=1) Buffers: shared hit=9 read=5004 -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..13251.91rows=3224 width=16) (actual time= 36391.715..45542.573 rows=10 loops=1) Index Cond: (to_tsvector('german'::regconfig, title) @@ '''iphon'''::tsquery) Order By: (int_price <-> 0::bigint) Buffers: shared hit=9 read=5004Total runtime: 45542.676ms (7 rows) test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german', title)@@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 10; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- -------------------------------------Limit (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227 rows=10loops=1) Buffers: shared hit=3 read=2316 -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61rows=7255 width=16) (actual time= 7243.524..10935.217 rows=10 loops=1) Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery) Order By: (int_price <-> 0::bigint) Buffers: shared hit=3 read=2316Total runtime: 10935.265ms (7 rows) test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german', title)@@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 1; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- -------------------------------Limit (cost=0.00..4.10 rows=1 width=16) (actual time=28.527..28.528 rows=1 loops=1) Buffers:shared hit=1 read=1577 -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61rows=7255 width=16) (actual time= 28.525..28.525 rows=1 loops=1) Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery) Order By: (int_price <-> 0::bigint) Buffers: shared hit=1 read=1577Total runtime: 28.558 ms (7 rows) under any circumstances - there is no way to reduce the number of buffers needed for a query like that. if everything is cached this is still ok but as soon as you have to take a single block from disk you will die a painfulrandom I/O death. is there any alternative which does not simply die when i try to achieve what i want? the use case is quite simple: all products with a certain word (10 cheapest or so). is there any alternative approach to this? i was putting some hope into KNN but it seems it needs too much random I/O :(. many thanks, hans -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de
Probably, you miss two-columnt index. From my early post: http://www.sai.msu.su/~megera/wiki/knngist =# CREATE INDEX spots_idx ON spots USING knngist (coordinates, to_tsvector('french',address)); =# SELECT id, address, (coordinates <-> '(2.29470491409302,48.858263472125)'::point) AS dist FROM spots WHERE coordinates >< '(2.29470491409302,48.858263472125)'::point AND to_tsvector('french',address) @@ to_tsquery('french','mars') LIMIT 10; id | address | dist ---------+-------------------------------------------------------------+--------------------- 366096 | 1st Floor Tour Eiffel| Champs de Mars, Paris 75007, France | 2.32488941293945e-05 4356328 | r Champ de Mars 75007 PARIS | 0.00421854756964406 5200167 | Champ De Mars 75007 Paris | 0.004535645625872889301676 | Champ de Mars, 75007 Paris, | 0.00453564562587288 2152213 |16, ave Rapp, Champ de Mars, Tour Eiffel, Paris, France | 0.00624152097590896 1923818 | Champ de Mars Paris, France | 0.00838214733539654 5165953 | 39 Rue Champ De Mars Paris, France | 0.00874410234569529 7395870 | 39 Rue Champ De Mars Paris, France | 0.008744102345695294358671 | 32 Rue Champ De Mars Paris, France | 0.00876089659276339 1923742 |12 rue du Champ de Mars Paris, France | 0.00876764731845995 (10 rows) Time: 7.859 ms =# EXPLAIN (COSTS OFF) SELECT id, address FROM spots WHERE coordinates >< '(2.29470491409302,48.858263472125)'::point AND to_tsvector('french',address) @@ to_tsquery('french','mars') LIMIT 10; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Limit -> Index Scan using spots_idx on spots Index Cond: ((coordinates >< '(2.29470491409302,48.858263472125)'::point)AND (to_tsvector('french'::regconfig, address) @@ '''mar'''::tsquery)) (3 rows) On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote: > hello all ... > > given oleg's posting before i also wanted to fire up some KNN related question. > let us consider a simple example. i got some million lines and i want all rows matching a tsquery sorted by price. > i did some tests: > > test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german', title)@@ to_tsquery('german', 'iphone') ORDER BY int_price <-> 0 LIMIT 10; > QUERY PLAN > > ----------------------------------------------------------------------------------------------------------------------------- > -------------------------------------- > Limit (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590 rows=10 loops=1) > Buffers: shared hit=9 read=5004 > -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..13251.91 rows=3224 width=16) (actualtime= > 36391.715..45542.573 rows=10 loops=1) > Index Cond: (to_tsvector('german'::regconfig, title) @@ '''iphon'''::tsquery) > Order By: (int_price <-> 0::bigint) > Buffers: shared hit=9 read=5004 > Total runtime: 45542.676 ms > (7 rows) > > test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german', title)@@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 10; > QUERY PLAN > > ----------------------------------------------------------------------------------------------------------------------------- > ------------------------------------- > Limit (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227 rows=10 loops=1) > Buffers: shared hit=3 read=2316 > -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61 rows=7255 width=16) (actualtime= > 7243.524..10935.217 rows=10 loops=1) > Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery) > Order By: (int_price <-> 0::bigint) > Buffers: shared hit=3 read=2316 > Total runtime: 10935.265 ms > (7 rows) > > test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german', title)@@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 1; > QUERY PLAN > > ----------------------------------------------------------------------------------------------------------------------------- > ------------------------------- > Limit (cost=0.00..4.10 rows=1 width=16) (actual time=28.527..28.528 rows=1 loops=1) > Buffers: shared hit=1 read=1577 > -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61 rows=7255 width=16) (actualtime= > 28.525..28.525 rows=1 loops=1) > Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery) > Order By: (int_price <-> 0::bigint) > Buffers: shared hit=1 read=1577 > Total runtime: 28.558 ms > (7 rows) > > > under any circumstances - there is no way to reduce the number of buffers needed for a query like that. > if everything is cached this is still ok but as soon as you have to take a single block from disk you will die a painfulrandom I/O death. > is there any alternative which does not simply die when i try to achieve what i want? > > the use case is quite simple: all products with a certain word (10 cheapest or so). > > is there any alternative approach to this? > i was putting some hope into KNN but it seems it needs too much random I/O :(. > > many thanks, > > hans > > -- > Cybertec Sch?nig & Sch?nig GmbH > Gr?hrm?hlgasse 26 > A-2700 Wiener Neustadt, Austria > Web: http://www.postgresql-support.de > > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
Hans, what if you create index (price,title) ? On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote: > hello ... > > i got that one ... > > "idx_product_t_product_titleprice" gist (to_tsvector('german'::regconfig, title), int_price) > > so, i have a combined index on text + number. > to me the plan seems fine ... it looks like a prober KNN traversal. > the difference between my plan and your plan seems to be the fact that i have, say, 1 mio rows which have "handy" or soin it (1 mio out of 11 mio or so). you are moving out from one specific place. > > my maths is like that: > 11 mio in total > 1 mio matching "iphone" > cheapest / most expensive 10 out of this mio needed. > > operator classes are all nice and in place: > > SELECT 10 <-> 4 as distance; > distance > ---------- > 6 > (1 row) > > what does "buffers true" in your case say? > > many thanks, > > hans > > > On Apr 8, 2011, at 3:22 PM, Oleg Bartunov wrote: > >> Probably, you miss two-columnt index. From my early post: >> http://www.sai.msu.su/~megera/wiki/knngist >> >> =# CREATE INDEX spots_idx ON spots USING knngist (coordinates, to_tsvector('french',address)); >> =# SELECT id, address, (coordinates <-> '(2.29470491409302,48.858263472125)'::point) AS dist FROM spots WHERE coordinates>< '(2.29470491409302,48.858263472125)'::point AND to_tsvector('french',address) @@ to_tsquery('french','mars') LIMIT 10; >> id | address | dist ---------+-------------------------------------------------------------+--------------------- >> 366096 | 1st Floor Tour Eiffel | Champs de Mars, Paris 75007, France | 2.32488941293945e-05 >> 4356328 | r Champ de Mars 75007 PARIS | 0.00421854756964406 >> 5200167 | Champ De Mars 75007 Paris | 0.00453564562587288 >> 9301676 | Champ de Mars, 75007 Paris, | 0.00453564562587288 >> 2152213 | 16, ave Rapp, Champ de Mars, Tour Eiffel, Paris, France | 0.00624152097590896 >> 1923818 | Champ de Mars Paris, France | 0.00838214733539654 >> 5165953 | 39 Rue Champ De Mars Paris, France | 0.00874410234569529 >> 7395870 | 39 Rue Champ De Mars Paris, France | 0.00874410234569529 >> 4358671 | 32 Rue Champ De Mars Paris, France | 0.00876089659276339 >> 1923742 | 12 rue du Champ de Mars Paris, France | 0.00876764731845995 >> (10 rows) >> >> Time: 7.859 ms >> >> =# EXPLAIN (COSTS OFF) SELECT id, address FROM spots WHERE coordinates >< '(2.29470491409302,48.858263472125)'::point >> AND to_tsvector('french',address) @@ to_tsquery('french','mars') LIMIT 10; >> >> QUERY PLAN >> -------------------------------------------------------------------------------------------------------------------------------------- >> Limit >> -> Index Scan using spots_idx on spots >> Index Cond: ((coordinates >< '(2.29470491409302,48.858263472125)'::point) AND (to_tsvector('french'::regconfig,address) @@ '''mar'''::tsquery)) >> (3 rows) >> >> >> On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote: >> >>> hello all ... >>> >>> given oleg's posting before i also wanted to fire up some KNN related question. >>> let us consider a simple example. i got some million lines and i want all rows matching a tsquery sorted by price. >>> i did some tests: >>> >>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german',title) @@ to_tsquery('german', 'iphone') ORDER BY int_price <-> 0 LIMIT 10; >>> QUERY PLAN >>> >>> ----------------------------------------------------------------------------------------------------------------------------- >>> -------------------------------------- >>> Limit (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590 rows=10 loops=1) >>> Buffers: shared hit=9 read=5004 >>> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..13251.91 rows=3224 width=16) (actualtime= >>> 36391.715..45542.573 rows=10 loops=1) >>> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''iphon'''::tsquery) >>> Order By: (int_price <-> 0::bigint) >>> Buffers: shared hit=9 read=5004 >>> Total runtime: 45542.676 ms >>> (7 rows) >>> >>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german',title) @@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 10; >>> QUERY PLAN >>> >>> ----------------------------------------------------------------------------------------------------------------------------- >>> ------------------------------------- >>> Limit (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227 rows=10 loops=1) >>> Buffers: shared hit=3 read=2316 >>> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61 rows=7255 width=16) (actualtime= >>> 7243.524..10935.217 rows=10 loops=1) >>> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery) >>> Order By: (int_price <-> 0::bigint) >>> Buffers: shared hit=3 read=2316 >>> Total runtime: 10935.265 ms >>> (7 rows) >>> >>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german',title) @@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 1; >>> QUERY PLAN >>> >>> ----------------------------------------------------------------------------------------------------------------------------- >>> ------------------------------- >>> Limit (cost=0.00..4.10 rows=1 width=16) (actual time=28.527..28.528 rows=1 loops=1) >>> Buffers: shared hit=1 read=1577 >>> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61 rows=7255 width=16) (actualtime= >>> 28.525..28.525 rows=1 loops=1) >>> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery) >>> Order By: (int_price <-> 0::bigint) >>> Buffers: shared hit=1 read=1577 >>> Total runtime: 28.558 ms >>> (7 rows) >>> >>> >>> under any circumstances - there is no way to reduce the number of buffers needed for a query like that. >>> if everything is cached this is still ok but as soon as you have to take a single block from disk you will die a painfulrandom I/O death. >>> is there any alternative which does not simply die when i try to achieve what i want? >>> >>> the use case is quite simple: all products with a certain word (10 cheapest or so). >>> >>> is there any alternative approach to this? >>> i was putting some hope into KNN but it seems it needs too much random I/O :(. >>> >>> many thanks, >>> >>> hans >>> >>> -- >>> Cybertec Sch?nig & Sch?nig GmbH >>> Gr?hrm?hlgasse 26 >>> A-2700 Wiener Neustadt, Austria >>> Web: http://www.postgresql-support.de >>> >>> >>> >> >> Regards, >> Oleg >> _____________________________________________________________ >> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), >> Sternberg Astronomical Institute, Moscow University, Russia >> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ >> phone: +007(495)939-16-83, +007(495)939-23-83 >> >> -- >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-hackers >> > > -- > Cybertec Sch?nig & Sch?nig GmbH > Gr?hrm?hlgasse 26 > A-2700 Wiener Neustadt, Austria > Web: http://www.postgresql-support.de > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
Oleg Bartunov <oleg@sai.msu.su> writes: > what if you create index (price,title) ? I think that SELECT ... WHERE ... ORDER BY ... LIMIT is basically an intractable problem. We've recognized the difficulty in connection with btree indexes for a long time, and there is no reason at all to think that KNNGist will somehow magically dodge it. You can either visit *all* of the rows satisfying WHERE (and then sort them), or you can visit the rows in ORDER BY order and hope that you find enough of them satisfying the WHERE in a reasonable amount of time. Either of these strategies loses badly in many real-world cases. Maybe with some sort of fuzzy notion of ordering it'd be possible to go faster, but as long as you insist on an exact ORDER BY result, I don't see any way out of it. One way to be fuzzy is to introduce a maximum search distance: SELECT ... WHERE x < limit AND other-conditions ORDER BY x LIMIT n which essentially works by limiting the damage in the visit-all-the-rows approach. Hans didn't do that in his example, but I wonder how much it'd help (and whether the existing GIST support is adequate for it). regards, tom lane
hello ... i got that one ... "idx_product_t_product_titleprice" gist (to_tsvector('german'::regconfig, title), int_price) so, i have a combined index on text + number. to me the plan seems fine ... it looks like a prober KNN traversal. the difference between my plan and your plan seems to be the fact that i have, say, 1 mio rows which have "handy" or so init (1 mio out of 11 mio or so). you are moving out from one specific place. my maths is like that:11 mio in total1 mio matching "iphone"cheapest / most expensive 10 out of this mio needed. operator classes are all nice and in place: SELECT 10 <-> 4 as distance;distance ---------- 6 (1 row) what does "buffers true" in your case say? many thanks, hans On Apr 8, 2011, at 3:22 PM, Oleg Bartunov wrote: > Probably, you miss two-columnt index. From my early post: > http://www.sai.msu.su/~megera/wiki/knngist > > =# CREATE INDEX spots_idx ON spots USING knngist (coordinates, to_tsvector('french',address)); > =# SELECT id, address, (coordinates <-> '(2.29470491409302,48.858263472125)'::point) AS dist FROM spots WHERE coordinates>< '(2.29470491409302,48.858263472125)'::point AND to_tsvector('french',address) @@ to_tsquery('french','mars') LIMIT 10; > id | address | dist ---------+-------------------------------------------------------------+--------------------- > 366096 | 1st Floor Tour Eiffel | Champs de Mars, Paris 75007, France | 2.32488941293945e-05 > 4356328 | r Champ de Mars 75007 PARIS | 0.00421854756964406 > 5200167 | Champ De Mars 75007 Paris | 0.00453564562587288 > 9301676 | Champ de Mars, 75007 Paris, | 0.00453564562587288 > 2152213 | 16, ave Rapp, Champ de Mars, Tour Eiffel, Paris, France | 0.00624152097590896 > 1923818 | Champ de Mars Paris, France | 0.00838214733539654 > 5165953 | 39 Rue Champ De Mars Paris, France | 0.00874410234569529 > 7395870 | 39 Rue Champ De Mars Paris, France | 0.00874410234569529 > 4358671 | 32 Rue Champ De Mars Paris, France | 0.00876089659276339 > 1923742 | 12 rue du Champ de Mars Paris, France | 0.00876764731845995 > (10 rows) > > Time: 7.859 ms > > =# EXPLAIN (COSTS OFF) SELECT id, address FROM spots WHERE coordinates >< '(2.29470491409302,48.858263472125)'::point > AND to_tsvector('french',address) @@ to_tsquery('french','mars') LIMIT 10; > > QUERY PLAN > -------------------------------------------------------------------------------------------------------------------------------------- > Limit > -> Index Scan using spots_idx on spots > Index Cond: ((coordinates >< '(2.29470491409302,48.858263472125)'::point) AND (to_tsvector('french'::regconfig,address) @@ '''mar'''::tsquery)) > (3 rows) > > > On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote: > >> hello all ... >> >> given oleg's posting before i also wanted to fire up some KNN related question. >> let us consider a simple example. i got some million lines and i want all rows matching a tsquery sorted by price. >> i did some tests: >> >> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german', title)@@ to_tsquery('german', 'iphone') ORDER BY int_price <-> 0 LIMIT 10; >> QUERY PLAN >> >> ----------------------------------------------------------------------------------------------------------------------------- >> -------------------------------------- >> Limit (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590 rows=10 loops=1) >> Buffers: shared hit=9 read=5004 >> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..13251.91 rows=3224 width=16) (actualtime= >> 36391.715..45542.573 rows=10 loops=1) >> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''iphon'''::tsquery) >> Order By: (int_price <-> 0::bigint) >> Buffers: shared hit=9 read=5004 >> Total runtime: 45542.676 ms >> (7 rows) >> >> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german', title)@@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 10; >> QUERY PLAN >> >> ----------------------------------------------------------------------------------------------------------------------------- >> ------------------------------------- >> Limit (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227 rows=10 loops=1) >> Buffers: shared hit=3 read=2316 >> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61 rows=7255 width=16) (actualtime= >> 7243.524..10935.217 rows=10 loops=1) >> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery) >> Order By: (int_price <-> 0::bigint) >> Buffers: shared hit=3 read=2316 >> Total runtime: 10935.265 ms >> (7 rows) >> >> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german', title)@@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 1; >> QUERY PLAN >> >> ----------------------------------------------------------------------------------------------------------------------------- >> ------------------------------- >> Limit (cost=0.00..4.10 rows=1 width=16) (actual time=28.527..28.528 rows=1 loops=1) >> Buffers: shared hit=1 read=1577 >> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61 rows=7255 width=16) (actualtime= >> 28.525..28.525 rows=1 loops=1) >> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery) >> Order By: (int_price <-> 0::bigint) >> Buffers: shared hit=1 read=1577 >> Total runtime: 28.558 ms >> (7 rows) >> >> >> under any circumstances - there is no way to reduce the number of buffers needed for a query like that. >> if everything is cached this is still ok but as soon as you have to take a single block from disk you will die a painfulrandom I/O death. >> is there any alternative which does not simply die when i try to achieve what i want? >> >> the use case is quite simple: all products with a certain word (10 cheapest or so). >> >> is there any alternative approach to this? >> i was putting some hope into KNN but it seems it needs too much random I/O :(. >> >> many thanks, >> >> hans >> >> -- >> Cybertec Sch?nig & Sch?nig GmbH >> Gr?hrm?hlgasse 26 >> A-2700 Wiener Neustadt, Austria >> Web: http://www.postgresql-support.de >> >> >> > > Regards, > Oleg > _____________________________________________________________ > Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), > Sternberg Astronomical Institute, Moscow University, Russia > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > phone: +007(495)939-16-83, +007(495)939-23-83 > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de
Oops, my previous example was fromm early prototype :) I just recreated test environment for 9.1: knn=# select count(*) from spots; count -------- 908846 (1 row) knn=# explain (analyze true, buffers true) SELECT id, address, (coordinates <-> '(2.29470491409302,48.858263472125)'::point)AS dist FROM spots WHERE to_tsvector('french',address) @@ to_tsquery('french','mars') ORDER BY coordinates <-> '(2.29470491409302,48.858263472125)'::point LIMIT 10; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..33.63 rows=10 width=58) (actual time=1.541..1.875 rows=10 loops=1) Buffers: shared hit=251 -> IndexScan using spots_idx on spots (cost=0.00..15279.12 rows=4544 width=58) (actual time=1.540..1.874 rows=10 loops=1) Index Cond: (to_tsvector('french'::regconfig, address) @@ '''mar'''::tsquery) Order By: (coordinates <-> '(2.29470491409302,48.858263472125)'::point) Buffers: shared hit=251 Total runtime: 1.905 ms (7 rows) Time: 2.372 ms On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote: > hello ... > > i got that one ... > > "idx_product_t_product_titleprice" gist (to_tsvector('german'::regconfig, title), int_price) > > so, i have a combined index on text + number. > to me the plan seems fine ... it looks like a prober KNN traversal. > the difference between my plan and your plan seems to be the fact that i have, say, 1 mio rows which have "handy" or soin it (1 mio out of 11 mio or so). you are moving out from one specific place. > > my maths is like that: > 11 mio in total > 1 mio matching "iphone" > cheapest / most expensive 10 out of this mio needed. > > operator classes are all nice and in place: > > SELECT 10 <-> 4 as distance; > distance > ---------- > 6 > (1 row) > > what does "buffers true" in your case say? > > many thanks, > > hans > > > On Apr 8, 2011, at 3:22 PM, Oleg Bartunov wrote: > >> Probably, you miss two-columnt index. From my early post: >> http://www.sai.msu.su/~megera/wiki/knngist >> >> =# CREATE INDEX spots_idx ON spots USING knngist (coordinates, to_tsvector('french',address)); >> =# SELECT id, address, (coordinates <-> '(2.29470491409302,48.858263472125)'::point) AS dist FROM spots WHERE coordinates>< '(2.29470491409302,48.858263472125)'::point AND to_tsvector('french',address) @@ to_tsquery('french','mars') LIMIT 10; >> id | address | dist ---------+-------------------------------------------------------------+--------------------- >> 366096 | 1st Floor Tour Eiffel | Champs de Mars, Paris 75007, France | 2.32488941293945e-05 >> 4356328 | r Champ de Mars 75007 PARIS | 0.00421854756964406 >> 5200167 | Champ De Mars 75007 Paris | 0.00453564562587288 >> 9301676 | Champ de Mars, 75007 Paris, | 0.00453564562587288 >> 2152213 | 16, ave Rapp, Champ de Mars, Tour Eiffel, Paris, France | 0.00624152097590896 >> 1923818 | Champ de Mars Paris, France | 0.00838214733539654 >> 5165953 | 39 Rue Champ De Mars Paris, France | 0.00874410234569529 >> 7395870 | 39 Rue Champ De Mars Paris, France | 0.00874410234569529 >> 4358671 | 32 Rue Champ De Mars Paris, France | 0.00876089659276339 >> 1923742 | 12 rue du Champ de Mars Paris, France | 0.00876764731845995 >> (10 rows) >> >> Time: 7.859 ms >> >> =# EXPLAIN (COSTS OFF) SELECT id, address FROM spots WHERE coordinates >< '(2.29470491409302,48.858263472125)'::point >> AND to_tsvector('french',address) @@ to_tsquery('french','mars') LIMIT 10; >> >> QUERY PLAN >> -------------------------------------------------------------------------------------------------------------------------------------- >> Limit >> -> Index Scan using spots_idx on spots >> Index Cond: ((coordinates >< '(2.29470491409302,48.858263472125)'::point) AND (to_tsvector('french'::regconfig,address) @@ '''mar'''::tsquery)) >> (3 rows) >> >> >> On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote: >> >>> hello all ... >>> >>> given oleg's posting before i also wanted to fire up some KNN related question. >>> let us consider a simple example. i got some million lines and i want all rows matching a tsquery sorted by price. >>> i did some tests: >>> >>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german',title) @@ to_tsquery('german', 'iphone') ORDER BY int_price <-> 0 LIMIT 10; >>> QUERY PLAN >>> >>> ----------------------------------------------------------------------------------------------------------------------------- >>> -------------------------------------- >>> Limit (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590 rows=10 loops=1) >>> Buffers: shared hit=9 read=5004 >>> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..13251.91 rows=3224 width=16) (actualtime= >>> 36391.715..45542.573 rows=10 loops=1) >>> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''iphon'''::tsquery) >>> Order By: (int_price <-> 0::bigint) >>> Buffers: shared hit=9 read=5004 >>> Total runtime: 45542.676 ms >>> (7 rows) >>> >>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german',title) @@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 10; >>> QUERY PLAN >>> >>> ----------------------------------------------------------------------------------------------------------------------------- >>> ------------------------------------- >>> Limit (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227 rows=10 loops=1) >>> Buffers: shared hit=3 read=2316 >>> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61 rows=7255 width=16) (actualtime= >>> 7243.524..10935.217 rows=10 loops=1) >>> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery) >>> Order By: (int_price <-> 0::bigint) >>> Buffers: shared hit=3 read=2316 >>> Total runtime: 10935.265 ms >>> (7 rows) >>> >>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german',title) @@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 1; >>> QUERY PLAN >>> >>> ----------------------------------------------------------------------------------------------------------------------------- >>> ------------------------------- >>> Limit (cost=0.00..4.10 rows=1 width=16) (actual time=28.527..28.528 rows=1 loops=1) >>> Buffers: shared hit=1 read=1577 >>> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61 rows=7255 width=16) (actualtime= >>> 28.525..28.525 rows=1 loops=1) >>> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery) >>> Order By: (int_price <-> 0::bigint) >>> Buffers: shared hit=1 read=1577 >>> Total runtime: 28.558 ms >>> (7 rows) >>> >>> >>> under any circumstances - there is no way to reduce the number of buffers needed for a query like that. >>> if everything is cached this is still ok but as soon as you have to take a single block from disk you will die a painfulrandom I/O death. >>> is there any alternative which does not simply die when i try to achieve what i want? >>> >>> the use case is quite simple: all products with a certain word (10 cheapest or so). >>> >>> is there any alternative approach to this? >>> i was putting some hope into KNN but it seems it needs too much random I/O :(. >>> >>> many thanks, >>> >>> hans >>> >>> -- >>> Cybertec Sch?nig & Sch?nig GmbH >>> Gr?hrm?hlgasse 26 >>> A-2700 Wiener Neustadt, Austria >>> Web: http://www.postgresql-support.de >>> >>> >>> >> >> Regards, >> Oleg >> _____________________________________________________________ >> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), >> Sternberg Astronomical Institute, Moscow University, Russia >> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ >> phone: +007(495)939-16-83, +007(495)939-23-83 >> >> -- >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-hackers >> > > -- > Cybertec Sch?nig & Sch?nig GmbH > Gr?hrm?hlgasse 26 > A-2700 Wiener Neustadt, Austria > Web: http://www.postgresql-support.de > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
price has a problem :(. "iphone" can be a 20 cents bag or a sticker or a 900 euro thing signed by whoever ... so, words and the sort-number / price are not related in anyway. price is in this case no way to narrow down the problem(e.g. evaluate first or so). many thanks, hans On Apr 8, 2011, at 5:25 PM, Oleg Bartunov wrote: > Hans, > > what if you create index (price,title) ? > > > On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote: > >> hello ... >> >> i got that one ... >> >> "idx_product_t_product_titleprice" gist (to_tsvector('german'::regconfig, title), int_price) >> >> so, i have a combined index on text + number. >> to me the plan seems fine ... it looks like a prober KNN traversal. >> the difference between my plan and your plan seems to be the fact that i have, say, 1 mio rows which have "handy" or soin it (1 mio out of 11 mio or so). you are moving out from one specific place. >> >> my maths is like that: >> 11 mio in total >> 1 mio matching "iphone" >> cheapest / most expensive 10 out of this mio needed. >> >> operator classes are all nice and in place: >> >> SELECT 10 <-> 4 as distance; >> distance >> ---------- >> 6 >> (1 row) >> >> what does "buffers true" in your case say? >> >> many thanks, >> >> hans >> >> >> On Apr 8, 2011, at 3:22 PM, Oleg Bartunov wrote: >> >>> Probably, you miss two-columnt index. From my early post: >>> http://www.sai.msu.su/~megera/wiki/knngist >>> >>> =# CREATE INDEX spots_idx ON spots USING knngist (coordinates, to_tsvector('french',address)); >>> =# SELECT id, address, (coordinates <-> '(2.29470491409302,48.858263472125)'::point) AS dist FROM spots WHERE coordinates>< '(2.29470491409302,48.858263472125)'::point AND to_tsvector('french',address) @@ to_tsquery('french','mars') LIMIT 10; >>> id | address | dist ---------+-------------------------------------------------------------+--------------------- >>> 366096 | 1st Floor Tour Eiffel | Champs de Mars, Paris 75007, France | 2.32488941293945e-05 >>> 4356328 | r Champ de Mars 75007 PARIS | 0.00421854756964406 >>> 5200167 | Champ De Mars 75007 Paris | 0.00453564562587288 >>> 9301676 | Champ de Mars, 75007 Paris, | 0.00453564562587288 >>> 2152213 | 16, ave Rapp, Champ de Mars, Tour Eiffel, Paris, France | 0.00624152097590896 >>> 1923818 | Champ de Mars Paris, France | 0.00838214733539654 >>> 5165953 | 39 Rue Champ De Mars Paris, France | 0.00874410234569529 >>> 7395870 | 39 Rue Champ De Mars Paris, France | 0.00874410234569529 >>> 4358671 | 32 Rue Champ De Mars Paris, France | 0.00876089659276339 >>> 1923742 | 12 rue du Champ de Mars Paris, France | 0.00876764731845995 >>> (10 rows) >>> >>> Time: 7.859 ms >>> >>> =# EXPLAIN (COSTS OFF) SELECT id, address FROM spots WHERE coordinates >< '(2.29470491409302,48.858263472125)'::point >>> AND to_tsvector('french',address) @@ to_tsquery('french','mars') LIMIT 10; >>> >>> QUERY PLAN >>> -------------------------------------------------------------------------------------------------------------------------------------- >>> Limit >>> -> Index Scan using spots_idx on spots >>> Index Cond: ((coordinates >< '(2.29470491409302,48.858263472125)'::point) AND (to_tsvector('french'::regconfig,address) @@ '''mar'''::tsquery)) >>> (3 rows) >>> >>> >>> On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote: >>> >>>> hello all ... >>>> >>>> given oleg's posting before i also wanted to fire up some KNN related question. >>>> let us consider a simple example. i got some million lines and i want all rows matching a tsquery sorted by price. >>>> i did some tests: >>>> >>>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german',title) @@ to_tsquery('german', 'iphone') ORDER BY int_price <-> 0 LIMIT 10; >>>> QUERY PLAN >>>> >>>> ----------------------------------------------------------------------------------------------------------------------------- >>>> -------------------------------------- >>>> Limit (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590 rows=10 loops=1) >>>> Buffers: shared hit=9 read=5004 >>>> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..13251.91 rows=3224 width=16) (actualtime= >>>> 36391.715..45542.573 rows=10 loops=1) >>>> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''iphon'''::tsquery) >>>> Order By: (int_price <-> 0::bigint) >>>> Buffers: shared hit=9 read=5004 >>>> Total runtime: 45542.676 ms >>>> (7 rows) >>>> >>>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german',title) @@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 10; >>>> QUERY PLAN >>>> >>>> ----------------------------------------------------------------------------------------------------------------------------- >>>> ------------------------------------- >>>> Limit (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227 rows=10 loops=1) >>>> Buffers: shared hit=3 read=2316 >>>> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61 rows=7255 width=16) (actualtime= >>>> 7243.524..10935.217 rows=10 loops=1) >>>> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery) >>>> Order By: (int_price <-> 0::bigint) >>>> Buffers: shared hit=3 read=2316 >>>> Total runtime: 10935.265 ms >>>> (7 rows) >>>> >>>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german',title) @@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 1; >>>> QUERY PLAN >>>> >>>> ----------------------------------------------------------------------------------------------------------------------------- >>>> ------------------------------- >>>> Limit (cost=0.00..4.10 rows=1 width=16) (actual time=28.527..28.528 rows=1 loops=1) >>>> Buffers: shared hit=1 read=1577 >>>> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61 rows=7255 width=16) (actualtime= >>>> 28.525..28.525 rows=1 loops=1) >>>> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery) >>>> Order By: (int_price <-> 0::bigint) >>>> Buffers: shared hit=1 read=1577 >>>> Total runtime: 28.558 ms >>>> (7 rows) >>>> >>>> >>>> under any circumstances - there is no way to reduce the number of buffers needed for a query like that. >>>> if everything is cached this is still ok but as soon as you have to take a single block from disk you will die a painfulrandom I/O death. >>>> is there any alternative which does not simply die when i try to achieve what i want? >>>> >>>> the use case is quite simple: all products with a certain word (10 cheapest or so). >>>> >>>> is there any alternative approach to this? >>>> i was putting some hope into KNN but it seems it needs too much random I/O :(. >>>> >>>> many thanks, >>>> >>>> hans >>>> >>>> -- >>>> Cybertec Sch?nig & Sch?nig GmbH >>>> Gr?hrm?hlgasse 26 >>>> A-2700 Wiener Neustadt, Austria >>>> Web: http://www.postgresql-support.de >>>> >>>> >>>> >>> >>> Regards, >>> Oleg >>> _____________________________________________________________ >>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), >>> Sternberg Astronomical Institute, Moscow University, Russia >>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ >>> phone: +007(495)939-16-83, +007(495)939-23-83 >>> >>> -- >>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >>> To make changes to your subscription: >>> http://www.postgresql.org/mailpref/pgsql-hackers >>> >> >> -- >> Cybertec Sch?nig & Sch?nig GmbH >> Gr?hrm?hlgasse 26 >> A-2700 Wiener Neustadt, Austria >> Web: http://www.postgresql-support.de >> > > Regards, > Oleg > _____________________________________________________________ > Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), > Sternberg Astronomical Institute, Moscow University, Russia > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > phone: +007(495)939-16-83, +007(495)939-23-83 > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de