GIN internal query-plan alternatives.

Поиск
Список
Период
Сортировка
От jesper@krogh.cc
Тема GIN internal query-plan alternatives.
Дата
Msg-id 2f73f1348320af71b0551e2934d588ca.squirrel@shrek.krogh.cc
обсуждение исходный текст
Список pgsql-hackers
Hi

I have a "feeling" that the internal query-plans (or alternative)
query-plans when executing GIN-searches are not being exhausted as much as
they generally are in PG.

More specifically a query like:

select id from table where fts @@ to_tsquery('english','verycommon &
veryrare');

Can do an search on the "very rare" and postfilter it on the
very-common keyword. I had problems trying to force the query-planner into
executing a query that forced that behavior, but here is my attempt:


2011-11-30 11:33:41.010 testdb=# explain analyze select id from
testdb.testtable where id in (select id from testdb.testtable where fts @@
pptsquery('veryrare')) and fts @@ pptsquery('verycommon') order by id desc
limit 300;                                                                       QUERY
PLAN

























----------------------------------------------------------------------------------------------------------------------------------------------------------Limit
(cost=14203.33..14204.08 rows=300 width=4) (actual 
time=62.561..62.567 rows=14 loops=1)  ->  Sort  (cost=14203.33..14206.91 rows=1430 width=4) (actual
time=62.561..62.566 rows=14 loops=1)        Sort Key: testdb.testtable.id        Sort Method:  quicksort  Memory: 25kB
     ->  Nested Loop  (cost=4799.17..14137.35 rows=1430 width=4) 
(actual time=11.225..62.530 rows=14 loops=1)              ->  HashAggregate  (cost=4799.17..4813.47 rows=1430
width=4) (actual time=6.792..7.941 rows=2409 loops=1)                    ->  Bitmap Heap Scan on testtable
(cost=1943.29..4795.59 rows=1430 width=4) (actual
time=0.962..5.174 rows=2409 loops=1)                          Recheck Cond: (fts @@ '''veryrare'''::tsquery)
->  Bitmap Index Scan on testtable_gin_idx
(cost=0.00..1942.93 rows=1430 width=0) (actual
time=0.635..0.635 rows=2419 loops=1)                                Index Cond: (fts @@
'''veryrare'''::tsquery)              ->  Index Scan using testtable_pkey on testtable
(cost=0.00..6.51 rows=1 width=4) (actual time=0.022..0.022
rows=0 loops=2409)                    Index Cond: (testdb.testtable.id =
testdb.testtable.id)                    Filter: (testdb.testtable.fts @@
'''verycommon'''::tsquery)Total runtime: 62.679 ms
(14 rows)

Time: 125.899 ms
2011-11-30 11:40:59.673 testdb=# explain analyze select id from
testdb.testtable where fts @@ pptsquery('verycommon veryrare') order by id
desc limit 300;      QUERY PLAN


-------------------------------------------------------------------------------------------------------------------------------------------------Limit
(cost=2522.12..2522.87 rows=300 width=4) (actual 
time=1282.967..1282.972 rows=14 loops=1)  ->  Sort  (cost=2522.12..2523.88 rows=704 width=4) (actual
time=1282.965..1282.968 rows=14 loops=1)        Sort Key: id        Sort Method:  quicksort  Memory: 25kB        ->
BitmapHeap Scan on testtable  (cost=1081.67..2489.63 
rows=704 width=4) (actual time=1282.902..1282.948 rows=14
loops=1)              Recheck Cond: (fts @@ '''verycommon'' &
''veryrare'''::tsquery)              ->  Bitmap Index Scan on testtable_gin_idx
(cost=0.00..1081.49 rows=704 width=0) (actual
time=1282.880..1282.880 rows=17 loops=1)                    Index Cond: (fts @@ '''verycommon'' &
''veryrare'''::tsquery)Total runtime: 1283.274 ms
(9 rows)

Time: 1300.587 ms
2011-11-30 11:41:13.217 testdb=#

This may of-course not always be the optimal query-plan, but in this
situation the alternative plan is roughly 20-times better.
(both queries cached).

The in-clause is not "the best way", a regular filter on the results of
the veryrare-term would be natural, but the query-planner cleverly
collapses that to be the exact same thing.

This can be stressed by just adding the same "common" keyword several time
to the GIN-search, where the query, even producing the same results gets
slower and slower:

2011-11-30 11:51:10.239 testdb=# select count(id) from testdb.testtable
where id in (select id from testdb.testtable where fts @@
pptsquery('veryrare')) and fts @@ pptsquery('verycommon verycommon
verycommon');count
-------   14
(1 row)


Time: 90.389 ms
2011-11-30 11:51:16.777 testdb=# select count(id) from testdb.testtable
where fts @@ to_tsquery('english','veryrare & verycommon & verycommon &
verycommon');count
-------   14
(1 row)

Time: 2207.125 ms

PG does a lot of query-rewriting and testing different query-plans, there
seems to be room for improvements here. I'll craft a test-dataset
where it can be reproduced. Then someone with internal PG knowledge can
tell me if it just isn't implemented or I might have some configuration
option that prevents my system from doing it correctly.

--
Jesper






В списке pgsql-hackers по дате отправления:

Предыдущее
От: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
Дата:
Сообщение: Re: synchronous commit vs. hint bits
Следующее
От: Joel Jacobson
Дата:
Сообщение: Java LISTEN/NOTIFY client library work-around