Hello, we have some confusion over the planner's use of an index.
Suppose we have a table "parades" with columns:
"city_id" of type integer
"description" of type text
"start_time" of type timestamp without time zone
Suppose also we have indexes:
"parades_city_id_description_tsv_index" gin (city_id, to_tsvector('simple'::regconfig, description)) WHERE description IS NOT NULL
"parades_city_id_start_time_index" btree (city_id, start_time)
When we EXPLAIN the query
SELECT * FROM "parades" WHERE ((description IS NOT NULL) AND (to_tsvector('simple', description) @@ to_tsquery('simple', 'fun')) AND ("city_id" IN (<roughly 50 ids>)));
We get
Bitmap Heap Scan on parades (cost=12691.97..18831.21 rows=2559 width=886)
Recheck Cond: ((to_tsvector('simple'::regconfig, description) @@ '''fun'''::tsquery) AND (description IS NOT NULL) AND (city_id = ANY ('{<roughly 50 ids>}'::integer[])))
-> BitmapAnd (cost=12691.97..12691.97 rows=2559 width=0)
-> Bitmap Index Scan on parades_city_id_description_tsv_index (cost=0.00..2902.97 rows=229463 width=0)
Index Cond: (to_tsvector('simple'::regconfig, title) @@ '''fun'''::tsquery)
-> Bitmap Index Scan on parades_city_id_start_time_index (cost=0.00..9787.47 rows=565483 width=0)
Index Cond: (city_id = ANY ('{<roughly 50 ids>}'::integer[]))
When we EXPLAIN the same query but with one city_id
SELECT * FROM "parades" WHERE ((description IS NOT NULL) AND (to_tsvector('simple', description) @@ to_tsquery('simple', 'fun')) AND ("city_id" IN (1)));
We get
Bitmap Heap Scan on parades (cost=36.20..81.45 rows=20 width=886)
Recheck Cond: ((city_id = 1) AND (to_tsvector('simple'::regconfig, description) @@ '''fun'''::tsquery) AND (description IS NOT NULL))
-> Bitmap Index Scan on parades_city_id_description_tsv_index (cost=0.00..36.20 rows=20 width=0)
Index Cond: ((city_id = 1) AND (to_tsvector('simple'::regconfig, description) @@ '''fun'''::tsquery))
This leaves us with two questions:
1. How is postgres able to use parades_city_id_description_tsv_index in the first explain result without any filter on "city_id"?
2. Why does the planner in the first query decide not to simply use parades_city_id_description_tsv_index (as in the second explain result) when the cardinality of the set of "city_id"s is high?
Thanks,
Jared