Обсуждение: any way to get rid of Bitmap Heap Scan recheck?
Hi. I have the following join condition in a query "posttag inner join tag ON posttag.tagid = tag.id and tag.name = 'blah'" tag.id is PK, I have indexes on posttag.tagid and tag.name both created with all the options set to default. PG version is 8.1. The query is very slow (3 minutes on test data), here's what takes all the time, from explain results: > Bitmap Heap Scan on tag (cost=897.06..345730.89 rows=115159 width=8) Recheck Cond: ((name)::text = 'blah'::text) -> Bitmap Index Scan on tag_idxn (cost=0.00..897.06 rows=115159 width=0) Index Cond: ((name)::text = 'blah'::text) What is recheck? I googled some and found something about lossy indexes but no fixes for this issue. The only reason I ever have this index is to do joins like this one; how do I make it not lossy? If I cannot make it not lossy, is there any way to make it skip recheck and say to hell with the losses? :) The query without recheck will run like up to 100 times faster according to overall query plan. I'm pondering encoding the tag name to int or bytea field(s) and joining on them but that's kinda ugly.
Any ideas?
Sergei Shelukhin wrote: > Hi. > I have the following join condition in a query > "posttag inner join tag ON posttag.tagid = tag.id and tag.name = > 'blah'" > tag.id is PK, I have indexes on posttag.tagid and tag.name both > created with all the options set to default. > PG version is 8.1. > > > The query is very slow (3 minutes on test data), here's what takes all > the time, from explain results: > >> Bitmap Heap Scan on tag (cost=897.06..345730.89 rows=115159 width=8) > Recheck Cond: ((name)::text = 'blah'::text) > -> Bitmap Index Scan on tag_idxn > (cost=0.00..897.06 rows=115159 width=0) > Index Cond: ((name)::text = > 'blah'::text) > > What is recheck? I googled some and found something about lossy > indexes but no fixes for this issue. > The only reason I ever have this index is to do joins like this one; > how do I make it not lossy? > > If I cannot make it not lossy, is there any way to make it skip > recheck and say to hell with the losses? :) > The query without recheck will run like up to 100 times faster > according to overall query plan. A bitmapped index scan works in two stages. First the index or indexes are scanned to create a bitmap representing matching tuples. That shows up as Bitmap Index Scan in explain. Then all the matching tuples are fetched from the heap, that's the Bitmap Heap Scan. If the bitmap is larger than work_mem (because there's a lot of matching tuples), it's stored in memory as lossy. In lossy mode, we don't store every tuple in the bitmap, but each page with any matching tuples on it is represented as a single bit. When performing the Bitmap Heap Scan phase with a lossy bitmap, the pages need to be scanned, using the Recheck condition, to see which tuples match. The Recheck condition is always shown, even if the bitmap is not stored as lossy and no rechecking is done. Now let's get to your situation. The problem is almost certainly not the rechecking or lossy bitmaps, but you can increase your work_mem to make sure. I'd suggest you do the usual drill: ANALYZE all relevant tables. If that doesn't solve the problem, run EXPLAIN ANALYZE instead of just EXPLAIN. See if you can figure something out of that, and if you need more help, send the output back to the list together with the table definitions and indexes of all tables involved in the query. > I'm pondering encoding the tag name to int or bytea field(s) and > joining on them but that's kinda ugly. I doubt that helps, but it's hard to say without seeing the schema. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Sergei Shelukhin <realgeek@gmail.com> writes: > The query is very slow (3 minutes on test data), here's what takes all > the time, from explain results: >> Bitmap Heap Scan on tag (cost=897.06..345730.89 rows=115159 width=8) > Recheck Cond: ((name)::text = 'blah'::text) > -> Bitmap Index Scan on tag_idxn > (cost=0.00..897.06 rows=115159 width=0) > Index Cond: ((name)::text = > 'blah'::text) It's usually a good idea to do EXPLAIN ANALYZE on troublesome queries, rather than trusting that the planner's estimates reflect reality. > The query without recheck will run like up to 100 times faster > according to overall query plan. Sorry, but you have no concept what you're talking about. The difference between indexscan and heap scan estimates here reflects fetching rows from the heap, not recheck costs. Even if it were a good idea to get rid of the recheck (which it is not), it wouldn't reduce the costs materially. If the table is fairly static then it might help to CLUSTER on that index, so that the rows needed are brought together physically. regards, tom lane
Sergei Shelukhin wrote: > Hi. > I have the following join condition in a query > "posttag inner join tag ON posttag.tagid = tag.id and tag.name = > 'blah'" > tag.id is PK, I have indexes on posttag.tagid and tag.name both > created with all the options set to default. > PG version is 8.1. > > > The query is very slow (3 minutes on test data), here's what takes all > the time, from explain results:Any ideas? Yes, post the output of explain analyze select ... (rest of query here) for starters