Обсуждение: Crazy looking actual row count from explain analyze
Running 8.4.3, I have a table with 43 million rows. Two of the columns are (topic_id int not null) and (status message_status_enum not null), where message_status_enum is defined as CREATE TYPE message_status_enum AS ENUM ( 'V', 'X', 'S', 'R', 'U', 'D' ); Among the indexes there is this: "m_20100201_topic_multi" btree (topic_id, status, source_category_id, alg_ci_rank_rollup) ..see that topic_id and status are the leading edge of the index. Fact: there are no rows with status 'S' or 'X' Fact: there are no rows with topic_id = 1 Consider, then... explain analyze select count(*) from m_20100201 where status in ('S','X'); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=987810.75..987810.76 rows=1 width=0) (actual time=2340.193..2340.194 rows=1 loops=1) -> Bitmap Heap Scan on m_20100201 (cost=987806.75..987810.75 rows=1 width=0) (actual time=2340.191..2340.191 rows=0 loops=1) Recheck Cond: (status = ANY ('{S,X}'::message_status_enum[])) -> Bitmap Index Scan on m_20100201_topic_multi (cost=0.00..987806.75 rows=1 width=0) (actual time=2334.371..2334.371 rows=126336 loops=1) Index Cond: (status = ANY ('{S,X}'::message_status_enum[])) What I don't understand is the "actual rows" of 126,336 in the bitmap index scan. I would expect it to have to scan every index entry, but doesn't this output mean that it's *returning* 126K rows from that scan? Whereas I think it should return zero. I have already fixed this query by adding a better index. But the point of this post is simply to understand this explain analyze output. Thanks! --gordon -- View this message in context: http://old.nabble.com/Crazy-looking-actual-row-count-from-explain-analyze-tp28517643p28517643.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
Gordon Shannon <gordo169@gmail.com> writes: > -> Bitmap Heap Scan on m_20100201 (cost=987806.75..987810.75 rows=1 > width=0) (actual time=2340.191..2340.191 rows=0 loops=1) > Recheck Cond: (status = ANY ('{S,X}'::message_status_enum[])) > -> Bitmap Index Scan on m_20100201_topic_multi > (cost=0.00..987806.75 rows=1 width=0) (actual time=2334.371..2334.371 > rows=126336 loops=1) > Index Cond: (status = ANY ('{S,X}'::message_status_enum[])) > What I don't understand is the "actual rows" of 126,336 in the bitmap index > scan. I would expect it to have to scan every index entry, but doesn't this > output mean that it's *returning* 126K rows from that scan? Whereas I think > it should return zero. Well, it does return zero rows from the actual heapscan. What the above is telling you is that a whole lot of rows are being returned by the index and then filtered out at the table scan stage. My first suspicion is that those are unvacuumed dead rows ... what's your vacuuming policy on this database? > I have already fixed this query by adding a better index. I think the new index might have "fixed" things largely by not bothering to index already-dead rows. regards, tom lane
Tom Lane-2 wrote: > > My first suspicion > is that those are unvacuumed dead rows ... what's your vacuuming policy > on this database? > Ah, I didn't know that number included dead tuples. That probably explains it. pg_stat_user_tables says the table has 370,269 dead tuples. On this table, I have autovacuum_vacuum_scale_factor set to 0.02, so I believe the table will have to have 869K dead tuples before vacuum will kick in. > I have already fixed this query by adding a better index. Tom Lane-2 wrote: > > I think the new index might have "fixed" things largely by not bothering > to index already-dead rows. > Actually, I put a partial index on status, where != 'V'. That fits our usage pattern of 99% of the records being 'V', so it's a tiny index and satisifies this type of query very quickly. Thanks, --gordon -- View this message in context: http://old.nabble.com/Crazy-looking-actual-row-count-from-explain-analyze-tp28517643p28518862.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.