Обсуждение: BUG #8943: optimizer appears to not be efficient when there is little data in the query results
BUG #8943: optimizer appears to not be efficient when there is little data in the query results
От
nghia.le@postano.com
Дата:
The following bug has been logged on the website: Bug reference: 8943 Logged by: Nghia Le Email address: nghia.le@postano.com PostgreSQL version: 9.3.1 Operating system: Amazon RDS Description: I think the issue is The issue is we order by post_time, scraped time. and for the most part when there is a lot of data, the composite index created works wonders. However when there is little data (ie feed_id =8924 ) has about 54 items. Then it doesn't know to just use a sequence scan, instead it uses the entire index and takes forever in doing so. Query Plan with Index_scan off: Limit (cost=154847.18..154847.18 rows=1 width=684) (actual time=1.001..1.002 rows=1 loops=1) Output: p.id, p.guid, p.source_type, p.post_time, p.source_id, p.title, p.url, p.picture_url, p.video_url, p.media_mime_type, p.media_height, p.media_width, p.text, p.user_icon_url, p.user_id, p.user_displayname, p.user_fullname, p.text_config_name, p.feed_id, p.scraped_time Buffers: shared hit=119 read=4 -> Sort (cost=154847.18..154877.28 rows=12038 width=684) (actual time=0.998..0.998 rows=1 loops=1) Output: p.id, p.guid, p.source_type, p.post_time, p.source_id, p.title, p.url, p.picture_url, p.video_url, p.media_mime_type, p.media_height, p.media_width, p.text, p.user_icon_url, p.user_id, p.user_displayname, p.user_fullname, p.text_config_name, p.feed_id, p.scraped_time Sort Key: p.post_time, p.scraped_time Sort Method: top-N heapsort Memory: 26kB Buffers: shared hit=119 read=4 -> Nested Loop Anti Join (cost=294.53..154786.99 rows=12038 width=684) (actual time=0.073..0.927 rows=23 loops=1) Output: p.id, p.guid, p.source_type, p.post_time, p.source_id, p.title, p.url, p.picture_url, p.video_url, p.media_mime_type, p.media_height, p.media_width, p.text, p.user_icon_url, p.user_id, p.user_displayname, p.user_fullname, p.text_config_name, p.feed_id, p.scraped_time Buffers: shared hit=119 read=4 -> Bitmap Heap Scan on public.post p (cost=290.11..48544.89 rows=12586 width=684) (actual time=0.042..0.583 rows=23 loops=1) Output: p.id, p.guid, p.scraped_time, p.source_type, p.post_time, p.source_id, p.title, p.url, p.picture_url, p.video_url, p.media_mime_type, p.media_height, p.media_width, p.text, p.user_icon_url, p.user_id, p.user_displayname, p.user_fullname, p.text_config_name, p.feed_id, p.tsv, p.original_source_id Recheck Cond: (p.feed_id = 8924) Buffers: shared hit=5 read=3 -> Bitmap Index Scan on feed_id_idx (cost=0.00..286.96 rows=12586 width=0) (actual time=0.033..0.033 rows=23 loops=1) Index Cond: (p.feed_id = 8924) Buffers: shared hit=3 read=1 -> Bitmap Heap Scan on public.post p1 (cost=4.42..8.44 rows=1 width=8) (actual time=0.009..0.009 rows=0 loops=23) Output: p1.id, p1.guid, p1.scraped_time, p1.source_type, p1.post_time, p1.source_id, p1.title, p1.url, p1.picture_url, p1.video_url, p1.media_mime_type, p1.media_height, p1.media_width, p1.text, p1.user_icon_url, p1.user_id, p1.user_displayname, p1.user_fullname, p1.text_config_name, p1.feed_id, p1.tsv, p1.original_source_id Recheck Cond: (p1.id = p.id) Filter: ((p1.original_source_id IS NOT NULL) AND ((p1.source_type)::text = ANY ('{twitter_stream,twitter_search,twitter_lists,twitter_user,twitter_hashtag}'::text[]))) Rows Removed by Filter: 1 Buffers: shared hit=114 read=1 -> Bitmap Index Scan on post_pkey (cost=0.00..4.42 rows=1 width=0) (actual time=0.006..0.006 rows=1 loops=23) Index Cond: (p1.id = p.id) Buffers: shared hit=91 read=1 Total runtime: 1.078 ms --------- Query Scan with Index On: Limit (cost=1.13..61.38 rows=5 width=684) (actual time=0.058..0.110 rows=5 loops=1) Output: p.id, p.guid, p.source_type, p.post_time, p.source_id, p.title, p.url, p.picture_url, p.video_url, p.media_mime_type, p.media_height, p.media_width, p.text, p.user_icon_url, p.user_id, p.user_displayname, p.user_fullname, p.text_config_name, p.feed_id, p.scraped_time Buffers: shared hit=28 read=3 -> Nested Loop Anti Join (cost=1.13..144978.35 rows=12032 width=684) (actual time=0.055..0.098 rows=5 loops=1) Output: p.id, p.guid, p.source_type, p.post_time, p.source_id, p.title, p.url, p.picture_url, p.video_url, p.media_mime_type, p.media_height, p.media_width, p.text, p.user_icon_url, p.user_id, p.user_displayname, p.user_fullname, p.text_config_name, p.feed_id, p.scraped_time Buffers: shared hit=28 read=3 -> Index Scan using feed_id_idx on public.post p (cost=0.56..38902.40 rows=12580 width=684) (actual time=0.028..0.043 rows=5 loops=1) Output: p.id, p.guid, p.scraped_time, p.source_type, p.post_time, p.source_id, p.title, p.url, p.picture_url, p.video_url, p.media_mime_type, p.media_height, p.media_width, p.text, p.user_icon_url, p.user_id, p.user_displayname, p.user_fullname, p.text_config_name, p.feed_id, p.tsv, p.original_source_id Index Cond: (p.feed_id = 8924) Buffers: shared hit=4 read=2 -> Index Scan using post_pkey on public.post p1 (cost=0.56..8.43 rows=1 width=8) (actual time=0.007..0.007 rows=0 loops=5) Output: p1.id, p1.guid, p1.scraped_time, p1.source_type, p1.post_time, p1.source_id, p1.title, p1.url, p1.picture_url, p1.video_url, p1.media_mime_type, p1.media_height, p1.media_width, p1.text, p1.user_icon_url, p1.user_id, p1.user_displayname, p1.user_fullname, p1.text_config_name, p1.feed_id, p1.tsv, p1.original_source_id Index Cond: (p1.id = p.id) Filter: ((p1.original_source_id IS NOT NULL) AND ((p1.source_type)::text = ANY ('{twitter_stream,twitter_search,twitter_lists,twitter_user,twitter_hashtag}'::text[]))) Rows Removed by Filter: 1 Buffers: shared hit=24 read=1 Total runtime: 0.186 ms original query: explain(analyze,buffers,verbose) SELECT p.id, p.guid, source_type, post_time, source_id, title, url, picture_url, video_url, media_mime_type, media_height, media_width, text, user_icon_url, user_id, user_displayname, user_fullname, text_config_name, feed_id,scraped_time AS t FROM post AS p LEFT JOIN location AS l on l.post_id=p.id WHERE ( feed_id =8924 ) AND NOT EXISTS( SELECT 1 FROM post p1 WHERE p1.id=p.id AND p1.source_type IN('twitter_stream','twitter_search', 'twitter_lists', 'twitter_user', 'twitter_hashtag') AND p1.original_source_id IS NOT NULL) ORDER BY post_time DESC, scraped_time DESC LIMIT 1;