Обсуждение: Interesting new bug?
Allright, I'm running 7.0.2 with Tom Lane's backwards index scan patch applied. I'm attempting to select out of a large table (10GB) with about 4 million rows, and it winds up just sitting and doing "nothing" forever. If I check the process list, I see it using about 9% of the CPU. This table is vacuum analyzed nightly - here's a description and EXPLAIN from the query I'm trying to run. Any ideas? I haven't been able to run the admin pages on Geocrawler ever since I upgraded to 7.0.2 Tim db_geocrawler=# \d tbl_mail_archive Table "tbl_mail_archive" Attribute | Type | Modifier ----------------------+----------+----------------------------------------------fld_mailid | integer | not nulldefault nextval('seq_mailid'::text)fld_mail_list | integer | fld_mail_date | char(14) | fld_mail_is_followup | integer | fld_mail_from | text | fld_mail_subject | text | fld_mail_body | text | fld_mail_email | text | fld_mail_year | integer | fld_mail_month | integer | Indices: idx_archive_list, idx_archive_list_date, idx_archive_year, idx_mail_archive_list_yr_mo, tbl_mail_archive_pkey I'm manually deleting the rows without knowing what they are - and that's bad - this query shows that the rows do exist, but for some reason you can't select them out of the db. db_geocrawler=# begin; BEGIN db_geocrawler=# delete from tbl_mail_archive where fld_mail_list=0; DELETE 1032 db_geocrawler=# delete from tbl_mail_chunks where fld_mail_list=0; DELETE 39 db_geocrawler=# commit; COMMIT db_geocrawler=# explain SELECT * FROM tbl_mail_archive WHERE fld_mail_list=0 ORDER BY fld_mailid ASC LIMIT 10 OFFSET 0; NOTICE: QUERY PLAN: Index Scan using tbl_mail_archive_pkey on tbl_mail_archive (cost=0.00..6402391.68 rows=19357 width=80) EXPLAIN -- Founder - PHPBuilder.com / Geocrawler.com Lead Developer - SourceForge VA Linux Systems 408-542-5723
Tim Perdue <tim@sourceforge.net> writes: > I'm attempting to select out of a large table (10GB) with about 4 > million rows, and it winds up just sitting and doing "nothing" forever. > db_geocrawler=# explain SELECT * FROM tbl_mail_archive WHERE > fld_mail_list=0 ORDER BY fld_mailid ASC LIMIT 10 OFFSET 0; > NOTICE: QUERY PLAN: > Index Scan using tbl_mail_archive_pkey on tbl_mail_archive > (cost=0.00..6402391.68 rows=19357 width=80) Interesting. Since there's no explicit sort in the plan, I infer that index tbl_mail_archive_pkey is on fld_mailid, meaning that the indexscan yields data already sorted by fld_mailid --- otherwise a sort step would be needed. Evidently the optimizer is guessing that "scan in fld_mailid order until you have 10 rows where fld_mail_list=0" is faster than "find all rows with fld_mail_list=0 and then sort by fld_mailid". Since you're complaining, I guess that this is not so :-( ... but I'm not sure how the optimizer might be taught to guess that. What exactly are the indexes *on* here; how many rows are in the table; and how many rows satisfy fld_mail_list=0? regards, tom lane
Tom Lane wrote: > > Tim Perdue <tim@sourceforge.net> writes: > > I'm attempting to select out of a large table (10GB) with about 4 > > million rows, and it winds up just sitting and doing "nothing" forever. > > > db_geocrawler=# explain SELECT * FROM tbl_mail_archive WHERE > > fld_mail_list=0 ORDER BY fld_mailid ASC LIMIT 10 OFFSET 0; > > NOTICE: QUERY PLAN: > > > Index Scan using tbl_mail_archive_pkey on tbl_mail_archive > > (cost=0.00..6402391.68 rows=19357 width=80) > > Interesting. Since there's no explicit sort in the plan, I infer that > index tbl_mail_archive_pkey is on fld_mailid, meaning that the indexscan > yields data already sorted by fld_mailid --- otherwise a sort step would > be needed. Evidently the optimizer is guessing that "scan in fld_mailid > order until you have 10 rows where fld_mail_list=0" is faster than > "find all rows with fld_mail_list=0 and then sort by fld_mailid". > > Since you're complaining, I guess that this is not so :-( ... but I'm > not sure how the optimizer might be taught to guess that. What exactly > are the indexes *on* here; how many rows are in the table; and how many > rows satisfy fld_mail_list=0? There is an index on fld_mail_list and there were 1093 rows that matched out of about 4.1 million. I wonder if this is the same problem we had before where I need to order by fld_mail_list, fld_mailid instead of just on fld_mailid. If so, you need to get that fixed in the optimizer. db_geocrawler=# explain db_geocrawler-# SELECT * FROM tbl_mail_archive WHERE db_geocrawler-# fld_mail_list=0 ORDER BY fld_mail_list ASC,fld_mailid ASC LIMIT 10 OFFSET 0; NOTICE: QUERY PLAN: Sort (cost=78282.54..78282.54 rows=19357 width=80) -> Index Scan using idx_archive_list on tbl_mail_archive (cost=0.00..76904.24 rows=19357 width=80) EXPLAIN Notice how it is now using the right index, because I am doing a sort on fld_mail_list first. Tim -- Founder - PHPBuilder.com / Geocrawler.com Lead Developer - SourceForge VA Linux Systems 408-542-5723
What did you think of this? I fixed my problem by changing my query - but I shouldn't have had to. This looks like a weakness in your optimizer, having to first sort on criteria that you don't care about. Tim Tim Perdue wrote: > > Tom Lane wrote: > > > > Tim Perdue <tim@sourceforge.net> writes: > > > I'm attempting to select out of a large table (10GB) with about 4 > > > million rows, and it winds up just sitting and doing "nothing" forever. > > > > > db_geocrawler=# explain SELECT * FROM tbl_mail_archive WHERE > > > fld_mail_list=0 ORDER BY fld_mailid ASC LIMIT 10 OFFSET 0; > > > NOTICE: QUERY PLAN: > > > > > Index Scan using tbl_mail_archive_pkey on tbl_mail_archive > > > (cost=0.00..6402391.68 rows=19357 width=80) > > > > Interesting. Since there's no explicit sort in the plan, I infer that > > index tbl_mail_archive_pkey is on fld_mailid, meaning that the indexscan > > yields data already sorted by fld_mailid --- otherwise a sort step would > > be needed. Evidently the optimizer is guessing that "scan in fld_mailid > > order until you have 10 rows where fld_mail_list=0" is faster than > > "find all rows with fld_mail_list=0 and then sort by fld_mailid". > > > > Since you're complaining, I guess that this is not so :-( ... but I'm > > not sure how the optimizer might be taught to guess that. What exactly > > are the indexes *on* here; how many rows are in the table; and how many > > rows satisfy fld_mail_list=0? > > There is an index on fld_mail_list and there were 1093 rows that matched > out of about 4.1 million. > > I wonder if this is the same problem we had before where I need to order > by fld_mail_list, fld_mailid instead of just on fld_mailid. If so, you > need to get that fixed in the optimizer. > > db_geocrawler=# explain > db_geocrawler-# SELECT * FROM tbl_mail_archive WHERE > db_geocrawler-# fld_mail_list=0 ORDER BY fld_mail_list ASC,fld_mailid > ASC LIMIT 10 OFFSET 0; > NOTICE: QUERY PLAN: > > Sort (cost=78282.54..78282.54 rows=19357 width=80) > -> Index Scan using idx_archive_list on tbl_mail_archive > (cost=0.00..76904.24 rows=19357 width=80) > > EXPLAIN > > Notice how it is now using the right index, because I am doing a sort on > fld_mail_list first. -- Founder - PHPBuilder.com / Geocrawler.com Lead Developer - SourceForge VA Linux Systems 408-542-5723