Обсуждение: Why is this doing a seq scan?
I'm trying to find the nearest locations to a certain point using 2 tables. One contains the address and zipcodes of the locations and is about 2000 rows, the other contains zipcodes and lat, lon values and has about 1.4M rows. I've got indexes on the zip column of both tables and I just need to pull the lat and lon out of the zips table for each zipcode that happens to be in the ATMs table. There are only about 2000 rows in ATMs, and since both the atms.zip and zips.zip are indexed, I'm not sure why a seq scan is being performed. The select is taking anywhere from 30secs to 1min. it's running on a linux box w/2 pIII/700s and a raid ..so the machine shouldn't be slowing me down. I think it's the seq scan but I can't seem to get rid of it. => explain select ( point(32.85, -94.55) <@> point(y.lat, y.lon) ) as distance, x.zip, y.zip, y.lat, y.lon from atms x, zips y where x.zip = y.zip order by 1 limit 3; NOTICE: QUERY PLAN: Sort (cost=39164156.66..39164156.66 rows=32338349 width=40) -> Nested Loop (cost=0.00..30401394.25 rows=32338349 width=40) -> Seq Scan on zips y (cost=0.00..29558.49 rows=1401749 width=28) -> Index Scan using atms_zip onatms x (cost=0.00..21.38 rows=23 width=12) Any idea on how to speed this up? Thanks, Bryan
Hmm. Have you VACUUM ANALYZED the tables? If so, what do you get from these queries: select attname,attdisbursion,s.* from pg_statistic s, pg_attribute a, pg_class c where starelid = c.oid and attrelid = c.oid and staattnum = attnum and relname = 'zips'; select attname,attdisbursion,s.* from pg_statistic s, pg_attribute a, pg_class c where starelid = c.oid and attrelid = c.oid and staattnum = attnum and relname = 'atms'; Also it would be useful to see the full declarations of the tables and their indexes; I'm wondering what datatype the zip columns are, for example. regards, tom lane
> Hmm. Have you VACUUM ANALYZED the tables? If so, what do > you get from > these queries: Tom, thanks for the reply, and here is all the info you asked for. > select attname,attdisbursion,s.* > from pg_statistic s, pg_attribute a, pg_class c > where starelid = c.oid and attrelid = c.oid and staattnum = attnum > and relname = 'zips'; attname |attdisbursion|starelid|staattnum|staop|stanullfrac|stacommonfrac|stacommonv al|staloval |stahival ---------+-------------+--------+---------+-----+-----------+-------------+- -----------+----------+-------- zip | -1| 93920| 1| 1066| 0| 7.13394e-07|01226 |00401 |Y1A6A1 state | 0.165522| 93920| 2| 1066| 0| 0.346728|ON |AB |YT city | 0.00729095| 93920| 3| 1066| 0| 0.0322854|TORONTO | |ZWOLLE lat | 0.00326189| 93920| 4| 672| 0| 0.0153651|51.05 |-123.176 |79.989 lon | 0.00326061| 93920| 5| 672| 0| 0.0153594|-114.083333 |-176.31005|144.445 bestbound| 0.997491| 93920| 6| 672| 0.998605| 0.00107366|2 |2 |98 (6 rows) > select attname,attdisbursion,s.* > from pg_statistic s, pg_attribute a, pg_class c > where starelid = c.oid and attrelid = c.oid and staattnum = attnum > and relname = 'atms'; attname |attdisbursion|starelid|staattnum|staop|stanullfrac|stacommonfrac|stacommonv al |staloval |stahival ----------------+-------------+--------+---------+-----+-----------+-------- -----+---------------+------------------------------+---------------- terminal | -1| 50904| 1| 97| 0| 0.000433463|6000 |55 |9433 district | 0.0679035| 50904| 2| 664| 0| 0.192024|ARCO California|ARCO Arizona |Western New York name | 0.000261431| 50904| 3| 664| 0| 0.00130039|Gateway Center |11th & Conger |Zionsville address | 0.000261431| 50904| 4| 664| 0| 0.00130039|215 Tecumseh Rd|"402 E Yakima Ave, Suite 1400"|Windham Mall city | 0.00522279| 50904| 5| 664| 0| 0.0238405|Seattle |Aberdeen |Zionsville state | 0.0687854| 50904| 6| 1058| 0| 0.193758|CA |AK |WA zip | 0.000614214| 50904| 7| 664| 0| 0.00303424|92392 | |99901 access | 0.385091| 50904| 8| 664| 0| 0.579974|WU | |WU function | 0.396416| 50904| 9| 664| 0| 0.589944|FF |CD |FF location | 0.414461| 50904| 10| 664| 0| 0.605548|BR |BR |Rem language | 0.431861| 50904| 11| 664| 0| 0.620286|E | |E restricted_hours| 0.886758| 50904| 12| 664| 0| 0.939749|FALSE |FALSE |TRUE seasonal | 0.994812| 50904| 13| 664| 0| 0.997399|FALSE |FALSE |TRUE stamps | 0.621877| 50904| 14| 664| 0| 0.746857|FALSE |FALSE |TRUE (14 rows) > Also it would be useful to see the full declarations of the tables > and their indexes; I'm wondering what datatype the zip columns are, > for example. Table = atms +----------------------------------+----------------------------------+----- --+ | Field | Type | Length| +----------------------------------+----------------------------------+----- --+ | terminal | int4 | 4 | | district | text | var | | name | text | var | | address | text | var | | city | text | var | | state | char() | 2 | | zip | text | var | | access | text | var | | function | text | var | | location | text | var | | language | text | var | | restricted_hours | text | var | | seasonal | text | var | | stamps | text | var | +----------------------------------+----------------------------------+----- --+ Index: atms_zip thirdfed=> \d zips Table = zips +----------------------------------+----------------------------------+----- --+ | Field | Type | Length| +----------------------------------+----------------------------------+----- --+ | zip | varchar() | 10 | | state | varchar() | 3 | | city | varchar() | 100 | | lat | float8 | 8 | | lon | float8 | 8 | | bestbound | float8 | 8 | +----------------------------------+----------------------------------+----- --+ Indices: zips_latindex zips_lonindex zips_pkey thirdfed=> \d zips_pkey Table = zips_pkey +----------------------------------+----------------------------------+----- --+ | Field | Type | Length| +----------------------------------+----------------------------------+----- --+ | zip | varchar() | 10 | +----------------------------------+----------------------------------+----- --+ thirdfed=> \d atms_zip Table = atms_zip +----------------------------------+----------------------------------+----- --+ | Field | Type | Length| +----------------------------------+----------------------------------+----- --+ | zip | text | var | +----------------------------------+----------------------------------+----- --+ Thanks, Bryan
"Ingram, Bryan" <BIngram@sixtyfootspider.com> writes: >> Also it would be useful to see the full declarations of the tables >> and their indexes; I'm wondering what datatype the zip columns are, >> for example. > Table = atms > | zip | text | > Table = zips > | zip | varchar() | Ah, there's your problem --- the planner is not very smart about optimizing cross-datatype comparisons. Make these columns both text, or both varchar, and I'll bet you get a more intelligent plan. Current sources (7.1-to-be) are a little smarter than 7.0 about cross-data-type joins, but they still don't get this case right. I have a TODO item about that, but I dunno if it'll get done before 7.1 ... regards, tom lane
I said: > Ah, there's your problem --- the planner is not very smart about > optimizing cross-datatype comparisons. Make these columns both text, > or both varchar, and I'll bet you get a more intelligent plan. After a little further thought, I realize that the planner may be handicapped by not realizing it can do a merge or hash join across datatypes, but even without that problem, this is not going to be a fast query. What you've got is select ... from atms x, zips y where x.zip = y.ziporder by 1 limit 3; and there just isn't any way to process this without forming the full join product --- ie, the thing will sit there and form a join tuple for *every* valid combination of ATM and ZIP in your database, and then compute the distance to the target point for every one of those ATMs, and then sort that result, and finally give you only the top three rows. A smarter kind of join isn't going to help all that much; to make this fast, you need to be filtering using the really selective condition (distance to the target point) *before* you do the join. If you are sufficiently interested in the speed of this query to want to maintain a specialized index for it, I'd suggest looking at an r-tree index on the location data, and then using a WHERE condition on the r-tree index to prefilter the rows before you join. r-trees only work on boxes and polygons AFAICT --- what would work nicely is to store a "box" of very small dimensions surrounding the location of each ATM, index that column, and then use a WHERE test for overlap between that box column and a box surrounding the target point out to as far as you think is likely to be interesting. This gives you back a fairly small number of candidate ATMs for which you compute the exact distance to the target, sort, and limit. Not sure that you need to join to zips at all if you do it this way. regards, tom lane