Обсуждение: earthdistance query performance

Поиск
Список
Период
Сортировка

earthdistance query performance

От
AI Rumman
Дата:
Hi,

I have a table with zip_code and latitude and longitude. 

\d zip_code_based_lng_lat
    Table "public.zip_code_based_lng_lat"
 Column |          Type          | Modifiers 
--------+------------------------+-----------
 zip    | character varying(100) | 
 state  | character varying(100) | 
 city   | character varying(100) | 
 type   | character varying(100) | 
 lat    | character varying(100) | 
 lng    | character varying(100) | 
Indexes:
    "zip_code_based_lng_lat_zipidx" btree (zip)

I need to find the closest distance using the radius formula using a zip_code provided by user.

I build the query like:

select *,
earth_distance(q2_c1, q1.c1) as d
from
(
select *, ll_to_earth(lat::float,lng::float)  as c1  from zip_code_based_lng_lat
) as q1,
(
select ll_to_earth(lat::float,lng::float) q2_c1 from zip_code_based_lng_lat where zip='18938'
) as q2
order by d
limit 10


 Limit  (cost=216010.21..216010.24 rows=10 width=55) (actual time=38296.185..38296.191 rows=10 loops=1)
   ->  Sort  (cost=216010.21..216415.74 rows=162212 width=55) (actual time=38296.182..38296.182 rows=10 loops=1)
         Sort Key: (sec_to_gc(cube_distance((ll_to_earth((public.zip_code_based_lng_lat.lat)::double precision, (public.zip_code_based_lng_lat.lng)::double precision))::cube, (ll
_to_earth((public.zip_code_based_lng_lat.lat)::double precision, (public.zip_code_based_lng_lat.lng)::double precision))::cube)))
         Sort Method: top-N heapsort  Memory: 27kB
         ->  Nested Loop  (cost=0.00..212504.87 rows=162212 width=55) (actual time=3.244..38052.444 rows=81106 loops=1)
               ->  Seq Scan on zip_code_based_lng_lat  (cost=0.00..817.90 rows=81106 width=38) (actual time=0.025..50.669 rows=81106 loops=1)
               ->  Materialize  (cost=0.00..0.32 rows=2 width=17) (actual time=0.000..0.001 rows=1 loops=81106)
                     ->  Index Scan using zip_code_based_lng_lat_zipidx on zip_code_based_lng_lat  (cost=0.00..0.31 rows=2 width=17) (actual time=0.080..0.084 rows=1 loops=1)
                           Index Cond: ((zip)::text = '18938'::text)
 Total runtime: 38296.360 ms


The result is fine. But it is too slow. 
I am using Postgresql 9.2 with following parameters:

shared_buffers = 6GB
work_mem = 500 MB
seq_page_cost = 0.01
random_page_cost = 0.01

Any idea to improve it.

Thanks.


Re: earthdistance query performance

От
Merlin Moncure
Дата:
On Wed, Sep 25, 2013 at 10:05 AM, AI Rumman <rummandba@gmail.com> wrote:
> Hi,
>
> I have a table with zip_code and latitude and longitude.
>
> \d zip_code_based_lng_lat
>     Table "public.zip_code_based_lng_lat"
>  Column |          Type          | Modifiers
> --------+------------------------+-----------
>  zip    | character varying(100) |
>  state  | character varying(100) |
>  city   | character varying(100) |
>  type   | character varying(100) |
>  lat    | character varying(100) |
>  lng    | character varying(100) |
> Indexes:
>     "zip_code_based_lng_lat_zipidx" btree (zip)
>
> I need to find the closest distance using the radius formula using a
> zip_code provided by user.
>
> I build the query like:
>
> select *,
> earth_distance(q2_c1, q1.c1) as d
> from
> (
> select *, ll_to_earth(lat::float,lng::float)  as c1  from
> zip_code_based_lng_lat
> ) as q1,
> (
> select ll_to_earth(lat::float,lng::float) q2_c1 from zip_code_based_lng_lat
> where zip='18938'
> ) as q2
> order by d
> limit 10
>
>
>  Limit  (cost=216010.21..216010.24 rows=10 width=55) (actual
> time=38296.185..38296.191 rows=10 loops=1)
>    ->  Sort  (cost=216010.21..216415.74 rows=162212 width=55) (actual
> time=38296.182..38296.182 rows=10 loops=1)
>          Sort Key:
> (sec_to_gc(cube_distance((ll_to_earth((public.zip_code_based_lng_lat.lat)::double
> precision, (public.zip_code_based_lng_lat.lng)::double precision))::cube,
> (ll
> _to_earth((public.zip_code_based_lng_lat.lat)::double precision,
> (public.zip_code_based_lng_lat.lng)::double precision))::cube)))
>          Sort Method: top-N heapsort  Memory: 27kB
>          ->  Nested Loop  (cost=0.00..212504.87 rows=162212 width=55)
> (actual time=3.244..38052.444 rows=81106 loops=1)
>                ->  Seq Scan on zip_code_based_lng_lat  (cost=0.00..817.90
> rows=81106 width=38) (actual time=0.025..50.669 rows=81106 loops=1)
>                ->  Materialize  (cost=0.00..0.32 rows=2 width=17) (actual
> time=0.000..0.001 rows=1 loops=81106)
>                      ->  Index Scan using zip_code_based_lng_lat_zipidx on
> zip_code_based_lng_lat  (cost=0.00..0.31 rows=2 width=17) (actual
> time=0.080..0.084 rows=1 loops=1)
>                            Index Cond: ((zip)::text = '18938'::text)
>  Total runtime: 38296.360 ms

your problem is the sort is happening before the limit.  you need to
reconfigure your query so that's compatible with nearest neighbor
search (which was introduced with 9.1).

merlin