Join with lower/upper limits doesn't scale well

From: Craig James
Subject: Join with lower/upper limits doesn't scale well
Date:
Msg-id: 46898AB3.7090409@emolecules.com
Replies: Re: Join with lower/upper limits doesn't scale well  (Gregory Stark <stark@enterprisedb.com>)
List: pgsql-performance
I have the same schema in two different databases.  In "smalldb", the two tables of interest have about 430,000 rows; in "bigdb", the two tables each contain about 5.5 million rows.  I'm processing the data, and for various reasons it works out well to process it in 100,000 row chunks.  However, it turns out that for the big schema, selecting 100,000 rows is the longest single step of the processing.
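
Each chunk is the same query with the row_num window shifted -- roughly this, sketched with a psql variable for the lower bound (how the window advances between chunks is handled outside the database, which I'm glossing over here):

bigdb=> \set lo 100000
bigdb=> select r.row_num, m.molkeys from my_rownum r
bigdb->   join my_molkeys m on (r.version_id = m.version_id)
bigdb->   where r.row_num >= :lo AND r.row_num < :lo + 100000
bigdb->   order by r.row_num;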

Below is the explain/analyze output of the query from each database.  Since both tables are indexed on the joined columns, I don't understand why the big table should be so much slower -- I hoped this would scale well, or at least O(log(N)), not O(N).

What's going on here?  I don't know if I'm reading this right, but it looks like the sort is taking all the time, which doesn't make sense because in both cases it's sorting 100,000 rows.

Thanks,
Craig


bigdb=> explain analyze
bigdb->   select r.row_num, m.molkeys from my_rownum r
bigdb->   join my_molkeys m on (r.version_id = m.version_id)
bigdb->   where r.row_num >= 100000 AND r.row_num < 200000
bigdb->   order by r.row_num;

 Sort  (cost=431000.85..431248.23 rows=98951 width=363) (actual time=46306.748..46417.448 rows=100000 loops=1)
   Sort Key: r.row_num
   ->  Hash Join  (cost=2583.59..422790.68 rows=98951 width=363) (actual time=469.010..45752.131 rows=100000 loops=1)
         Hash Cond: ("outer".version_id = "inner".version_id)
         ->  Seq Scan on my_molkeys m  (cost=0.00..323448.30 rows=5472530 width=363) (actual time=11.243..33299.933 rows=5472532 loops=1)
         ->  Hash  (cost=2336.21..2336.21 rows=98951 width=8) (actual time=442.260..442.260 rows=100000 loops=1)
               ->  Index Scan using i_chm_rownum_row_num on my_rownum r  (cost=0.00..2336.21 rows=98951 width=8) (actual time=47.551..278.736 rows=100000 loops=1)
                     Index Cond: ((row_num >= 100000) AND (row_num < 200000))
 Total runtime: 46543.163 ms


smalldb=> explain analyze
smalldb->   select r.row_num, m.molkeys from my_rownum r
smalldb->   join my_molkeys m on (r.version_id = m.version_id)
smalldb->   where r.row_num >= 100000 AND r.row_num < 200000
smalldb->   order by r.row_num;

 Sort  (cost=43598.23..43853.38 rows=102059 width=295) (actual time=4097.180..4207.733 rows=100000 loops=1)
   Sort Key: r.row_num
   ->  Hash Join  (cost=2665.09..35107.41 rows=102059 width=295) (actual time=411.635..3629.756 rows=100000 loops=1)
         Hash Cond: ("outer".version_id = "inner".version_id)
         ->  Seq Scan on my_molkeys m  (cost=0.00..23378.90 rows=459590 width=295) (actual time=8.563..2011.455 rows=459590 loops=1)
         ->  Hash  (cost=2409.95..2409.95 rows=102059 width=8) (actual time=402.867..402.867 rows=100000 loops=1)
               ->  Index Scan using i_chm_rownum_row_num_8525 on my_rownum r  (cost=0.00..2409.95 rows=102059 width=8) (actual time=37.122..242.528 rows=100000 loops=1)
                     Index Cond: ((row_num >= 100000) AND (row_num < 200000))
 Total runtime: 4333.501 ms
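
For reference, one way to check whether the planner would even consider the index-driven plan I was hoping for is to turn off the hash and merge join paths for one session and re-run the same query against bigdb -- just a sketch of the experiment, output not included:

bigdb=> set enable_hashjoin = off;
bigdb=> set enable_mergejoin = off;
bigdb=> explain analyze
bigdb->   select r.row_num, m.molkeys from my_rownum r
bigdb->   join my_molkeys m on (r.version_id = m.version_id)
bigdb->   where r.row_num >= 100000 AND r.row_num < 200000
bigdb->   order by r.row_num;
bigdb=> reset enable_hashjoin;
bigdb=> reset enable_mergejoin;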



Table "bigdb.my_rownum"
   Column   |  Type   | Modifiers
------------+---------+-----------
 version_id | integer |
 parent_id  | integer |
 row_num    | integer |
Indexes:
    "i_chm_rownum_row_num" UNIQUE, btree (row_num)
    "i_chm_rownum_version_id" UNIQUE, btree (version_id)
    "i_chm_rownum_parent_id" btree (parent_id)



Table "bigdb.my_molkeys"
   Column   |  Type   | Modifiers
------------+---------+-----------
 version_id | integer |
 molkeys    | text    |
Indexes:
    "i_chm_molkeys_version_id" UNIQUE, btree (version_id)
