Re: Why is my view making my disk churn? (iostat)

Поиск
Список
Период
Сортировка
От David Johnston
Тема Re: Why is my view making my disk churn? (iostat)
Дата
Msg-id 4C3A5783-E44D-4A40-9918-94853DD95415@yahoo.com
обсуждение исходный текст
Ответ на Why is my view making my disk churn? (iostat)  (Wells Oliver <wellsoliver@gmail.com>)
Список pgsql-general
On Sep 11, 2012, at 22:41, Wells Oliver <wellsoliver@gmail.com> wrote:

> I have this view which combines a few tables. It's wide-ish, 60 rows, almost all columns integer. It joins five
tables,all using primary keys. The explain output is as follows: 
>
> Nested Loop Left Join  (cost=0.01..92.38 rows=4 width=222) (actual time=0.291..0.711 rows=4 loops=1)
>   Join Filter: ((mytable.pid = info.pid) AND (mytable.year = info.year))
>   ->  Nested Loop Left Join  (cost=0.01..81.93 rows=4 width=218) (actual time=0.110..0.221 rows=4 loops=1)
>         Join Filter: ((mytable.pid = pos.pid) AND (mytable.year = pos.year))
>         ->  Nested Loop Left Join  (cost=0.01..73.55 rows=4 width=214) (actual time=0.092..0.197 rows=4 loops=1)
>               Join Filter: ((mytable.year = ipf.year) AND (mytable.pid = ipf.pid) AND ((mytable.sport_code)::text =
(ipf.sport_code)::text))
>               ->  Nested Loop  (cost=0.01..63.44 rows=4 width=202) (actual time=0.063..0.143 rows=4 loops=1)
>                     ->  Index Scan using idx_persons_id on _persons player  (cost=0.00..8.28 rows=1 width=23) (actual
time=0.016..0.017rows=1 loops=1) 
>                           Index Cond: (mlbam_id = 461416)
>                     ->  Nested Loop Left Join  (cost=0.01..55.12 rows=4 width=183) (actual time=0.045..0.122 rows=4
loops=1)
>                           ->  Index Scan using mytable.pkey on mytable. (cost=0.00..21.92 rows=4 width=172) (actual
time=0.027..0.071rows=4 loops=1) 
>                                 Index Cond: ((year = 2012) AND .pid = 461416) AND (gtype = 'R'::bpchar) AND (split =
'all'::text))
>                           ->  Index Scan using teams_history_pkey on teams_history ts  (cost=0.01..8.28 rows=1
width=20)(actual time=0.007..0.007 rows=0 loops=4) 
>                                 Index Cond: ((team_id = mytable.team_id) AND (year = textin(int4out(mytable.year))))
>               ->  Materialize  (cost=0.00..10.03 rows=1 width=25) (actual time=0.006..0.009 rows=3 loops=4)
>                     ->  Index Scan using ipf_pkey on ipf ipf  (cost=0.00..10.03 rows=1 width=25) (actual
time=0.019..0.027rows=3 loops=1) 
>                           Index Cond: ((year = 2012) AND .pid = 461416) AND ((factor_type)::text = 'run'::text) AND
((player_type)::text= 'hitter'::text)) 
>         ->  Materialize  (cost=0.00..8.32 rows=1 width=12) (actual time=0.004..0.005 rows=1 loops=4)
>               ->  Index Scan using primary_pos_pkey1 on pos  (cost=0.00..8.31 rows=1 width=12) (actual
time=0.012..0.012rows=1 loops=1) 
>                     Index Cond: (.pid = 461416) AND (year = 2012))
>   ->  Materialize  (cost=0.00..8.30 rows=1 width=12) (actual time=0.003..0.004 rows=1 loops=4)
>         ->  Index Scan using info_pkey on info  (cost=0.00..8.30 rows=1 width=12) (actual time=0.011..0.013 rows=1
loops=1)
>               Index Cond: (.pid = 461416) AND (year = 2012))
> Total runtime: 1.350 ms
>
> When I run this query, the disk thrashes. It's on a 4-disk RAID5 array. I call it for some 6,000 values of pid, like:
SELECT* FROM myview WHERE pid = 1, then 2, etc. The iostat outputs shows util% getting close to 100% quickly: 
>
> cciss/c0d1        0.00     0.00  152.00    0.00  2784.00     0.00    18.32     0.97    6.38   6.18  94.00
> cciss/c0d1        0.00     0.00  135.00    0.00  2688.00     0.00    19.91     0.92    6.96   6.22  84.00
> cciss/c0d1        0.00     0.00  131.00    0.00  2928.00     0.00    22.35     0.75    5.80   5.42  71.00
>
> It hits 100% easily if other things are going on.
>
> Is there anything I can do here? I suppose I could put my DB on a RAID0 array and that might help, but it seems
drastic.This is not a high load environment and given that the view combines all of the tables via primary keys, I feel
likethis should be quicker. Is there some clue in the EXPLAIN output I am missing? 
>
> The throttling of the disk causes other processes to queue up.
>

Not sure on the IO piece but...the query itself seems to run fast (1.350 ms), it's the "running it 6000 times" that is
likelythe issue. 

You do not indicate how you are running the query/queries

You do not provide the query itself.

The plan seems sub-optimal (nested loop left join) but a plan without a query is difficult to comment on.  It only
returns4 rows so this probably doesn't matter...and as I said above the query itself is fast. 

In short the how you are running the query is the killer.  You could try putting all your ".pid" values into a table
(orarray but that would be a large array) and fire off the query once. 

Another option is to at least "PREPARE" the query so it is only planned once then that one plan can be executed 6000
times.

David J.

В списке pgsql-general по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Bad pg_dump error message
Следующее
От: Edson Richter
Дата:
Сообщение: Re: Compressed binary field