Performance of a view

Поиск
Список
Период
Сортировка
От John McCawley
Тема Performance of a view
Дата
Msg-id 437B6974.5040803@hardgeus.com
обсуждение исходный текст
Ответы Re: Performance of a view  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-sql
Hello all,

I have a fairly complex query whose performance problem I have isolated 
to a fairly small subset.  The pertinent parts of the table structure 
are as follows:

//-------------------------

tbl_claim
claim_id integer SERIAL PRIMARY KEY;
claimnum varchar(32);

//-------------------------

tbl_invoice
invoice_id integer SERIAL PRIMARY KEY;
claim_id integer integer;
invoicedate timestamp;
active integer;

//-------------------------

there is an index on claimnum, and the claim_id in tbl_invoice is a 
foreign key which references tbl_claim

I have a view which is defined as follows:

//-------------------------
SELECT tbl_claim.claim_id, count(tbl_invoice.invoice_id) AS count, 
min(tbl_invoice.invoicedate) AS invoicedate FROM tbl_claim LEFT JOIN tbl_invoice ON tbl_claim.claim_id =
tbl_invoice.claim_idAND 
 
tbl_invoice.active = 1GROUP BY tbl_claim.claim_id;
//-------------------------


If I runn the following:

EXPLAIN ANALYZE SELECT tbl_claim.claim_id FROM tbl_claim WHERE claimnum 
= 'L1J8823';

I get:

Index Scan using idx_claim_claimnum on tbl_claim  (cost=0.00..10.01 
rows=2 width=4) (actual time=0.079..0.088 rows=2 loops=1) Index Cond: ((claimnum)::text = 'L1J8823'::text)
Total runtime: 0.123 ms


If I run:

EXPLAIN ANALYZE SELECT
tbl_claim.claim_id FROM tbl_claim INNER JOIN vw_claiminvoicecount ON 
tbl_claim.claim_id = vw_claiminvoicecount.claim_id WHERE
tbl_claim.claim_id = 217778;

I get:
Nested Loop  (cost=17.21..25.50 rows=4 width=4) (actual 
time=0.069..0.076 rows=1 loops=1) ->  Index Scan using tbl_claim_pkey on tbl_claim  (cost=0.00..8.21 
rows=2 width=4) (actual time=0.020..0.021 rows=1 loops=1)       Index Cond: (claim_id = 217778) ->  Materialize
(cost=17.21..17.23rows=2 width=4) (actual 
 
time=0.044..0.047 rows=1 loops=1)       ->  Subquery Scan vw_claiminvoicecount  (cost=0.00..17.21 rows=2 
width=4) (actual time=0.041..0.043 rows=1 loops=1)             ->  GroupAggregate  (cost=0.00..17.19 rows=2 width=16) 
(actual time=0.039..0.040 rows=1 loops=1)                   ->  Nested Loop Left Join  (cost=0.00..17.17 rows=2 
width=16) (actual time=0.024..0.030 rows=1 loops=1)                         ->  Index Scan using tbl_claim_pkey on 
tbl_claim  (cost=0.00..8.21 rows=2 width=4) (actual time=0.005..0.007 
rows=1 loops=1)                               Index Cond: (217778 = claim_id)                         ->  Index Scan
usingidx_tbl_invoice_claim_id 
 
on tbl_invoice  (cost=0.00..4.39 rows=7 width=16) (actual 
time=0.014..0.018 rows=1 loops=1)                               Index Cond: ("outer".claim_id = 
tbl_invoice.claim_id)                               Filter: (active = 1)
Total runtime: 0.232 ms


However, if I run:

EXPLAIN ANALYZE SELECT
tbl_claim.claim_id FROM tbl_claim INNER JOIN vw_claiminvoicecount ON 
tbl_claim.claim_id = vw_claiminvoicecount.claim_id WHERE
tbl_claim.claimnum = 'L1J8823';


I get:

Merge Join  (cost=60015.93..69488.39 rows=3 width=4) (actual 
time=4605.711..4605.762 rows=2 loops=1) Merge Cond: ("outer".claim_id = "inner".claim_id) ->  Subquery Scan
vw_claiminvoicecount (cost=60005.91..68940.54 
 
rows=215119 width=4) (actual time=3074.520..4491.423 rows=157215 loops=1)       ->  GroupAggregate
(cost=60005.91..66789.35rows=215119 
 
width=16) (actual time=3074.515..4265.315 rows=157215 loops=1)             ->  Merge Left Join
(cost=60005.91..64100.37rows=215119 
 
width=16) (actual time=3074.493..3845.516 rows=162280 loops=1)                   Merge Cond: ("outer".claim_id =
"inner".claim_id)                  ->  Sort  (cost=29403.35..29941.15 rows=215119 
 
width=4) (actual time=1253.372..1392.089 rows=157216 loops=1)                         Sort Key: tbl_claim.claim_id
                  ->  Seq Scan on tbl_claim  (cost=0.00..7775.19 
 
rows=215119 width=4) (actual time=0.031..336.606 rows=215119 loops=1)                   ->  Sort
(cost=30602.56..31146.52rows=217582 
 
width=16) (actual time=1821.075..1967.639 rows=151988 loops=1)                         Sort Key: tbl_invoice.claim_id
                     ->  Seq Scan on tbl_invoice  
 
(cost=0.00..6967.61 rows=217582 width=16) (actual time=0.066..507.189 
rows=219530 loops=1)                               Filter: (active = 1) ->  Sort  (cost=10.02..10.03 rows=2 width=4)
(actualtime=0.144..0.145 
 
rows=2 loops=1)       Sort Key: tbl_claim.claim_id       ->  Index Scan using idx_claim_claimnum on tbl_claim  
(cost=0.00..10.01 rows=2 width=4) (actual time=0.120..0.127 rows=2 loops=1)             Index Cond: ((claimnum)::text =
'L1J8823'::text)
Total runtime: 4620.653 ms


I roughly understand what is happening...in the first query, the dataset 
is being knocked down to one row, then somehow the view is being 
constructed using only that subset of the claim table.  In the second 
query, the view is being constructed from the entire dataset which is 
hundreds of thousands of rows, and thus is much slower.

My question is how would I go about obtaining the behavior from the 
faster query in the slower query?  I have switched the order of the 
tables, and tried many different permutations of the query, but no 
matter what I do, it seems that unless I specifically hard-code a 
claim_id filter on the claim_id, I am forced to run through every record.

Thoughts?




В списке pgsql-sql по дате отправления:

Предыдущее
От: "Lane Van Ingen"
Дата:
Сообщение: Re: How to Log SELECT Statements Having Errors
Следующее
От: Emil Kaffeshop
Дата:
Сообщение: Arrya variable as argument to IN expression