Обсуждение: oddly slow query

Поиск
Список
Период
Сортировка

oddly slow query

От
Jessi Berkelhammer
Дата:
Hello.

I'm trying to figure out why a query I'm doing is incredibly slow (~10
minutes.) The incredibly slow query is something like:

SELECT count(*) from registration LEFT JOIN person USING (person_id)
WHERE x_program(registration.x_type_code) = 'blah';

The person view is quite big (~69000 rows). I don't actually want to
return the count, but I've been using that for testing purposes as it
has a similar response time to returning the fields needed.

Queries related to this, like:

A) SELECT count(*) from registration LEFT JOIN person USING (person_id);
and
B) SELECT count(*) from registration WHERE
x_program(registration.x_type_code) = 'blah';

are nearly instantaneous. I've run EXPLAIN, EXPLAIN VERBOSE, and EXPLAIN
ANALYZE on the query, and that hasn't helped me.

According to EXPLAIN ANALYZE, the slow query takes 709704 ms to execute,
and query A takes 1554 ms, but looking at the execution plans, query A
is predicted to take longer than the slow query.

The hold-up seems to be in a 'Nested Loop Left Join', which is only in
the plan for the slow query.
Here are the first two lines of EXPLAIN ANALYZE on the slow query:

Aggregate  (cost=8969.16..8969.17 rows=1 width=0) (actual
time=709703.985..709703.987 rows=1 loops=1)
    ->  Nested Loop Left Join  (cost=755.61..8968.29 rows=346 width=0)
(actual time=147.667..709700.553 rows=684 loops=1)

Does anybody have ideas why this is so slow?

Thank you,
Jessi

--
Jessi Berkelhammer
Downtown Emergency Service Center
Computer Programming Specialist

Re: oddly slow query

От
Tom Lane
Дата:
Jessi Berkelhammer <jberkelhammer@desc.org> writes:
> The hold-up seems to be in a 'Nested Loop Left Join', which is only in
> the plan for the slow query.
> Here are the first two lines of EXPLAIN ANALYZE on the slow query:

So you've left out all the information that would let anyone guess what
the problem is ...

Let's see the *whole* EXPLAIN ANALYZE output for all three of the
queries you mentioned.  It would also be appropriate to mention
exactly which PG version you're using.

            regards, tom lane

Re: oddly slow query

От
"Pavel Stehule"
Дата:
On 11/01/2008, Jessi Berkelhammer <jberkelhammer@desc.org> wrote:
> Hello.
>
> I'm trying to figure out why a query I'm doing is incredibly slow (~10
> minutes.) The incredibly slow query is something like:
>
> SELECT count(*) from registration LEFT JOIN person USING (person_id)
> WHERE x_program(registration.x_type_code) = 'blah';

Hello

what do you do in x_program function? Are you sure so it is fast?

Regards
Pavel Stehule

Re: oddly slow query

От
Jessi Berkelhammer
Дата:
Hello.

Thanks for the replies.

Pavel Stehule wrote:
 > what do you do in x_program function? Are you sure so it is fast?
 >
I do not think the function is the problem, as running the slow query
without it is just as slow, and similar queries using that function are
quick.

Tom Lane wrote:
>
> Let's see the *whole* EXPLAIN ANALYZE output for all three of the
> queries you mentioned.
Sorry for not including more.
Here are the 3 EXPLAIN ANALYZE commands followed by the output:


explain analyze select count(*) from clinical_reg_current LEFT JOIN
client using (client_id) WHERE
tier_program(clinical_reg_current.benefit_type_code) = 'SAGE';


                                          QUERY PLAN




----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=8969.16..8969.17 rows=1 width=0) (actual
time=738140.818..738140.820 rows=1 loops=1)
    ->  Nested Loop Left Join  (cost=755.61..8968.29 rows=346 width=0)
(actual time=150.918..738137.244 rows=684 loops=1)
          Join Filter: (clinical_reg_current.client_id = client.client_id)
          ->  Subquery Scan clinical_reg_current  (cost=754.36..758.23
rows=1 width=4) (actual time=57.359..146.717 rows=684 loops=1)
                Filter: (tier_program(benefit_type_code) = 'SAGE'::text)
                ->  Unique  (cost=754.36..756.47 rows=117 width=211)
(actual time=56.427..67.998 rows=1000 loops=1)
                      ->  Sort  (cost=754.36..755.42 rows=422 width=211)
(actual time=56.419..60.020 rows=1003 loops=1)
                            Sort Key: tbl_clinical_reg.client_id,
tbl_clinical_reg.clinical_reg_date
                            ->  Seq Scan on tbl_clinical_reg
(cost=0.00..735.96 rows=422 width=211) (actual time=7.447..52.914
rows=1003 loops=1)
                                  Filter: ((NOT is_deleted) AND
(clinical_reg_date <= ('now'::text)::date) AND ((clinical_reg_date_end
 >= ('now'::text)::date) OR (clinical_reg_date_end IS NULL)) AND
((kc_authorization_status_code)::text <> ALL (('{CX,TM}'::character
varying[])::text[])) AND ((benefit_type_code)::text <> ALL
(('{75,98,99,00}'::character varying[])::text[])))
          ->  Hash Left Join  (cost=1.25..6653.69 rows=69172 width=632)
(actual time=0.025..925.160 rows=69179 loops=684)
                Hash Cond: (tbl_client.client_id = prot.client_id)
                ->  Seq Scan on tbl_client  (cost=0.00..3453.20
rows=69172 width=427) (actual time=0.012..221.612 rows=69179 loops=684)
                      Filter: (NOT is_deleted)
                ->  Hash  (cost=1.20..1.20 rows=4 width=209) (actual
time=0.172..0.172 rows=4 loops=1)
                      ->  Subquery Scan prot  (cost=1.14..1.20 rows=4
width=209) (actual time=0.111..0.153 rows=4 loops=1)
                            ->  Unique  (cost=1.14..1.16 rows=4
width=162) (actual time=0.103..0.128 rows=4 loops=1)
                                  ->  Sort  (cost=1.14..1.15 rows=4
width=162) (actual time=0.099..0.106 rows=4 loops=1)
                                        Sort Key:
tbl_client_protected.client_id, tbl_client_protected.client_protected_date
                                        ->  Seq Scan on
tbl_client_protected  (cost=0.00..1.10 rows=4 width=162) (actual
time=0.039..0.061 rows=4 loops=1)
                                              Filter: ((NOT is_deleted)
AND (client_protected_date <= ('now'::text)::date) AND
((client_protected_date_end > ('now'::text)::date) OR
(client_protected_date_end IS NULL)))
  Total runtime: 738142.119 ms
(22 rows)

---------------------------------------------------------------
explain analyze select count(*) from clinical_reg_current LEFT JOIN
client using (client_id);


                                       QUERY PLAN




----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=10485.44..10485.45 rows=1 width=0) (actual
time=1497.870..1497.872 rows=1 loops=1)
    ->  Hash Left Join  (cost=8964.43..10384.27 rows=40466 width=0)
(actual time=1482.614..1495.678 rows=1000 loops=1)
          Hash Cond: (clinical_reg_current.client_id = client.client_id)
          ->  Unique  (cost=754.36..756.47 rows=117 width=211) (actual
time=59.971..66.819 rows=1000 loops=1)
                ->  Sort  (cost=754.36..755.42 rows=422 width=211)
(actual time=59.963..62.183 rows=1003 loops=1)
                      Sort Key: tbl_clinical_reg.client_id,
tbl_clinical_reg.clinical_reg_date
                      ->  Seq Scan on tbl_clinical_reg
(cost=0.00..735.96 rows=422 width=211) (actual time=6.604..56.440
rows=1003 loops=1)
                            Filter: ((NOT is_deleted) AND
(clinical_reg_date <= ('now'::text)::date) AND ((clinical_reg_date_end
 >= ('now'::text)::date) OR (clinical_reg_date_end IS NULL)) AND
((kc_authorization_status_code)::text <> ALL (('{CX,TM}'::character
varying[])::text[])) AND ((benefit_type_code)::text <> ALL
(('{75,98,99,00}'::character varying[])::text[])))
          ->  Hash  (cost=7345.41..7345.41 rows=69172 width=4) (actual
time=1422.563..1422.563 rows=69179 loops=1)
                ->  Subquery Scan client  (cost=1.25..7345.41 rows=69172
width=4) (actual time=0.282..1240.906 rows=69179 loops=1)
                      ->  Hash Left Join  (cost=1.25..6653.69 rows=69172
width=632) (actual time=0.275..940.975 rows=69179 loops=1)
                            Hash Cond: (tbl_client.client_id =
prot.client_id)
                            ->  Seq Scan on tbl_client
(cost=0.00..3453.20 rows=69172 width=427) (actual time=0.019..227.016
rows=69179 loops=1)
                                  Filter: (NOT is_deleted)
                            ->  Hash  (cost=1.20..1.20 rows=4 width=209)
(actual time=0.179..0.179 rows=4 loops=1)
                                  ->  Subquery Scan prot
(cost=1.14..1.20 rows=4 width=209) (actual time=0.100..0.152 rows=4 loops=1)
                                        ->  Unique  (cost=1.14..1.16
rows=4 width=162) (actual time=0.089..0.119 rows=4 loops=1)
                                              ->  Sort  (cost=1.14..1.15
rows=4 width=162) (actual time=0.085..0.093 rows=4 loops=1)
                                                    Sort Key:
tbl_client_protected.client_id, tbl_client_protected.client_protected_date
                                                    ->  Seq Scan on
tbl_client_protected  (cost=0.00..1.10 rows=4 width=162) (actual
time=0.031..0.058 rows=4 loops=1)
                                                          Filter: ((NOT
is_deleted) AND (client_protected_date <= ('now'::text)::date) AND
((client_protected_date_end > ('now'::text)::date) OR
(client_protected_date_end IS NULL)))
  Total runtime: 1498.442 ms
(22 rows)

---------------------------------------------------

explain analyze select count(*) from clinical_reg_current WHERE
tier_program(clinical_reg_current.benefit_type_code) = 'SAGE'


                                       QUERY PLAN




----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=758.23..758.24 rows=1 width=0) (actual
time=90.335..90.337 rows=1 loops=1)
    ->  Subquery Scan clinical_reg_current  (cost=754.36..758.23 rows=1
width=0) (actual time=51.892..88.528 rows=684 loops=1)
          Filter: (tier_program(benefit_type_code) = 'SAGE'::text)
          ->  Unique  (cost=754.36..756.47 rows=117 width=211) (actual
time=51.355..58.708 rows=1000 loops=1)
                ->  Sort  (cost=754.36..755.42 rows=422 width=211)
(actual time=51.349..53.521 rows=1003 loops=1)
                      Sort Key: tbl_clinical_reg.client_id,
tbl_clinical_reg.clinical_reg_date
                      ->  Seq Scan on tbl_clinical_reg
(cost=0.00..735.96 rows=422 width=211) (actual time=6.361..48.347
rows=1003 loops=1)
                            Filter: ((NOT is_deleted) AND
(clinical_reg_date <= ('now'::text)::date) AND ((clinical_reg_date_end
 >= ('now'::text)::date) OR (clinical_reg_date_end IS NULL)) AND
((kc_authorization_status_code)::text <> ALL (('{CX,TM}'::character
varying[])::text[])) AND ((benefit_type_code)::text <> ALL
(('{75,98,99,00}'::character varying[])::text[])))
  Total runtime: 90.452 ms
(9 rows)



> It would also be appropriate to mention
> exactly which PG version you're using.
>
Version is 8.2.5.

Thanks,
jessi
--
Jessi Berkelhammer
Downtown Emergency Service Center
Computer Programming Specialist

Re: oddly slow query

От
Tom Lane
Дата:
Jessi Berkelhammer <jberkelhammer@desc.org> writes:
> Here are the 3 EXPLAIN ANALYZE commands followed by the output:

Well, here's the problem:

>           Join Filter: (clinical_reg_current.client_id = client.client_id)
>           ->  Subquery Scan clinical_reg_current  (cost=754.36..758.23
> rows=1 width=4) (actual time=57.359..146.717 rows=684 loops=1)
>                 Filter: (tier_program(benefit_type_code) = 'SAGE'::text)
>                 ->  Unique  (cost=754.36..756.47 rows=117 width=211)
> (actual time=56.427..67.998 rows=1000 loops=1)

For some reason it's estimating only one row out of the
clinical_reg_current view will satisfy the
tier_program(benefit_type_code) = 'SAGE' constraint.  This causes it to
think a nestloop join to the client view would be a good idea.  The same
estimation error is present in your example with the function and no
join, but it doesn't hurt anything because there are no planning
decisions that depend on the estimate in that case.

The estimate of the view's rowcount without the filter isn't that great
either (117 vs 1000 actual) but it's not wrong enough to prompt selection
of a bad plan choice.  There's something funny going on with the
estimation of the function's selectivity --- does the expression
"tier_program(benefit_type_code)" match an index, perhaps?  If so, have
you updated stats for that table lately?

I'm also wondering why the function call isn't getting pushed down
further into the plan --- what's the definition of that view look like?

            regards, tom lane

Re: oddly slow query

От
Jessi Berkelhammer
Дата:
Hello.

Thanks for the help.

Tom Lane wrote:
> Jessi Berkelhammer <jberkelhammer@desc.org> writes:
>> Here are the 3 EXPLAIN ANALYZE commands followed by the output:
>
> Well, here's the problem:
>
>>           Join Filter: (clinical_reg_current.client_id = client.client_id)
>>           ->  Subquery Scan clinical_reg_current  (cost=754.36..758.23
>> rows=1 width=4) (actual time=57.359..146.717 rows=684 loops=1)
>>                 Filter: (tier_program(benefit_type_code) = 'SAGE'::text)
>>                 ->  Unique  (cost=754.36..756.47 rows=117 width=211)
>> (actual time=56.427..67.998 rows=1000 loops=1)
>
> For some reason it's estimating only one row out of the
> clinical_reg_current view will satisfy the
> tier_program(benefit_type_code) = 'SAGE' constraint.  This causes it to
> think a nestloop join to the client view would be a good idea.  The same
> estimation error is present in your example with the function and no
> join, but it doesn't hurt anything because there are no planning
> decisions that depend on the estimate in that case.
>
> The estimate of the view's rowcount without the filter isn't that great
> either (117 vs 1000 actual) but it's not wrong enough to prompt selection
> of a bad plan choice.  There's something funny going on with the
> estimation of the function's selectivity --- does the expression
> "tier_program(benefit_type_code)" match an index, perhaps?  If so, have
> you updated stats for that table lately?
>
Our database is analyzed & vacuumed nightly.

> I'm also wondering why the function call isn't getting pushed down
> further into the plan --- what's the definition of that view look like?
>
Here is the definition of the function:

  Schema |     Name     | Result data type | Argument data types |
Owner  | Language |
Source code                                                | Description

--------+--------------+------------------+---------------------+---------+----------+-----------------------------------------------------------------------------------------------------------+-------------
  public | tier_program | text             | character varying   |
Chasers | sql      |
                                                        |

               :


               :      SELECT COALESCE(clinical_project_code,description)
FROM l_benefit_type WHERE benefit_type_code = $1;

               :


               :


Here is the definition of clinical_reg_current view:

                            View "public.clinical_reg_current"
             Column            |              Type              |
Modifiers | Description
------------------------------+--------------------------------+-----------+-------------
  clinical_reg_id              | integer                        |
    |
  client_id                    | integer                        |
     |
  clinical_reg_date            | date                           |
     |
  benefit_type_code            | character varying(10)          |
     |
  funding_source_code          | character varying(10)          |
     |
  clinical_reg_date_end        | date                           |
     |
  clinical_exit_reason_code    | character varying(10)          |
     |
  kc_authorization_id          | integer                        |
     |
  kc_authorization_status_code | character varying(10)          |
     |
  current_case_rate            | numeric(8,2)                   |
     |
  case_rate_reason_code        | character varying(10)          |
     |
  kc_exit_type_code            | character varying(10)          |
     |
  added_by                     | integer                        |
     |
  added_at                     | timestamp(0) without time zone |
     |
  changed_by                   | integer                        |
     |
  changed_at                   | timestamp(0) without time zone |
     |
  is_deleted                   | boolean                        |
     |
  deleted_at                   | timestamp(0) without time zone |
     |
  deleted_by                   | integer                        |
     |
  deleted_comment              | text                           |
     |
  sys_log                      | text                           |
     |
View definition:
  SELECT DISTINCT ON (clinical_reg.client_id)
clinical_reg.clinical_reg_id, clinical_reg.client_id,
clinical_reg.clinical_reg_date, clinical_reg.benefit_type_code,
clinical_reg.funding_source_code, clinical_reg.clinical_reg_date_end,
clinical_reg.clinical_exit_reason_code,
clinical_reg.kc_authorization_id,
clinical_reg.kc_authorization_status_code,
clinical_reg.current_case_rate, clinical_reg.case_rate_reason_code,
clinical_reg.kc_exit_type_code, clinical_reg.added_by,
clinical_reg.added_at, clinical_reg.changed_by, clinical_reg.changed_at,
clinical_reg.is_deleted, clinical_reg.deleted_at,
clinical_reg.deleted_by, clinical_reg.deleted_comment, clinical_reg.sys_log
    FROM clinical_reg
   WHERE clinical_reg.clinical_reg_date <= 'now'::text::date AND
(clinical_reg.clinical_reg_date_end >= 'now'::text::date OR
clinical_reg.clinical_reg_date_end IS NULL) AND
(clinical_reg.kc_authorization_status_code::text <> ALL
(ARRAY['CX'::character varying, 'TM'::character varying]::text[])) AND
(clinical_reg.benefit_type_code::text <> ALL (ARRAY['75'::character
varying, '98'::character varying, '99'::character varying,
'00'::character varying]::text[]))
   ORDER BY clinical_reg.client_id, clinical_reg.clinical_reg_date DESC;

The clinical_reg view is everything from tbl_clinical_reg, where
is_deleted is false.

tbl_clinical_reg, the underlying table has the same columns as
clinical_reg_current and clinical_reg. The other information from its
definition is here:

Indexes:
     "tbl_clinical_reg_pkey" PRIMARY KEY, btree (clinical_reg_id)
     "tbl_clinical_reg_kc_authorization_id_key" UNIQUE, btree
(kc_authorization_id)
     "index_tbl_clinical_reg_benefit_type_code" btree
(benefit_type_code) WHERE NOT is_deleted
     "index_tbl_clinical_reg_benefit_type_status" btree
(benefit_type_code, kc_authorization_status_code) WHERE NOT is_deleted
     "index_tbl_clinical_reg_client_id" btree (client_id) WHERE NOT
is_deleted
     "index_tbl_clinical_reg_client_id_clinical_reg_date" btree
(client_id, clinical_reg_date) WHERE NOT is_deleted
     "index_tbl_clinical_reg_client_id_dates" btree (client_id,
clinical_reg_date, clinical_reg_date_end) WHERE NOT is_deleted
     "index_tbl_clinical_reg_clinical_reg_date" btree
(clinical_reg_date) WHERE NOT is_deleted
     "index_tbl_clinical_reg_clinical_reg_dates" btree
(clinical_reg_date, clinical_reg_date_end) WHERE NOT is_deleted
     "index_tbl_clinical_reg_dates_client_id" btree
(clinical_reg_date_end, clinical_reg_date, client_id) WHERE NOT is_deleted
     "index_tbl_clinical_reg_dates_client_id_deleted" btree
(clinical_reg_date_end, clinical_reg_date, client_id) WHERE NOT is_deleted
     "index_tbl_clinical_reg_status_benefit_type" btree
(kc_authorization_status_code, benefit_type_code) WHERE NOT is_deleted
     "index_tbl_clinical_reg_status_dates" btree
(kc_authorization_status_code, clinical_reg_date, clinical_reg_date_end)
WHERE NOT is_deleted
     "index_tbl_clinical_reg_status_dates_client_id_deleted" btree
(kc_authorization_status_code, clinical_reg_date_end, clinical_reg_date,
client_id) WHERE NOT is_deleted
Check constraints:
     "na_only_for_parke" CHECK (benefit_type_code::text <> 'NA'::text OR
funding_source_code::text = 'PARKE'::text)
     "tbl_clinical_reg_check" CHECK (NOT is_deleted AND deleted_at IS
NULL OR is_deleted AND deleted_at IS NOT NULL)
     "tbl_clinical_reg_check1" CHECK (NOT is_deleted AND deleted_by IS
NULL OR is_deleted AND deleted_by IS NOT NULL)
     "tbl_clinical_reg_current_case_rate_check" CHECK (current_case_rate
 >= 0::numeric)
Foreign-key constraints:
     "tbl_clinical_reg_added_by_fkey" FOREIGN KEY (added_by) REFERENCES
tbl_staff(staff_id)
     "tbl_clinical_reg_benefit_type_code_fkey" FOREIGN KEY
(benefit_type_code) REFERENCES l_benefit_type(benefit_type_code)
     "tbl_clinical_reg_case_rate_reason_code_fkey" FOREIGN KEY
(case_rate_reason_code) REFERENCES
l_kc_case_rate_reason(kc_case_rate_reason_code)
     "tbl_clinical_reg_changed_by_fkey" FOREIGN KEY (changed_by)
REFERENCES tbl_staff(staff_id)
     "tbl_clinical_reg_client_id_fkey" FOREIGN KEY (client_id)
REFERENCES tbl_client(client_id)
     "tbl_clinical_reg_clinical_exit_reason_code_fkey" FOREIGN KEY
(clinical_exit_reason_code) REFERENCES
l_clinical_exit_reason(clinical_exit_reason_code)
     "tbl_clinical_reg_deleted_by_fkey" FOREIGN KEY (deleted_by)
REFERENCES tbl_staff(staff_id)
     "tbl_clinical_reg_funding_source_code_fkey" FOREIGN KEY
(funding_source_code) REFERENCES l_funding_source(funding_source_code)
     "tbl_clinical_reg_kc_authorization_status_code_fkey" FOREIGN KEY
(kc_authorization_status_code) REFERENCES
l_kc_authorization_status(kc_authorization_status_code)
     "tbl_clinical_reg_kc_exit_type_code_fkey" FOREIGN KEY
(kc_exit_type_code) REFERENCES l_kc_exit_type(kc_exit_type_code)
Triggers:
     tbl_clinical_reg_changed_at_update BEFORE UPDATE ON
tbl_clinical_reg FOR EACH ROW EXECUTE PROCEDURE auto_changed_at_update()
     tbl_clinical_reg_log_chg AFTER INSERT OR DELETE OR UPDATE ON
tbl_clinical_reg FOR EACH ROW EXECUTE PROCEDURE table_log()
Has OIDs: no

Thank you.
take care,
jessi

--
Jessi Berkelhammer
Downtown Emergency Service Center
Computer Programming Specialist

Re: oddly slow query

От
Tom Lane
Дата:
Jessi Berkelhammer <jberkelhammer@desc.org> writes:
> Tom Lane wrote:
>> For some reason it's estimating only one row out of the
>> clinical_reg_current view will satisfy the
>> tier_program(benefit_type_code) = 'SAGE' constraint.

My math was off the other day --- actually, that's exactly what you'd
expect for the default estimate on an equality condition it has no stats
for, when the underlying scan is estimated to have only 117 rows.  So
either you should do something about getting that underlying estimate
up closer to reality (perhaps increasing default_statistics_target would
improve matters?), or you need to fix things so that the planner can
apply its statistics to estimating what is happening with the
tier_program constraint.  Expressed as a function this way, it's just a
black box to the planner so you get a default estimate.  Given that the
function is just extracting from a table, I think you could remove the
function call and express the condition with a join instead, and that
might result in a better estimate.

>> I'm also wondering why the function call isn't getting pushed down
>> further into the plan --- what's the definition of that view look like?

> View definition:
>   SELECT DISTINCT ON (clinical_reg.client_id)

Ah, it's the DISTINCT ON that's preventing any better optimization.
Not much to be done about that, unless you can recast things to not
need DISTINCT ON, which looks a bit hard.

            regards, tom lane