Обсуждение: Query uses incorrect index

Поиск
Список
Период
Сортировка

Query uses incorrect index

От
pasman pasmański
Дата:
hello.

I ve the table NumeryA with 3 indices. Query below uses incorrect index.


SELECT
  A."NKA",
  A."NTA",
  Min("PołączeniaMin") || ',' || Max("PołączeniaMax") AS "Biling",
  Sum("Ile")::text AS "Ilość CDR",
  R."LP"::text AS "Sprawa",
  R."Osoba weryfikująca" AS "Osoba",
  to_char(min("Wartość"),'FM9999990D00') AS "Wartość po kontroli",
  max(R."Kontrola po naprawie w Serat - CDR")::text AS "CDR po kontroli",
  min(A."KodBłędu")::text AS KodBłędu,
  Max(to_char(R."Data kontroli",'YYYY-MM-DD')) AS "Ostatnia Kontrola"
, max("Skutek wprowadzenia błednej ewidencji w Serat") as "Skutek"
FROM
  ONLY "NumeryA" A
LEFT JOIN
  (select * from "Rejestr stacji do naprawy" where "Data weryfikacji"
>= current_date-3*30) R
ON
  A."NKA" = R."Numer kierunkowy"
  and substr(A."NTA",1,5) = substr(R."Numer stacji",1,5)
  and A."NTA" like R."Numer stacji"
  and A."KodBłędu" = R."Kod Błędu"
WHERE
  A."DataPliku" >= current_date-3*30
  and A."KodBłędu" similar to '74'
GROUP
  BY R."Osoba weryfikująca",R."LP",A."NKA", A."NTA"
ORDER
  BY Sum("Ile") DESC
LIMIT 4000

This query has plan:

----------------------------------------------------------------
Limit  (cost=9656.43..9666.43 rows=4000 width=96) (actual
time=2149.383..2174.363 rows=4000 loops=1)
  ->  Sort  (cost=9656.43..9716.86 rows=24175 width=96) (actual
time=2149.373..2158.355 rows=4000 loops=1)
        Sort Key: (sum(a.Ile"))"
        Sort Method:  top-N heapsort  Memory: 1028kB
        ->  HashAggregate  (cost=6711.21..8089.19 rows=24175 width=96)
(actual time=2040.721..2110.075 rows=9080 loops=1)
              ->  Merge Left Join  (cost=5338.65..5925.53 rows=24175
width=96) (actual time=1180.490..1717.727 rows=33597 loops=1)
                    Merge Cond: (((a.NKA")::text = ("Rejestr stacji do
naprawy"."Numer kierunkowy")::text) AND ((substr((a."NTA")::text, 1,
5)) = (substr(("Rejestr stacji do naprawy"."Numer stacji")::text, 1,
5))) AND ((a."KodBłędu")::text = ("Rejestr stacji do naprawy"."Kod
Błędu")::text))"
                    Join Filter: ((a.NTA")::text ~~ ("Rejestr stacji
do naprawy"."Numer stacji")::text)"
                    ->  Sort  (cost=3565.16..3625.60 rows=24175
width=42) (actual time=819.034..900.141 rows=33597 loops=1)
                          Sort Key: a.NKA", (substr((a."NTA")::text,
1, 5)), a."KodBłędu""
                          Sort Method:  quicksort  Memory: 5487kB
                          ->  Index Scan using dp_kb on NumeryA" a
(cost=0.01..1805.07 rows=24175 width=42) (actual time=0.295..197.627
rows=33597 loops=1)"
                                Index Cond: (DataPliku" >=
(('now'::text)::date - 90))"
                                Filter: ((KodBłędu")::text ~
'***:^(?:74)$'::text)"
                    ->  Sort  (cost=1773.49..1811.23 rows=15096
width=67) (actual time=361.430..434.675 rows=32948 loops=1)
                          Sort Key: Rejestr stacji do naprawy"."Numer
kierunkowy", (substr(("Rejestr stacji do naprawy"."Numer
stacji")::text, 1, 5)), "Rejestr stacji do naprawy"."Kod Błędu""
                          Sort Method:  quicksort  Memory: 2234kB
                          ->  Bitmap Heap Scan on Rejestr stacji do
naprawy"  (cost=141.75..725.68 rows=15096 width=67) (actual
time=2.604..51.567 rows=14893 loops=1)"
                                Recheck Cond: (Data weryfikacji" >=
(('now'::text)::date - 90))"
                                ->  Bitmap Index Scan on Data
weryfikacji_Kod Błędu"  (cost=0.00..137.98 rows=15096 width=0) (actual
time=2.463..2.463 rows=15462 loops=1)"
                                      Index Cond: (Data weryfikacji"
>= (('now'::text)::date - 90))"
Total runtime: 2186.011 ms


When i delete index dp_kb, query runs faster:

-------------------------------------------------------------------------
Limit  (cost=15221.69..15231.69 rows=4000 width=96) (actual
time=1296.896..1322.144 rows=4000 loops=1)
  ->  Sort  (cost=15221.69..15282.13 rows=24175 width=96) (actual
time=1296.887..1305.993 rows=4000 loops=1)
        Sort Key: (sum(a.Ile"))"
        Sort Method:  top-N heapsort  Memory: 1028kB
        ->  HashAggregate  (cost=12276.48..13654.45 rows=24175
width=96) (actual time=1188.706..1257.669 rows=9080 loops=1)
              ->  Merge Left Join  (cost=0.01..11490.79 rows=24175
width=96) (actual time=0.220..840.102 rows=33597 loops=1)
                    Merge Cond: (((a.NKA")::text = ("Rejestr stacji do
naprawy"."Numer kierunkowy")::text) AND (substr((a."NTA")::text, 1, 5)
= substr(("Rejestr stacji do naprawy"."Numer stacji")::text, 1, 5))
AND ((a."KodBłędu")::text = ("Rejestr stacji do naprawy"."Kod
Błędu")::text))"
                    Join Filter: ((a.NTA")::text ~~ ("Rejestr stacji
do naprawy"."Numer stacji")::text)"
                    ->  Index Scan using NTA_5" on "NumeryA" a
(cost=0.01..10016.75 rows=24175 width=42) (actual time=0.132..308.018
rows=33597 loops=1)"
                          Index Cond: (((KodBłędu")::text =
'74'::text) AND ("DataPliku" >= (('now'::text)::date - 90)))"
                          Filter: ((KodBłędu")::text ~ '***:^(?:74)$'::text)"
                    ->  Index Scan using 3" on "Rejestr stacji do
naprawy"  (cost=0.01..1002.73 rows=15096 width=67) (actual
time=0.047..129.840 rows=32948 loops=1)"
                          Index Cond: (Rejestr stacji do
naprawy"."Data weryfikacji" >= (('now'::text)::date - 90))"
Total runtime: 1333.347 ms


How to tune settings to use good index ?
Include definitions of indexes:

CREATE TABLE "NumeryA"
(
  "Plik" character varying(254) NOT NULL,
  "DataPliku" date,
  "KodBłędu" character varying(254) NOT NULL,
  "NKA" character varying(254) NOT NULL,
  "NTA" character varying(254) NOT NULL,
  "Ile" integer,
  "PołączeniaMin" character varying,
  "PołączeniaMax" character varying,
  "Wycofane" "char",
  "Data" character varying[],
  "ID Kobat" character varying[],
  "NRB" character varying[],
  "LP" integer,
  CONSTRAINT "NumeryA_1_pkey" PRIMARY KEY ("NTA", "NKA", "KodBłędu", "Plik")
)
WITH (
  OIDS=FALSE
);


CREATE INDEX "NTA_5"
  ON "NumeryA"
  USING btree
  ("NKA", substr("NTA"::text, 1, 5), "KodBłędu", "DataPliku");

CREATE INDEX dp_kb
  ON "NumeryA"
  USING btree
  ("DataPliku");

CREATE INDEX nka_nta
  ON "NumeryA"
  USING btree
  ("NKA", "NTA");


Here my planner settings:

-----------------------------------------------------------
# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on

# - Planner Cost Constants -

seq_page_cost = 0.3            # measured on an arbitrary scale
random_page_cost = 0.5            # same scale as above
cpu_tuple_cost = 0.007            # same scale as above
#cpu_index_tuple_cost = 0.005        # same scale as above
#cpu_operator_cost = 0.0025        # same scale as above
#effective_cache_size = 128MB

# - Genetic Query Optimizer -

#geqo = on
#geqo_threshold = 12
#geqo_effort = 5            # range 1-10
#geqo_pool_size = 0            # selects default based on effort
#geqo_generations = 0            # selects default based on effort
#geqo_selection_bias = 2.0        # range 1.5-2.0

# - Other Planner Options -

default_statistics_target = 25        # range 1-10000
constraint_exclusion = partition    # on, off, or partition
cursor_tuple_fraction = 0.05        # range 0.0-1.0
from_collapse_limit = 8
join_collapse_limit = 8            # 1 disables collapsing of explicit
                    # JOIN clauses



------------
pasman

Re: Query uses incorrect index

От
"Kevin Grittner"
Дата:
pasman pasma*ski<pasman.p@gmail.com> wrote:

>                     ->  Index Scan using NTA_5" on "NumeryA" a
> (cost=0.01..10016.75 rows=24175 width=42) (actual
> time=0.132..308.018 rows=33597 loops=1)"

> seq_page_cost = 0.3
> random_page_cost = 0.5

Your data is heavily cached (to be able to read 33597 rows randomly
through an index in 308 ms), yet you're telling the optimizer that a
random access is significantly more expensive than a sequential one.
Try this in your session before running the query (with all indexes
present):

set seq_page_cost = 0.1;
set random_page_cost = 0.1;

I don't know if the data for all your queries is so heavily cached
-- if so, you might want to change these settings in your
postgresql.conf file.

-Kevin

Re: Query uses incorrect index

От
pasman pasmański
Дата:
Thanks for reply.
I tested random changes and query runs fastest after:

set seq_page_cost = 0.1;
set random_page_cost = 0.1;
cpu_operator_cost = 0.01



------------
pasman

Re: Query uses incorrect index

От
Guillaume Cottenceau
Дата:
pasman pasmański <pasman.p 'at' gmail.com> writes:

> Thanks for reply.
> I tested random changes and query runs fastest after:
>
> set seq_page_cost = 0.1;
> set random_page_cost = 0.1;
> cpu_operator_cost = 0.01

If I'm correct, you're basically telling postgresql that your
disk is unusually fast compared to your CPU. Even if some queries
will run faster from a side-effect of these settings, you're
likely to create other random problems...

--
Guillaume Cottenceau

Re: Query uses incorrect index

От
"Kevin Grittner"
Дата:
Guillaume Cottenceau  wrote:

> If I'm correct, you're basically telling postgresql that your
> disk is unusually fast compared to your CPU. Even if some queries
> will run faster from a side-effect of these settings, you're
> likely to create other random problems...

If this is set globally and the active portion of the database is not
highly cached, yes.  If the example query is typical of the level of
caching for frequently-run queries, it might provide an overall
performance boost to set these in postgresql.conf.

The original problem was that the optimizer was grossly
over-estimating the cost of returning a tuple through the index,
which was taking about 0.01 ms per tuple.

-Kevin

Re: Query uses incorrect index

От
Tom Lane
Дата:
=?ISO-8859-2?Q?pasman_pasma=F1ski?= <pasman.p@gmail.com> writes:
> Thanks for reply.
> I tested random changes and query runs fastest after:

> set seq_page_cost = 0.1;
> set random_page_cost = 0.1;
> cpu_operator_cost = 0.01

As a general rule, "optimizing" those settings on the basis of testing a
single query is a great way to send your overall performance into the
tank --- especially since repeating a single query will be heavily
biased by cache effects.  You need to look at a representative sample of
all your queries across all your data.

            regards, tom lane

Re: Query uses incorrect index

От
pasman pasmański
Дата:
Hi.

I install auto_explain module for monitoring queries.
By the way, is any tool to tune planner automatically ?


------------
pasman