Обсуждение: Help on query plan. (was: select like and indexes)

Поиск
Список
Период
Сортировка

Help on query plan. (was: select like and indexes)

От
"William N. Zanatta"
Дата:
   Well guys,

     I made a search on the archives and found a message on my issue.
It's subject is: "Why won't the query planner use my index?".

     There was a suggestion of setting 'enable_seqscan' to off to see
whether the query planner was chosing the right way or not.

     Here goes my results:

access=# explain analyze select * from tbl_access where ip like '12%157';
                                                     QUERY PLAN
-------------------------------------------------------------------
  Seq Scan on tbl_access  (cost=0.00..42519.84 rows=139 width=134)
(actual time=698.03..20504.07 rows=1391 loops=1)
    Filter: (ip ~~ '12%157'::text)
  Total runtime: 20507.44 msec
(3 rows)

-------------------------------------------------------------------

access=# SET enable_seqscan = off;
SET
access=# explain analyze select * from tbl_access where ip like '12%157';
                                                     QUERY PLAN

-------------------------------------------------------------------
  Index Scan using teste1 on tbl_access  (cost=0.00..63593.03 rows=139
width=134) (actual time=160.69..1177.26 rows=1391 loops=1)
    Index Cond: ((ip >= '12'::character varying) AND (ip <
'13'::character varying))
    Filter: (ip ~~ '12%157'::text)
  Total runtime: 1181.18 msec
(4 rows)


--------------------------------------------------------------------

   Well, although PG is chosing for a Sequential Scan, the Index Scan
showed to be much more faster.

   Is that a problem with the planner or a normal behavior?

william

--
Perl combines all of the worst aspects of BASIC, C and line noise.
                 -- Keith Packard


Re: Help on query plan. (was: select like and indexes)

От
Stephan Szabo
Дата:
On Mon, 20 Jan 2003, William N. Zanatta wrote:

>
> access=# explain analyze select * from tbl_access where ip like '12%157';
>                                                      QUERY PLAN
> -------------------------------------------------------------------
>   Seq Scan on tbl_access  (cost=0.00..42519.84 rows=139 width=134)
> (actual time=698.03..20504.07 rows=1391 loops=1)
>     Filter: (ip ~~ '12%157'::text)
>   Total runtime: 20507.44 msec
> (3 rows)
>
> -------------------------------------------------------------------
>
> access=# SET enable_seqscan = off;
> SET
> access=# explain analyze select * from tbl_access where ip like '12%157';
>                                                      QUERY PLAN
>
> -------------------------------------------------------------------
>   Index Scan using teste1 on tbl_access  (cost=0.00..63593.03 rows=139
> width=134) (actual time=160.69..1177.26 rows=1391 loops=1)
>     Index Cond: ((ip >= '12'::character varying) AND (ip <
> '13'::character varying))
>     Filter: (ip ~~ '12%157'::text)
>   Total runtime: 1181.18 msec
> (4 rows)
>
>


Okay, this shows that you are already in "C" locale, since otherwise it
wouldn't try it at all.  Have you done a vacuum analyze recently?
What does vacuum analyze verbose tbl_access; give you?

The real rows and estimated rows seem far enough off that it might be
getting confused as to which plan is best.


Re: Help on query plan. (was: select like and indexes)

От
Tom Lane
Дата:
"William N. Zanatta" <william@veritel.com.br> writes:
> access=# explain analyze select * from tbl_access where ip like '12%157';
>                                                      QUERY PLAN

> -------------------------------------------------------------------
>   Index Scan using teste1 on tbl_access  (cost=0.00..63593.03 rows=139
> width=134) (actual time=160.69..1177.26 rows=1391 loops=1)
>     Index Cond: ((ip >= '12'::character varying) AND (ip <
> '13'::character varying))
>     Filter: (ip ~~ '12%157'::text)
>   Total runtime: 1181.18 msec
> (4 rows)

What do you get from

explain analyze select * from tbl_access where
((ip >= '12'::character varying) AND (ip < > '13'::character varying))

A deficiency in the EXPLAIN output format is that you can only see the
planner's estimate of the final number of output rows at a node --- in
this case, the "139" includes what it thinks the LIKE operator will do.
I suspect that its estimate of the number of rows that would be found
in the indexscan is far greater --- and, probably, far off.

Have you done an ANALYZE on this table lately?  How many rows in the
table altogether?

            regards, tom lane

Re: Help on query plan. (was: select like and indexes)

От
"William N. Zanatta"
Дата:
>
> Okay, this shows that you are already in "C" locale, since otherwise it
> wouldn't try it at all.  Have you done a vacuum analyze recently?
> What does vacuum analyze verbose tbl_access; give you?

   I don't remember the exactly time I did it, but I run just vacuum
yesterday. Anyway here goes my vacuum analyze result:

access=# vacuum analyze verbose tbl_access;
INFO:  --Relation public.tbl_access--
INFO:  Pages 27595: Changed 0, Empty 0; Tup 1193987: Vac 0, Keep 0,
UnUsed 34276.
         Total CPU 1.78s/1.68u sec elapsed 3.51 sec.
INFO:  Analyzing public.tbl_access
VACUUM

access=# explain analyze select * from tbl_access where ip like '12%157';
                                                     QUERY PLAN
--------------------------------------------------------------------
  Seq Scan on tbl_access  (cost=0.00..42519.84 rows=136 width=133)
(actual time=646.06..20478.91 rows=1391 loops=1)
    Filter: (ip ~~ '12%157'::text)
  Total runtime: 20482.46 msec
(3 rows)

>
> The real rows and estimated rows seem far enough off that it might be
> getting confused as to which plan is best.
>
>
   yes, I see that. :(

   I set enable_seqscan = off yesterday and got the results I was
expecting. Would it be expensive to leave it disabled?

william




--
Perl combines all of the worst aspects of BASIC, C and line noise.
                 -- Keith Packard


Re: Help on query plan. (was: select like and indexes)

От
Stephan Szabo
Дата:
On Tue, 21 Jan 2003, William N. Zanatta wrote:

> > Okay, this shows that you are already in "C" locale, since otherwise it
> > wouldn't try it at all.  Have you done a vacuum analyze recently?
> > What does vacuum analyze verbose tbl_access; give you?
>
>    I don't remember the exactly time I did it, but I run just vacuum
> yesterday. Anyway here goes my vacuum analyze result:

Well, I was more interested in making sure that analyze was run, and the
vacuum was more to get the page/tuple information for reference.

>
> access=# vacuum analyze verbose tbl_access;
> INFO:  --Relation public.tbl_access--
> INFO:  Pages 27595: Changed 0, Empty 0; Tup 1193987: Vac 0, Keep 0,
> UnUsed 34276.
>          Total CPU 1.78s/1.68u sec elapsed 3.51 sec.
> INFO:  Analyzing public.tbl_access
> VACUUM
>


It might be a good idea to try the query Tom suggested to see what it's
thinking for the selectivity of the ip index over that range.  It's
possible that it's over estimating that portion (the 12*) which would make
sense if there are alot of 12* values or if there are a bunch of very
common values.  If it does appar to be over estimating the number of rows
that the index scan will return, you might want to try increasing the
number of statistics buckets (see alter table alter column set statistics)
to like 100 or 500 do another analyze on the table and see if that helps
the estimate any.


Re: Help on query plan.

От
"William N. Zanatta"
Дата:
Somebody called 'Tom Lane' tried to say something! Take a look:
>>>access=# explain analyze select * from tbl_access where ((ip >=
>>>'12'::character varying) AND (ip <> '13'::character varying))
>>>access-# ;
>>>QUERY PLAN
>>>-----------------------------------------------------------------
>>>Seq Scan on tbl_access  (cost=0.00..45504.81 rows=1193347 width=133)
>>>(actual time=59.03..84286.81 rows=1193987 loops=1)
>>>Filter: ((ip >= '12'::character varying) AND (ip <> '13'::character
>>>varying))
>>>Total runtime: 86862.12 msec
>>>(3 rows)
>
>
>>Hm, so *all* of the rows in your table have ip values starting with '12'?
>
>
> Oh, wait wait wait.  There's a typo in that explain command.  It should
> be
>
> explain analyze select * from tbl_access where ((ip >=
> '12'::character varying) AND (ip < '13'::character varying))
>
>             regards, tom lane
>
>

  Hmm ok, running it again...

explain analyze select * from tbl_access where((ip >='12'::character va
rying) AND (ip < '13'::character varying));
                                                      QUERY PLAN

--------------------------------------------------------------------------------
  Seq Scan on tbl_access  (cost=0.00..45504.81 rows=16968 width=133)
(actual time=78.64..29174.63 rows=20318 loops=1)
    Filter: ((ip >= '12'::character varying) AND (ip < '13'::character
varying))
  Total runtime: 29222.49 msec

  ====================   / ******* \   ================

   And now, the same query with enable_seqscan set to OFF.

explain analyze select * from tbl_access where((ip >='12'::character
varying) AND (ip < '13'::character varying));
                                                             QUERY PLAN


----------------------------------------------------------------------------------------------------------------------------------
  Index Scan using teste1 on tbl_access  (cost=0.00..63182.79 rows=16968
width=133) (actual time=0.91..1813.32 rows=20318 loops=1)
    Index Cond: ((ip >= '12'::character varying) AND (ip <
'13'::character varying))
  Total runtime: 1863.33 msec
(3 rows)


   Any other suggestions?

   thanks,

william

--
Perl combines all of the worst aspects of BASIC, C and line noise.
                 -- Keith Packard


Re: Help on query plan.

От
Tom Lane
Дата:
"William N. Zanatta" <william@veritel.com.br> writes:
>    And now, the same query with enable_seqscan set to OFF.

> explain analyze select * from tbl_access where((ip >='12'::character
> varying) AND (ip < '13'::character varying));
>                                                              QUERY PLAN

>
----------------------------------------------------------------------------------------------------------------------------------
>   Index Scan using teste1 on tbl_access  (cost=0.00..63182.79 rows=16968
> width=133) (actual time=0.91..1813.32 rows=20318 loops=1)
>     Index Cond: ((ip >= '12'::character varying) AND (ip <
> '13'::character varying))
>   Total runtime: 1863.33 msec

Okay, so the rows estimate isn't too far off (17k vs 20k) --- you could
doubtless make it more accurate by increasing the statistics target, but
that won't help here (the cost estimate would actually go up).

So the index is selecting about 1.7% of the table in this case.  For
randomly-scattered rows, that's usually near the threshold of where an
indexscan stops making sense.  Given the very low actual runtime of the
indexscan, I have to guess that the data is not randomly scattered but
is actually pretty well clustered --- and that the planner is failing
to account for that effect adequately.

What does pg_stats show as the correlation value for the ip column?
If you increase the statistics target and re-ANALYZE, does the
correlation value change?

            regards, tom lane

Re: Help on query plan.

От
"William N. Zanatta"
Дата:

> What does pg_stats show as the correlation value for the ip column?
> If you increase the statistics target and re-ANALYZE, does the
> correlation value change?
>
>             regards, tom lane

   well well well...let me see...I'm digging into an obscure
world...sorry I've never used a real database, just mysql heehe.

   OK, as I don't know what is much and what isn't for the stats target,
I tried to increase it firstly to 15 (10 is default according to
documentation) and then to 20. Results follow:


   No changes at all. Normal values since db build.
   ------------------------------------------------
access=# select correlation,attname from pg_stats where attname='ip';
  correlation | attname
-------------+---------
     0.202063 | ip
(1 row)


   Changing statistics target to 15
   ------------------------------------------------
access=# alter table tbl_access alter column ip set statistics 15;
ALTER TABLE
access=# analyze verbose tbl_access (ip);
INFO:  Analyzing public.tbl_access
ANALYZE
access=# select correlation,attname from pg_stats where attname='ip';
  correlation | attname
-------------+---------
      0.25932 | ip
(1 row)

   Changing statistics target to 20
   ------------------------------------------------
access=# alter table tbl_access alter column ip set statistics 20;
ALTER TABLE
access=# analyze verbose tbl_access (ip);
INFO:  Analyzing public.tbl_access
ANALYZE
access=# select correlation,attname from pg_stats where attname='ip';
  correlation | attname
-------------+---------
     0.213223 | ip
(1 row)


---------------------------------------------------------

   Is there any place I can find more about how to better analyze the
data postgresql offers me? It looks really helpfull...

   Thanks,

william


--
Perl combines all of the worst aspects of BASIC, C and line noise.
                 -- Keith Packard


Re: Help on query plan.

От
Tom Lane
Дата:
"William N. Zanatta" <william@veritel.com.br> writes:
>    OK, as I don't know what is much and what isn't for the stats target,
> I tried to increase it firstly to 15 (10 is default according to
> documentation) and then to 20. Results follow:

Try like 50 or 100.  Also, does the EXPLAIN result change with the
higher correlation figures?  I'd expect estimated cost of indexscan
to drop with higher correlation.

If you dig into the pghackers archives you will find some discussions
about changing the equation used for adjusting indexscan based on
correlation.  I suspect the real solution to your problem will be to
alter the equation.  But I'm curious about whether we have reliable
stats, first...

            regards, tom lane

Re: Help on query plan.

От
"William N. Zanatta"
Дата:
>
> Try like 50 or 100.  Also, does the EXPLAIN result change with the
> higher correlation figures?  I'd expect estimated cost of indexscan
> to drop with higher correlation.
>
> If you dig into the pghackers archives you will find some discussions
> about changing the equation used for adjusting indexscan based on
> correlation.  I suspect the real solution to your problem will be to
> alter the equation.  But I'm curious about whether we have reliable
> stats, first...
>


   Don't know what happened. Now it only shows me Index Scans, even if
it set the stats values back to 10 or less (tried 0 too). It's just
showing up Index Scans for that query.

   Now, I'm confused... =/

   what could have happened?

   william


--
Perl combines all of the worst aspects of BASIC, C and line noise.
                 -- Keith Packard