Обсуждение: Not using index

Поиск
Список
Период
Сортировка

Not using index

От
Daniele Orlandi
Дата:
Hello,

I just installed PG 7.0 and did some testing. I found what I believe is
a strange behaviour:

xtnet=# \d telecom         Table "telecom"Attribute |    Type     | Modifier 
-----------+-------------+----------regione   | varchar(20) | distretto | varchar(20) | settore   | varchar(20) |
descriz  | varchar(60) | cor_x     | float8      | cor_y     | float8      | cod_dis   | varchar(4)  | rete_urb  |
varchar(24)| 
 

xtnet=# create index telecom_regione on telecom ( regione );
CREATE

xtnet=# explain select distinct on (regione) * from telecom;
NOTICE:  QUERY PLAN:

Unique  (cost=0.00..4777.43 rows=3910 width=88) ->  Index Scan using telecom_regione on telecom  (cost=0.00..4679.68
rows=39100 width=88)

EXPLAIN

xtnet=# explain select distinct on (regione) regione from telecom;
NOTICE:  QUERY PLAN:

Unique  (cost=4370.91..4468.66 rows=3910 width=12) ->  Sort  (cost=4370.91..4370.91 rows=39100 width=12)       ->  Seq
Scanon telecom  (cost=0.00..1079.00 rows=39100
 
width=12)

EXPLAIN

xtnet=# explain select distinct on (regione) distretto,regione from
telecom;
NOTICE:  QUERY PLAN:

Unique  (cost=4492.72..4590.47 rows=3910 width=24) ->  Sort  (cost=4492.72..4492.72 rows=39100 width=24)       ->  Seq
Scanon telecom  (cost=0.00..1079.00 rows=39100
 
width=24)

EXPLAIN

It seems like the index is used only if * appears in the target list.
???

Bye!

-- Daniele

-------------------------------------------------------------------------------Daniele Orlandi - Utility Line Italia -
http://www.orlandi.comViaMezzera 29/A - 20030 - Seveso (MI) - Italy
 
-------------------------------------------------------------------------------


Re: Not using index

От
Tom Lane
Дата:
Daniele Orlandi <daniele@orlandi.com> writes:
> [ example snipped ]
> It seems like the index is used only if * appears in the target list.

There is certainly not a dependence on * as such.  However, the
estimated row width does affect the cost estimate for operations like
SORT, where we have to guess how many rows will fit in memory.  It looks
to me like your example case is right near the boundary where the system
thinks that index scan and sort are of roughly equal cost, so relatively
small changes will push the choice in one direction or the other.
This should not be taken as an indication that someone actually
considered the specific examples and decided it should act that way!
It just falls out of the behavior of the cost-estimate-driven planner
structure.  Ideally, the hope is not that this sort of choice is
perfectly right all the time; my ambition only extends to the hope that
if the system thinks index scan and sort are of roughly equal cost,
then indeed they are, and so it wouldn't matter a whole lot which one
gets picked.

Of course we aren't all that close to meeting the ideal goal :-(.
I'm assuming that you are complaining because one or the other of
these plans is actually much cheaper than the other in your example.
You have, however, carefully refrained from giving us any hint which.
Care to fess up with more details?
        regards, tom lane


Re: Not using index

От
Daniele Orlandi
Дата:
Tom Lane wrote:
> 
> There is certainly not a dependence on * as such.

Yes, that was pretty strange, but, not knowing the internals of the
optimizer, I wondered why a change in the target list could have changed
the decision of the optimizer.

> However, the estimated row width does affect the cost estimate for operations
> like SORT, where we have to guess how many rows will fit in memory.  It looks
> to me like your example case is right near the boundary where the system
> thinks that index scan and sort are of roughly equal cost, so relatively
> small changes will push the choice in one direction or the other.

Yes this is what's happening. I progressively adding attributes to the
targets list and, at some point, the optimizer choosed the other
alternative.

> I'm assuming that you are complaining because one or the other of
> these plans is actually much cheaper than the other in your example.

Yes, more than 12:1 ratio.

> You have, however, carefully refrained from giving us any hint which.
> Care to fess up with more details?

Yes, of course, I hope the following statistics can help you :

This one is the result of vacuum on the table:

NOTICE:  Pages 688: Changed 0, reaped 0, Empty 0, New 0; Tup 39100: Vac
0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 100, MaxLen 201; Re-using:
Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.04s/3.37u sec.
NOTICE:  Index telecom_settore: Pages 145; Tuples 39100. CPU 0.01s/0.07u
sec.
NOTICE:  Index telecom_distretto: Pages 156; Tuples 39100. CPU
0.01s/0.06u sec.
NOTICE:  Index telecom_regione: Pages 136; Tuples 39100. CPU 0.01s/0.06u
sec.

This one is with the index:
! system usage stats:
!    0.909758 elapsed 0.830000 user 0.050000 system sec
!    [0.870000 user 0.060000 sys total]
!    0/0 [0/0] filesystem blocks in/out
!    282/14 [429/279] page faults/reclaims, 0 [0] swaps
!    0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!    0/0 [0/0] voluntary/involuntary context switches
! postgres usage stats:
!    Shared blocks:        845 read,          0 written, buffer hit rate =
97.86%
!    Local  blocks:          0 read,          0 written, buffer hit rate =
0.00%
!    Direct blocks:          0 read,          0 written

This one is without the index:
! system usage stats:
!    12.529637 elapsed 12.360000 user 0.150000 system sec
!    [12.380000 user 0.190000 sys total]
!    0/0 [0/0] filesystem blocks in/out
!    508/360 [695/657] page faults/reclaims, 0 [0] swaps
!    0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!    0/0 [0/0] voluntary/involuntary context switches
! postgres usage stats:
!    Shared blocks:        688 read,          1 written, buffer hit rate =
0.72%
!    Local  blocks:          0 read,          0 written, buffer hit rate =
0.00%
!    Direct blocks:          0 read,          0 written

If you need other statistics/tests, just ask and I will be happy to help
you.

Bye!

-- Daniele

-------------------------------------------------------------------------------Daniele Orlandi - Utility Line Italia -
http://www.orlandi.comViaMezzera 29/A - 20030 - Seveso (MI) - Italy
 
-------------------------------------------------------------------------------