Обсуждение: Another "why am I not using the indicies?"

Поиск
Список
Период
Сортировка

Another "why am I not using the indicies?"

От
"Clinton James"
Дата:
Here is all the information about the system and the steps I took.  If you
can tell me or point me in the right direction I would appreciate it. There
are 624306 records in this table, so it only takes a few seconds, but this
is going to grow to 5M rows/month when this works properly.

Clinton James

#select version();
                               version
----------------------------------------------------------------------
 PostgreSQL 7.1rc4 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66
(1 row)

#CREATE TABLE ld (
  billmonth CHAR(4),
  cust_num  CHAR(8),
  location  CHAR(4),
  date      TIMESTAMP,
  orignum   CHAR(10),
  destnum   CHAR(13),
  destcity  CHAR(10),
  deststate CHAR(2),
  duration  NUMERIC(8,1),
  callcode  CHAR(9),
  cost      NUMERIC(10,4)
);

#copy ld from '/tmp/ld0104.txt';

#create index ld_custnum_idx ON ld(cust_num);
#create index ld_orignum_idx ON ld(orignum);

#vacuum analyze;

# explain select * from ld where cust_num = '10102899';
NOTICE:  QUERY PLAN:

Seq Scan on ld  (cost=0.00..20810.83 rows=8989 width=128)


Re: Another "why am I not using the indicies?"

От
Tom Lane
Дата:
"Clinton James" <cjames@callone.net> writes:
> # explain select * from ld where cust_num = '10102899';
> NOTICE:  QUERY PLAN:

> Seq Scan on ld  (cost=0.00..20810.83 rows=8989 width=128)

The reason it's not using an indexscan is that it thinks it's going to
find 8989 matches (and if that were true, a seqscan probably *is* the
right plan).  You must have some extremely-often-repeated entries in
that column; what are they, and can you get rid of them?

            regards, tom lane

Re: Another "why am I not using the indicies?"

От
Tom Lane
Дата:
"Clinton James" <cjames@callone.net> writes:
> True, using a different cust_num, it is possible there may be a couple of
> cust_num where that would be true.  The majority will not have even near
> that many. In this case there are only 13 matching records. Is VACUUMDB
> ANALYZE using the worst case (the cust_num with the greatest number of rows)
> senario for this?

Yes, because that's as far as the statistics go at the moment :-(.
The frequency of the most common value is basically all the info the
planner has about the data distribution, and so a highly skewed
distribution leads to bad plan choices.

I am presently working on better statistics ... to appear in 7.2 ...

            regards, tom lane