Query Optimisation and TEXT fields

From: Andrew McMillan
Date:
I am having some problems getting optimised queries when I use TEXT
fields in records.  It seems that PostgreSQL is assuming that these
fields are 4 bytes wide so the record width calculation is wrong and
this means that all of the dependent calculations are wrong.

Will it be a big deal to change the width estimate for a record?  I see
that vacuum effectively collects this statistic already, but is it
saved?

For example, from the following vacuum output we can see that the average
record size on my table is approximately (1392 * BLCKSZ) / 24986, or ~456 bytes:

NOTICE:  --Relation story--
NOTICE:  Pages 1392: Changed 0, reaped 528, Empty 0, New 0; Tup 24986:
Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 18161, MinLen 76, MaxLen 574;
Re-using: Free/Avail. Space 111804/104284; EndEmpty/Avail. Pages 0/376.
CPU 0.31s/3.00u sec.
NOTICE:  Index story_pkey: Pages 201; Tuples 24986: Deleted 0. CPU
0.04s/0.24u sec.
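
Assuming the default BLCKSZ of 8192, that works out to:

newsroom=# select (1392 * 8192) / 24986;
 ?column?
----------
      456
(1 row)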


On the other hand, a basic query shows that the optimiser is estimating
only around 20% of that:

newsroom=# explain select * from story;
NOTICE:  QUERY PLAN:

Seq Scan on story  (cost=0.00..1641.86 rows=24986 width=91)
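
A quick check of the ratio:

newsroom=# select round(456.0 / 91, 2);
 round
-------
  5.01
(1 row)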



So the cost guesses are out by a factor of 5 and indexes are being used
a lot less often than I would like.  I have a query which does a reverse
indexscan when I use LIMIT 30, giving a seemingly instant response, but
switches to sequential scan and sort when I use LIMIT 35, taking around
10 seconds to return (Pentium 233).
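
The query is essentially of this shape (story_id here is a stand-in for
the real sort key):

newsroom=# explain select * from story order by story_id desc limit 30;
-- reverse index scan on story_pkey: effectively instant
newsroom=# explain select * from story order by story_id desc limit 35;
-- sequential scan plus sort: around 10 seconds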

Anyone have any thoughts on how to go about fixing this?

Regards,                Andrew McMillan
-- 
_____________________________________________________________________
Andrew McMillan, e-mail: Andrew@cat-it.co.nz
Catalyst IT Ltd, PO Box 10-225, Level 22, 105 The Terrace, Wellington
Me: +64 (21) 635 694, Fax: +64 (4) 499 5596, Office: +64 (4) 499 2267


Re: Query Optimisation and TEXT fields

From: Tom Lane
Date:
Andrew McMillan <Andrew@catalyst.net.nz> writes:
> I am having some problems getting optimised queries when I use TEXT
> fields in records.  It seems that PostgreSQL is assuming that these
> fields are 4 bytes wide so the record width calculation is wrong and
> this means that all of the dependent calculations are wrong.

4 bytes?  I'd have expected 12 (see _DEFAULT_ATTRIBUTE_WIDTH_ as used
in src/backend/optimizer/path/costsize.c).  While this is obviously
pretty brain-dead, I have not seen many cases in which that particular
bogosity was the limiting factor in the accuracy of the optimizer's
calculations.  Usually it's the row count rather than row width that
we're hopelessly lost on :-(

At some point it might be useful for VACUUM to calculate a real
average-field-width value for varlena columns and store same in
pg_statistic.  I can't get excited about it quite yet though.
If you dig into costsize.c you'll see that the estimated row width
is just a minor factor in the estimates.  In particular, it has no
relevance whatever for seqscan-vs-indexscan choices.
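
You can see that by comparing a narrow projection against the full row:
the estimated cost stays put and only the width guess moves (assuming
an int4 column id):

newsroom=# explain select id from story;
Seq Scan on story  (cost=0.00..1641.86 rows=24986 width=4)

newsroom=# explain select * from story;
Seq Scan on story  (cost=0.00..1641.86 rows=24986 width=91)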
        regards, tom lane