Re: [SQL] JOIN index/sequential select problem

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [SQL] JOIN index/sequential select problem
Дата
Msg-id 16163.926716133@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: [SQL] JOIN index/sequential select problem  (Ole Gjerde <gjerde@icebox.org>)
Список pgsql-sql
Ole Gjerde <gjerde@icebox.org> writes:
>> How big did you say these tables were?

> The av_parts table has about 4 million rows, while inventorysuppliers only
> has ~200 rows.

Ah!  That's a critical bit of information!  It sure looks to me like the
optimizer did not know that when it was producing the plan you showed
earlier; so it used a plan that looked OK but actually was pretty awful.

> I ran vacuum again, and the query is done instantly, however the cost seem
> a bit high, no?
> Hash Join  (cost=31373.53 rows=7218 width=100)
>   ->  Index Scan using av_parts_rawpartnumber_index on av_parts  (cost=31313.53 rows=1186 width=60)
>   ->  Hash  (cost=11.93 rows=210 width=40)
>         ->  Seq Scan on inventorysuppliers  (cost=11.93 rows=210 width=40)

No, that's probably about right --- bear in mind that the optimizer is
only making guesses about how many hits there will be for an index
restriction.  Here, we can see that it's guessing 1186 hits in the
av_parts table, which is pretty darn selective for a 4-million-row
table (about 0.03%), even though really there might be only a few
matches.

Note that this doesn't have anything to do with how many rows will
actually match your given restrictionAV_Parts.RawPartNumber LIKE '6890040%'
Instead, it's an estimate based on looking at the additional
index-compatible restrictions that the parser manufactured from that
clause, namelyAV_Parts.RawPartNumber >= '6890040'ANDAV_Parts.RawPartNumber <= '6890040\377'
and trying to guess how many rows will be scanned between those lower
and upper index limits.  There's going to be an awful lot of fuzz in any
such estimate.

However, the optimizer drew the right conclusion, namely that this way
is probably cheaper than any other way.  You'll recall that before, it
was estimating only two cost units for an index scan on av_parts, which
is obviously silly for a 4-million-row table.  (The cost units are
intended to be roughly 1 unit = 1 disk block access.)  So it thought it
could get away with doing that as the inner loop of a nested-loop join.
Now that it knows scanning av_parts is *not* cheap compared to scanning
inventorysuppliers, it's making better decisions.

You are probably used to looking at small cost estimates from your
experience with small or unvacuumed tables.  Now that the optimizer knows
how big av_parts really is, all its estimates for queries involving that
table will be large --- but they're all relative anyway.
        regards, tom lane


В списке pgsql-sql по дате отправления:

Предыдущее
От: "Steven M. Wheeler"
Дата:
Сообщение: Re: pgsql-sql-digest V1 #225
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [SQL] Re: pgsql-sql-digest V1 #225