Re: [PERFORM] Hash Anti Join performance degradation

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: [PERFORM] Hash Anti Join performance degradation
Дата
Msg-id BANLkTi=6pGhGOUddirQmEqavJ3_T-y4xFA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PERFORM] Hash Anti Join performance degradation  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: [PERFORM] Hash Anti Join performance degradation  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: [PERFORM] Hash Anti Join performance degradation  (panam <panam@gmx.net>)
Список pgsql-hackers
On Wed, Jun 1, 2011 at 4:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Jun 1, 2011 at 4:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Because of the way that a bitmap heap scan works, the rows are
>>> guaranteed to be loaded into the hash table in physical order, which
>>> means (in the fast case) that the row with the largest "id" value gets
>>> loaded last.  And because ExecHashTableInsert pushes each row onto the
>>> front of its hash chain, that row ends up at the front of the hash
>>> chain.  Net result: for all the outer rows that aren't the one with
>>> maximum id, we get a joinqual match at the very first entry in the hash
>>> chain.  Since it's an antijoin, we then reject that outer row and go
>>> on to the next.  The join thus ends up costing us only O(N) time.
>
>> Ah!  Make sense.  If I'm reading your explanation right, this means
>> that we could have hit a similar pathological case on a nestloop as
>> well, just with a data ordering that is the reverse of what we have
>> here?
>
> Yeah.  It's just chance that this particular data set, with this
> particular ordering, happens to work well with a nestloop version
> of the query.  On average I'd expect nestloop to suck even more,
> because of more per-inner-tuple overhead.

I guess the real issue here is that m1.id < m2.id has to be evaluated
as a filter condition rather than a join qual.  That tends to perform
poorly in general, which is why rewriting this using min() or max() or
ORDER BY .. LIMIT 1 was elsewhere recommended.  I've occasionally had
cause to join on something other than equality in cases not
susceptible to such rewriting, so it would be neat to improve this
case, but it's not likely to make it to the top of my list any time
soon.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: [PERFORM] Hash Anti Join performance degradation
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [PERFORM] Hash Anti Join performance degradation