Re: Really bad blowups with hash outer join and nulls

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Really bad blowups with hash outer join and nulls
Дата
Msg-id 54E2025D.8070505@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: Really bad blowups with hash outer join and nulls  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Список pgsql-hackers
On 16.2.2015 03:38, Andrew Gierth wrote:
>>>>>> "Tomas" == Tomas Vondra <tomas.vondra@2ndquadrant.com>
>>>>>> writes:
>
> Tomas> Improving the estimates is always good, but it's not going
> to Tomas> fix the case of non-NULL values (it shouldn't be all
> that Tomas> difficult to create such examples with a value whose
> hash starts Tomas> with a bunch of zeroes).
>
> Right now, I can't get it to plan such an example, because (a) if
> there are no stats to work from then the planner makes fairly
> pessimistic assumptions about hash bucket filling, and (b) if
> there _are_ stats to work from, then a frequently-occurring
> non-null value shows up as an MCV and the planner takes that into
> account to calculate bucketsize.
>
> The problem could only be demonstrated for NULLs because the
> planner was ignoring NULL for the purposes of estimating
> bucketsize, which is correct for all join types except RIGHT and
> FULL (which, iirc, are more recent additions to the hashjoin
> repertoire).

Oh, right, the estimate fix is probably sufficient then.

>
> If you want to try testing it, you may find this useful:
>
> select i, hashint8(i) from unnest(array[1474049294, -1779024306,
-1329041947]) u(i);
> i      | hashint8 -------------+---------- 1474049294 |        0
> -1779024306 |        0 -1329041947 |        0 (3 rows)
>
> (those are the only three int4 values that hash to exactly 0)
>
> It's probably possible to construct pathological cases by finding
> a lot of different values with zeros in the high bits of the hash,
> but that's something that wouldn't be likely to happen by chance.

Yeah, it's probably possible, but it's admittedly considerably harder
than I initially thought. For example it could be possible to create
the table with no MCV values but sorted so that all the initial values
have hashvalue=0, triggering (growEnabled=false). But that's rather
unlikely to happen in practice I guess.

A more likely failure scenario is a hash join higher up the plan,
processing results of other joins etc. In that case the estimates will
be tricky, although the planner chooses quite pessimistic defaults in
those cases.

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Thom Brown
Дата:
Сообщение: Re: GSoC 2015 - mentors, students and admins.
Следующее
От: Dmitry Dolgov
Дата:
Сообщение: Re: GSoC 2015 - mentors, students and admins.