Re: Missing update of all_hasnulls in BRIN opclasses

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Missing update of all_hasnulls in BRIN opclasses
Дата
Msg-id a18e4ec1-86d7-8250-90ee-3337a382c0bb@enterprisedb.com
обсуждение исходный текст
Ответ на Re: Missing update of all_hasnulls in BRIN opclasses  (Justin Pryzby <pryzby@telsasoft.com>)
Ответы Re: Missing update of all_hasnulls in BRIN opclasses  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Список pgsql-hackers
Thanks Justin! I've applied all the fixes you proposed, and (hopefully)
improved a couple other comments.

I've been working on this over the past couple days, trying to polish
and commit it over the weekend - both into master and backbranches.
Sadly, the backpatching part turned out to be a bit more complicated
than I expected, because of the BRIN reworks in PG14 (done by me, as
foundation for the new opclasses, so ... well).

Anyway, I got it done, but it's a bit uglier than I hoped for and I
don't feel like pushing this on Sunday midnight. I think it's correct,
but maybe another pass to polish it a bit more is better.

So here are two patches - one for 11-13, the other for 14-master.

There's also a separate patch with pageinspect tests, but only as a
demonstration of various (non)broken cases, not for commit. And then
also a bash script generating indexes with random data, randomized
summarization etc. - on unpatched systems this happens to fail in about
1/3 of the runs (at least for me). I haven't seen any failures with the
patches attached (on any branch).

As for the issue / fix, I don't think there's a better solution than
what the patch does - we need to distinguish empty / all-nulls ranges,
but we can't add a flag because of on-disk format / ABI. So using the
existing flags seems like the only option - I haven't heard any other
ideas so far, and I couldn't come up with any myself either.

I've also thought about alternative "encodings" into allnulls/hasnulls,
instead of treating (true,true) as "empty" - but none of that ended up
being any simpler, quite the opposite actually, as it would change what
the individual flags mean etc. So AFAICS this is the best / least
disruptive option.

I went over all the places touching these flags, to double check if any
of those needs some tweaks (similar to union_tuples, which I missed for
a long time). But I haven't found anything else, so I think this version
of the patches is complete.

As for assessing how many indexes are affected - in principle, any index
on columns with NULLs may be broken. But it only matters if the index is
used for IS NULL queries, other queries are not affected.

I also realized that this only affects insertion of individual tuples
into existing all-null summaries, not "bulk" summarization that sees all
values at once. This happens because in this case add_values_to_range
sets hasnulls=true for the first (NULL) value, and then calls the
addValue procedure for the second (non-NULL) one, which resets the
allnulls flag to false.

But when inserting individual rows, we first set hasnulls=true, but
brin_form_tuple ignores that because of allnulls=true. And then when
inserting the second row, we start with hasnulls=false again, and the
opclass quietly resets the allnulls flag.

I guess this further reduces the number of broken indexes, especially
for data sets with small null_frac, or for append-only (or -mostly)
tables where most of the summarization is bulk.

I still feel a bit uneasy about this, but I think the patch is solid.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: Fixing a couple of buglets in how VACUUM sets visibility map bits
Следующее
От: Andres Freund
Дата:
Сообщение: Re: Fixing a couple of buglets in how VACUUM sets visibility map bits