Обсуждение: nbtree's ScalarArrayOp array mark/restore code appears to be buggy

Поиск
Список
Период
Сортировка

nbtree's ScalarArrayOp array mark/restore code appears to be buggy

От
Peter Geoghegan
Дата:
Attached test case demonstrates an issue with nbtree's mark/restore
code. All supported versions are affected.

My suspicion is that bugfix commit 70bc5833 missed some subtlety
around what we need to do to make sure that the array keys stay "in
sync" with the scan. I'll have time to debug the problem some more
tomorrow.

My ScalarArrayOp project [1] seems unaffected by the bug, so I don't
expect it'll take long to get to the bottom of this. This is probably
due to its general design, and not any specific detail. The patch
makes the relationship between the current scan position and the
current array keys a great deal looser.

[1] https://commitfest.postgresql.org/44/4455/
-- 
Peter Geoghegan

Вложения

Re: nbtree's ScalarArrayOp array mark/restore code appears to be buggy

От
Peter Geoghegan
Дата:
On Fri, Sep 22, 2023 at 8:17 PM Peter Geoghegan <pg@bowt.ie> wrote:
> My suspicion is that bugfix commit 70bc5833 missed some subtlety
> around what we need to do to make sure that the array keys stay "in
> sync" with the scan. I'll have time to debug the problem some more
> tomorrow.

I've figured out what's going on here.

If I make my test case "group by" both of the indexed columns from the
composite index (either index/table will do, since it's an equijoin),
a more detailed picture emerges that hints at the underlying problem:

┌───────┬─────────┬─────────┐
│ count │ small_a │ small_b │
├───────┼─────────┼─────────┤
│ 8,192 │       1 │       2 │
│ 8,192 │       1 │       3 │
│ 8,192 │       1 │       5 │
│ 8,192 │       1 │      10 │
│ 8,192 │       1 │      12 │
│ 8,192 │       1 │      17 │
│ 2,872 │       1 │      19 │
└───────┴─────────┴─────────┘
(7 rows)

The count for the final row is wrong. It should be 8,192, just like
the earlier counts for lower (small_a, small_b) groups. Notably, the
issue is limited to the grouping that has the highest sort order. That
strongly hints that the problem has something to do with "array
wraparound".

The query qual contains "WHERE small_a IN (1, 3)", so we'll "wraps
around" from cur_elem index 1 (value 3) to cur_elem index 0 (value 1),
without encountering any rows where small_a is 3 (because there aren't
any in the index). That in itself isn't the problem. The problem is
that _bt_restore_array_keys() doesn't consider wraparound. It sees
that "cur_elem == mark_elem" for all array scan keys, and figues that
it doesn't need to call _bt_preprocess_keys(). This is incorrect,
since the current set of search-type scan keys (the set most recently
output, during the last _bt_preprocess_keys() call) still have the
value "3".

The fix for this should be fairly straightforward. We must teach
_bt_restore_array_keys() to distinguish "past the end of the array"
from "after the start of the array", so that doesn't spuriously skip a
required call to _bt_preprocess_keys() . I already see that the
problem goes away once _bt_restore_array_keys() is made to call
_bt_preprocess_keys() unconditionally, so I'm already fairly confident
that this will work.

--
Peter Geoghegan



Re: nbtree's ScalarArrayOp array mark/restore code appears to be buggy

От
Peter Geoghegan
Дата:
On Sat, Sep 23, 2023 at 11:47 AM Peter Geoghegan <pg@bowt.ie> wrote:
> The fix for this should be fairly straightforward. We must teach
> _bt_restore_array_keys() to distinguish "past the end of the array"
> from "after the start of the array", so that doesn't spuriously skip a
> required call to _bt_preprocess_keys() . I already see that the
> problem goes away once _bt_restore_array_keys() is made to call
> _bt_preprocess_keys() unconditionally, so I'm already fairly confident
> that this will work.

Attached draft patch shows how this could work.

_bt_restore_array_keys() has comments that seem to suppose that
calling _bt_preprocess_keys is fairly expensive, and something that's
well worth avoiding. But...is it, really? I wonder if we'd be better
off just biting the bullet and always calling _bt_preprocess_keys
here. Could it really be such a big cost, compared to pinning and
locking the page in the btrestpos() path?

The current draft SAOP patch calls _bt_preprocess_keys() with a buffer
lock held. This is very likely unsafe, so I'll need to come up with a
principled approach (e.g. a stripped down, specialized version of
_bt_preprocess_keys that's safe to call while holding a lock seems doable).
I've been able to put that off for a few months now because it just
doesn't impact my microbenchmarks to any notable degree (and not just
in those cases where we can use the "numberOfKeys == 1" fast path at
the start of _bt_preprocess_keys). This at least suggests that the cost of
always calling _bt_preprocess_keys in _bt_restore_array_keys might be
acceptable.

--
Peter Geoghegan

Вложения

Re: nbtree's ScalarArrayOp array mark/restore code appears to be buggy

От
Peter Geoghegan
Дата:
On Sat, Sep 23, 2023 at 4:22 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Attached draft patch shows how this could work.
>
> _bt_restore_array_keys() has comments that seem to suppose that
> calling _bt_preprocess_keys is fairly expensive, and something that's
> well worth avoiding. But...is it, really? I wonder if we'd be better
> off just biting the bullet and always calling _bt_preprocess_keys
> here.

My current plan is to commit something close to my original patch on
Wednesday or Thursday. My proposed fix is minimally invasive (it still
allows _bt_restore_array_keys to avoid most calls to
_bt_preprocess_keys), so I don't see any reason to delay acting here.

--
Peter Geoghegan