Обсуждение: BUG #18129: GiST index produces incorrect query results

Поиск
Список
Период
Сортировка

BUG #18129: GiST index produces incorrect query results

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      18129
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 16.0
Operating system:   Ubuntu 22.04
Description:

The following test data manipulation:
sed '/{73,23,20}/!s/\([0-9]\+\)/\1,10\1,20\1,30\1/g' -i
contrib/intarray/data/test__int.data
(effectively multiplying contents of all arrays (except for one) in the
test by 3)

makes `make check contrib/intarray` fail, with random results, e.g.:
--- .../contrib/intarray/expected/_int.out        2023-09-21
21:44:46.967470822 +0300
+++ .../contrib/intarray/results/_int.out 2023-09-22 08:12:18.269739402
+0300
@@ -497,13 +497,13 @@
 SELECT count(*) from test__int WHERE a && '{23,50}';
  count 
 -------
-   403
+   395
 (1 row)
 
 SELECT count(*) from test__int WHERE a @@ '23|50';
  count 
 -------
-   403
+   395
 (1 row)
 
 SELECT count(*) from test__int WHERE a @> '{23,50}';
@@ -557,13 +557,13 @@
 SELECT count(*) from test__int WHERE a @@ '20 | !21';
  count 
 -------
-  6566
+  6478
 (1 row)
 
 SELECT count(*) from test__int WHERE a @@ '!20 & !21';
  count 
 -------
-  6343
+  6258
 (1 row)
 
 INSERT INTO test__int SELECT array(SELECT x FROM generate_series(1, 1001)
x); -- should fail
@@ -639,13 +639,13 @@
 SELECT count(*) from test__int WHERE a @@ '20 | !21';
  count 
 -------
-  6566
+  6381
 (1 row)
 
 SELECT count(*) from test__int WHERE a @@ '!20 & !21';
  count 
 -------
-  6343
+  6165
 (1 row)
 
 DROP INDEX text_idx;

(Here we deal with intarray's gist__int_ops, but I suspect that other gist
opclasses might be affected as well.)

First bad commit is 9155580fd, on which three different outcomes are
possible: test passes/test fails/assertion fails.
Assertion failures, that we had observed after 9155580fd, were fixed by
a7ee7c851 and 741b88435, but apparently there is another defect induced
by that gist implementation change.


Re: BUG #18129: GiST index produces incorrect query results

От
Heikki Linnakangas
Дата:
On 22/09/2023 09:00, PG Bug reporting form wrote:
> First bad commit is 9155580fd, on which three different outcomes are
> possible: test passes/test fails/assertion fails.
> Assertion failures, that we had observed after 9155580fd, were fixed by
> a7ee7c851 and 741b88435, but apparently there is another defect induced
> by that gist implementation change.

I can reproduce this. I suspect something's still wrong with the LSN <-> 
NSN cross check, when the pages are WAL logged in physical order. Thanks 
for the report, I'll take a look.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: BUG #18129: GiST index produces incorrect query results

От
Heikki Linnakangas
Дата:
On 22/09/2023 09:00, PG Bug reporting form wrote:
> First bad commit is 9155580fd, on which three different outcomes are
> possible: test passes/test fails/assertion fails.
> Assertion failures, that we had observed after 9155580fd, were fixed by
> a7ee7c851 and 741b88435, but apparently there is another defect induced
> by that gist implementation change.

This is indeed a similar bug to that fixed in 741b88435 and a7ee7c851. 
In those commits, we fixed the insertion code so that whenever we insert 
a downlink to the parent page and that causes the parent page to split, 
we invalidate the 'downlinkoffnum' we had memorized for the child. 
Because when the parent page is split, the downlink for the child likely 
moves.

However, the downlink can move even when we *don't* split the parent. 
Splitting a page not only inserts the new tuple for the new right page, 
but also updates the old downlink, because the key representing what is 
now the left page covers less keys than the original page. And updating 
a tuple is implemented as delete+insert. The delete moves existing 
tuples on the page! In other words, this code and comment in 
gistfinishsplit():

>     if (gistinserttuples(state, stack->parent, giststate,
>                          tuples, 2,
>                          stack->downlinkoffnum,
>                          left->buf, right->buf,
>                          true,    /* Unlock parent */
>                          unlockbuf    /* Unlock stack->buffer if caller wants
>                                      * that */
>                          ))
>     {
>         /*
>          * If the parent page was split, the downlink might have moved.
>          */
>         stack->downlinkoffnum = InvalidOffsetNumber;
>     }

is wrong. We must invalidate 'downlinkoffnum' whether or not the parent 
page was split, because updating an existing tuple can also move other 
tuples on the page.

Attached is a set of patches to fix this:

1. To catch the problem earlier - already during the index build - I 
modified getFindCorrectParent() to always find the parent the hard way 
by scanning the parent page, even if 'downlinkoffnum' is set. Once the 
downlink is found, it compares it with the 'downlinkoffnum', and PANICs 
if it wasn't where we expected. That defeats the point of memorizing the 
'downlinkoffnum' in the first place, but it's a useful cross-check.

I don't intend to commit this, but it was useful during debugging.

2. A straightforward fix for the bug.

3. Looking at gistFindCorrectParent, when it steps right, it also 
doesn't update 'downlinkoffnum' of the parent page. Surely if we step to 
a different block altogether, the memorized downlink position of the 
previous block no longer applies.

Worryingly, I haven't been able to get a test failure caused by that. 
Maybe there are some reasons why we never need to split the parent after 
that, not sure. But it sure looks wrong.

4. Given how many bugs we've already had with this, I'd like to make 
this more robust. Currently, we meticulously track if we have made any 
inserts that invalidate the memorized location of the downlink. Instead, 
gistFindCorrectParent() could treat the memorized location as just a 
hint, and always check if the downlink is still at the memorized 
location. If it is, great, and if it's not, find it the hard way. That 
makes it unnecessary to clear 'downlinkoffnum' at the right places. We 
still need the 'retry_from_parent' flag, though.

5. Re-indent the code after previous commit. Separated for easier review.

I'd also like to add something to our regression test suite for better 
coverage of this. The intarray test case is very useful, but it'd be 
nice to have something in the main regression suite, as this is a 
general GiST issue, not related to intarray. I haven't spent any time on 
that yet.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Вложения

Re: BUG #18129: GiST index produces incorrect query results

От
Heikki Linnakangas
Дата:
On 24/09/2023 01:06, Heikki Linnakangas wrote:
> On 22/09/2023 09:00, PG Bug reporting form wrote:
>> First bad commit is 9155580fd, on which three different outcomes are
>> possible: test passes/test fails/assertion fails.
>> Assertion failures, that we had observed after 9155580fd, were fixed by
>> a7ee7c851 and 741b88435, but apparently there is another defect induced
>> by that gist implementation change.
> 
> This is indeed a similar bug to that fixed in 741b88435 and a7ee7c851.
> In those commits, we fixed the insertion code so that whenever we insert
> a downlink to the parent page and that causes the parent page to split,
> we invalidate the 'downlinkoffnum' we had memorized for the child.
> Because when the parent page is split, the downlink for the child likely
> moves.
> 
> However, the downlink can move even when we *don't* split the parent.
> Splitting a page not only inserts the new tuple for the new right page,
> but also updates the old downlink, because the key representing what is
> now the left page covers less keys than the original page. And updating
> a tuple is implemented as delete+insert. The delete moves existing
> tuples on the page! In other words, this code and comment in
> gistfinishsplit():
> 
>>     if (gistinserttuples(state, stack->parent, giststate,
>>                          tuples, 2,
>>                          stack->downlinkoffnum,
>>                          left->buf, right->buf,
>>                          true,    /* Unlock parent */
>>                          unlockbuf    /* Unlock stack->buffer if caller wants
>>                                      * that */
>>                          ))
>>     {
>>         /*
>>          * If the parent page was split, the downlink might have moved.
>>          */
>>         stack->downlinkoffnum = InvalidOffsetNumber;
>>     }
> 
> is wrong. We must invalidate 'downlinkoffnum' whether or not the parent
> page was split, because updating an existing tuple can also move other
> tuples on the page.
> 
> Attached is a set of patches to fix this:
> 
> 1. To catch the problem earlier - already during the index build - I
> modified getFindCorrectParent() to always find the parent the hard way
> by scanning the parent page, even if 'downlinkoffnum' is set. Once the
> downlink is found, it compares it with the 'downlinkoffnum', and PANICs
> if it wasn't where we expected. That defeats the point of memorizing the
> 'downlinkoffnum' in the first place, but it's a useful cross-check.
> 
> I don't intend to commit this, but it was useful during debugging.
> 
> 2. A straightforward fix for the bug.
> 
> 3. Looking at gistFindCorrectParent, when it steps right, it also
> doesn't update 'downlinkoffnum' of the parent page. Surely if we step to
> a different block altogether, the memorized downlink position of the
> previous block no longer applies.
> 
> Worryingly, I haven't been able to get a test failure caused by that.
> Maybe there are some reasons why we never need to split the parent after
> that, not sure. But it sure looks wrong.
> 
> 4. Given how many bugs we've already had with this, I'd like to make
> this more robust. Currently, we meticulously track if we have made any
> inserts that invalidate the memorized location of the downlink. Instead,
> gistFindCorrectParent() could treat the memorized location as just a
> hint, and always check if the downlink is still at the memorized
> location. If it is, great, and if it's not, find it the hard way. That
> makes it unnecessary to clear 'downlinkoffnum' at the right places. We
> still need the 'retry_from_parent' flag, though.
> 
> 5. Re-indent the code after previous commit. Separated for easier review.
> 
> I'd also like to add something to our regression test suite for better
> coverage of this. The intarray test case is very useful, but it'd be
> nice to have something in the main regression suite, as this is a
> general GiST issue, not related to intarray. I haven't spent any time on
> that yet.
I spent a while trying to create a test case for this that would not 
require expanding the test data so much, but no avail. I even tried to 
take the same data set, but instead of duplicating each element like you 
did, I appended a random number of integers to each array, but even that 
did not trigger the failure. That particular data set seems cursed; how 
did you stumble upon it?

I'm reluctant to just make all the arrays larger, as it makes the test 
2x slower. So instead of running all the commands over the larger data 
set, I added one more copy of the CREATE INDEX and the test queries to 
the end that uses the larger set.

See attached. It's squashed version of the previous patches I posted, 
with the test case. Barring objections, I'll commit this.


While I was playing with this, I ran into this unrelated issue:

CREATE EXTENSION pgcrypto; -- for gen_random_bytes
CREATE EXTENSION btree_gist;

CREATE TABLE byteatest (a bytea);
INSERT INTO byteatest SELECT (gen_random_bytes(1000) || 
gen_random_bytes(1000) || gen_random_bytes(1000)) FROM 
generate_series(1, 10) g;
CREATE INDEX ON byteatest USING GIST ( a );

Fails with:

ERROR:  failed to add item to index page in "byteatest_a_idx1"

If you make that value even larger, then it fails with a better error 
message:

ERROR:  index row requires 8208 bytes, maximum size is 8191

So the "failed to add item" error doesn't seem expected.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Вложения

Re: BUG #18129: GiST index produces incorrect query results

От
Alexander Lakhin
Дата:
Hello Heikki,

26.09.2023 00:24, Heikki Linnakangas wrote:
> I spent a while trying to create a test case for this that would not require expanding the test data so much, but no

> avail. I even tried to take the same data set, but instead of duplicating each element like you did, I appended a 
> random number of integers to each array, but even that did not trigger the failure. That particular data set seems 
> cursed; how did you stumble upon it?
>
> I'm reluctant to just make all the arrays larger, as it makes the test 2x slower. So instead of running all the 
> commands over the larger data set, I added one more copy of the CREATE INDEX and the test queries to the end that
uses
 
> the larger set.
>
> See attached. It's squashed version of the previous patches I posted, with the test case. Barring objections, I'll 
> commit this.

Thank you for working on this!

I can't review the code change in depth, unfortunately (I need time to
study gist guts), but I've tried to just test it.
And with the patch applied I get an assertion failure when the server
compiled --with-blocksize=1 (and -O0):
CPPFLAGS="-O0" ./configure -q --with-blocksize=1 --enable-debug --enable-cassert && make -s -j8 && make -s -j8 -C 
contrib && make -s check -C contrib/intarray
not ok 1     - _int                                      806 ms
# (test process exited with exit code 2)
^C# could not stop postmaster: exit code was 2

(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140373756303168) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140373756303168) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140373756303168, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007fab4f409476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007fab4f3ef7f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x0000559563a87ed9 in ExceptionalCondition (conditionName=0x559563b17a28 "ItemIdHasStorage(itemId)", 
fileName=0x559563b17990 "../../../../src/include/storage/bufpage.h", lineNumber=355) at assert.c:66
#6  0x00005595633334fb in PageGetItem (page=0x7fab43d15800 "", itemId=0x7fab43d1581c) at 
../../../../src/include/storage/bufpage.h:355
#7  0x00005595633358e9 in gistFindCorrectParent (r=0x7fab4176b118, child=0x559565b45b18, is_build=true) at gist.c:1063
#8  0x00005595633361b5 in gistfinishsplit (state=0x7ffc087cd250, stack=0x559565b45b18, giststate=0x559565b7ecd8, 
splitinfo=0x559565be0290, unlockbuf=false) at gist.c:1392
#9  0x0000559563335fdb in gistinserttuples (state=0x7ffc087cd250, stack=0x559565b45b18, giststate=0x559565b7ecd8, 
tuples=0x559565b6b5d8, ntup=1, oldoffnum=0, leftchild=12421, rightchild=12422, unlockbuf=false, unlockleftchild=false)
     at gist.c:1322
#10 0x000055956333612c in gistfinishsplit (state=0x7ffc087cd250, stack=0x559565b45fc8, giststate=0x559565b7ecd8, 
splitinfo=0x559565b6b560, unlockbuf=false) at gist.c:1368
#11 0x0000559563335fdb in gistinserttuples (state=0x7ffc087cd250, stack=0x559565b45fc8, giststate=0x559565b7ecd8, 
tuples=0x559565b692a8, ntup=1, oldoffnum=0, leftchild=12419, rightchild=12420, unlockbuf=false, unlockleftchild=false)
     at gist.c:1322
....

(in fact it crashes even on `make check`)

I haven't understand the reason yet, just wanted to let you know that maybe
the patch needs one more fix before committing.

>
> ERROR:  failed to add item to index page in "byteatest_a_idx1"
>
> If you make that value even larger, then it fails with a better error message:
>
> ERROR:  index row requires 8208 bytes, maximum size is 8191
>
> So the "failed to add item" error doesn't seem expected.
>

Yeah, I saw that anomaly too.

Best regards,
Alexander




Re: BUG #18129: GiST index produces incorrect query results

От
Heikki Linnakangas
Дата:
I just noticed that pgsql-bugs got dropped from this thread, so I'm 
resending this with pgsql-bugs CC'd

On 26/09/2023 13:10, Alexander Lakhin wrote:
> 26.09.2023 11:08, Alexander Lakhin wrote:
>>
>> And with the patch applied I get an assertion failure when the server
>> compiled --with-blocksize=1 (and -O0):
>> CPPFLAGS="-O0" ./configure -q --with-blocksize=1 --enable-debug --enable-cassert && make -s -j8 && make -s -j8 -C
>> contrib && make -s check -C contrib/intarray
>> not ok 1     - _int                                      806 ms
>> # (test process exited with exit code 2)
>> ^C# could not stop postmaster: exit code was 2
>>
>> (gdb) bt
>> #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140373756303168) at ./nptl/pthread_kill.c:44
>> #1  __pthread_kill_internal (signo=6, threadid=140373756303168) at ./nptl/pthread_kill.c:78
>> #2  __GI___pthread_kill (threadid=140373756303168, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
>> #3  0x00007fab4f409476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
>> #4  0x00007fab4f3ef7f3 in __GI_abort () at ./stdlib/abort.c:79
>> #5  0x0000559563a87ed9 in ExceptionalCondition (conditionName=0x559563b17a28 "ItemIdHasStorage(itemId)",
>> fileName=0x559563b17990 "../../../../src/include/storage/bufpage.h", lineNumber=355) at assert.c:66
>>
>> (in fact it crashes even on `make check`)
> 
> My note about "-O0" was excessive — I see the failure even with "-O3", it
> just occurs not every time.
> And as to make check - that was the same issue as I reported before (bug
> #18127).
> Sorry for the noise.
> 
> As to the assertion failure, maybe it's explained by the line
> maxoff = PageGetMaxOffsetNumber(parent->page);
> removed inside the while loop (I guess that maxoff recalculation is needed
> as parent->page changes inside the loop).

Yes, you're right. Fixed that, and pushed (commit 28d3c2ddcf). Thanks 
for the report and debugging!

-- 
Heikki Linnakangas
Neon (https://neon.tech)