Обсуждение: Failures in constraints regression test, "read only 0 of 8192 bytes"

Поиск
Список
Период
Сортировка

Failures in constraints regression test, "read only 0 of 8192 bytes"

От
Thomas Munro
Дата:
These two animals seem to have got mixed up about about the size of
this relation in the same place:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=avocet&dt=2024-02-28%2017%3A34%3A30
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2024-03-01%2006%3A47%3A53

+++ /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/regress/results/constraints.out
2024-03-01 08:22:11.624897033 +0100
@@ -573,42 +573,38 @@
  UNIQUE (i) DEFERRABLE INITIALLY DEFERRED;
 BEGIN;
 INSERT INTO unique_tbl VALUES (1, 'five');
+ERROR:  could not read blocks 0..0 in file "base/16384/21437": read
only 0 of 8192 bytes

That error message changed slightly in my smgrreadv() commit a couple
of months ago (it would have been "block 0" and now it's "blocks 0..0"
because now we can read more than one block at a time) but I don't
immediately see how anything at that low level could be responsible
for this.



Re: Failures in constraints regression test, "read only 0 of 8192 bytes"

От
Tomas Vondra
Дата:
These are "my" animals (running at a local university). There's a couple
interesting details:

1) the animals run on the same machine (one with gcc, one with clang)

2) I did upgrade the OS and restarted the machine on 2024/02/26, i.e.
right before the failures started

These might be just coincidences, but maybe something got broken by the
upgrade ... OTOH it's weird it'd affect just HEAD and none of the other
branches, and on two difference compilers.

Just to be sure I removed the buildroot, in case there's something wrong
with ccache. It's a wild guess, but I don't have any other idea.

regards

On 3/2/24 22:39, Thomas Munro wrote:
> These two animals seem to have got mixed up about about the size of
> this relation in the same place:
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=avocet&dt=2024-02-28%2017%3A34%3A30
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2024-03-01%2006%3A47%3A53
> 
> +++ /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/regress/results/constraints.out
> 2024-03-01 08:22:11.624897033 +0100
> @@ -573,42 +573,38 @@
>   UNIQUE (i) DEFERRABLE INITIALLY DEFERRED;
>  BEGIN;
>  INSERT INTO unique_tbl VALUES (1, 'five');
> +ERROR:  could not read blocks 0..0 in file "base/16384/21437": read
> only 0 of 8192 bytes
> 
> That error message changed slightly in my smgrreadv() commit a couple
> of months ago (it would have been "block 0" and now it's "blocks 0..0"
> because now we can read more than one block at a time) but I don't
> immediately see how anything at that low level could be responsible
> for this.
> 
> 

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Failures in constraints regression test, "read only 0 of 8192 bytes"

От
Ronan Dunklau
Дата:
Le samedi 2 mars 2024, 23:29:52 CET Tomas Vondra a écrit :
> These are "my" animals (running at a local university). There's a couple
> interesting details:

Hi Tomas,
do you still have the failing cluster data ?

Noah pointed me to this thread, and it looks a bit similar to the FSM
corruption issue I'm facing: https://www.postgresql.org/message-id/
1925490.taCxCBeP46%40aivenlaptop

So if you still have the data, it would be nice to see if you indeed have a
corrupted FSM, and if you have indications when it happened.

Best regards,

--
Ronan Dunklau





Re: Failures in constraints regression test, "read only 0 of 8192 bytes"

От
Tomas Vondra
Дата:

On 3/4/24 14:16, Ronan Dunklau wrote:
> Le samedi 2 mars 2024, 23:29:52 CET Tomas Vondra a écrit :
>> These are "my" animals (running at a local university). There's a couple
>> interesting details:
> 
> Hi Tomas,
> do you still have the failing cluster data ? 
> 
> Noah pointed me to this thread, and it looks a bit similar to the FSM 
> corruption issue I'm facing: https://www.postgresql.org/message-id/
> 1925490.taCxCBeP46%40aivenlaptop
> 
> So if you still have the data, it would be nice to see if you indeed have a 
> corrupted FSM, and if you have indications when it happened.
> 

Sorry, I nuked the buildroot so I don't have the data anymore. Let's see
if it fails again.

regards
-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Failures in constraints regression test, "read only 0 of 8192 bytes"

От
Thomas Munro
Дата:
Happened again.  I see this is OpenSUSE.  Does that mean the file
system is Btrfs?



Re: Failures in constraints regression test, "read only 0 of 8192 bytes"

От
Tomas Vondra
Дата:
On 3/8/24 09:33, Thomas Munro wrote:
> Happened again.  I see this is OpenSUSE.  Does that mean the file
> system is Btrfs?


It is, but I don't think that matters - I've been able to reproduce this
locally on my laptop using ext4 filesystem. I'd bet the important piece
here is -DCLOBBER_CACHE_ALWAYS (and it seems avocet/trilobite are the
only animals running with this).

Also, if this really is a filesystem (or environment) issue, it seems
very strange it'd only affect HEAD and not the other branches. So it
seems quite likely this is actually triggered by a commit.

Looking at the commits from the good/bad runs, I see this:

avocet: good=4c2369a bad=f5a465f
trilobite: good=d13ff82 bad=def0ce3

That means the commit would have to be somewhere here:

f5a465f1a07 Promote assertion about !ReindexIsProcessingIndex to ...
57b28c08305 Doc: fix minor typos in two ECPG function descriptions.
28e858c0f95 Improve documentation and GUC description for ...
a661bf7b0f5 Remove flaky isolation tests for timeouts
874d817baa1 Multiple revisions to the GROUP BY reordering tests
466979ef031 Replace lateral references to removed rels in subqueries
a6b2a51e16d Avoid dangling-pointer problem with partitionwise ...
d360e3cc60e Fix compiler warning on typedef redeclaration
8af25652489 Introduce a new smgr bulk loading facility.
e612384fc78 Fix mistake in SQL features list
d13ff82319c Fix BF failure in commit 93db6cbda0.

My guess would be 8af25652489, as it's the only storage-related commit.

I'm currently running tests to verify this.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Failures in constraints regression test, "read only 0 of 8192 bytes"

От
Tomas Vondra
Дата:

On 3/8/24 13:21, Tomas Vondra wrote:
> On 3/8/24 09:33, Thomas Munro wrote:
>> Happened again.  I see this is OpenSUSE.  Does that mean the file
>> system is Btrfs?
> 
> 
> It is, but I don't think that matters - I've been able to reproduce this
> locally on my laptop using ext4 filesystem. I'd bet the important piece
> here is -DCLOBBER_CACHE_ALWAYS (and it seems avocet/trilobite are the
> only animals running with this).
> 
> Also, if this really is a filesystem (or environment) issue, it seems
> very strange it'd only affect HEAD and not the other branches. So it
> seems quite likely this is actually triggered by a commit.
> 
> Looking at the commits from the good/bad runs, I see this:
> 
> avocet: good=4c2369a bad=f5a465f
> trilobite: good=d13ff82 bad=def0ce3
> 
> That means the commit would have to be somewhere here:
> 
> f5a465f1a07 Promote assertion about !ReindexIsProcessingIndex to ...
> 57b28c08305 Doc: fix minor typos in two ECPG function descriptions.
> 28e858c0f95 Improve documentation and GUC description for ...
> a661bf7b0f5 Remove flaky isolation tests for timeouts
> 874d817baa1 Multiple revisions to the GROUP BY reordering tests
> 466979ef031 Replace lateral references to removed rels in subqueries
> a6b2a51e16d Avoid dangling-pointer problem with partitionwise ...
> d360e3cc60e Fix compiler warning on typedef redeclaration
> 8af25652489 Introduce a new smgr bulk loading facility.
> e612384fc78 Fix mistake in SQL features list
> d13ff82319c Fix BF failure in commit 93db6cbda0.
> 
> My guess would be 8af25652489, as it's the only storage-related commit.
> 
> I'm currently running tests to verify this.
> 

Yup, the breakage starts with this commit. I haven't looked into the
root cause, or whether the commit maybe just made some pre-existing
issue easier to hit. Also, I haven't followed the discussion on the
pgsql-bugs thread [1], maybe there are some interesting findings.

[1]
https://www.postgresql.org/message-id/flat/1878547.tdWV9SEqCh%40aivenlaptop


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Failures in constraints regression test, "read only 0 of 8192 bytes"

От
Ronan Dunklau
Дата:
Le vendredi 8 mars 2024, 14:36:48 CET Tomas Vondra a écrit :
> > My guess would be 8af25652489, as it's the only storage-related commit.
> >
> > I'm currently running tests to verify this.
>
> Yup, the breakage starts with this commit. I haven't looked into the
> root cause, or whether the commit maybe just made some pre-existing
> issue easier to hit. Also, I haven't followed the discussion on the
> pgsql-bugs thread [1], maybe there are some interesting findings.
>

If that happens only on HEAD and not on 16, and doesn't involve WAL replay,
then it's not the same bug.

--
Ronan Dunklau





Re: Failures in constraints regression test, "read only 0 of 8192 bytes"

От
Thomas Munro
Дата:
On Sat, Mar 9, 2024 at 2:36 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> On 3/8/24 13:21, Tomas Vondra wrote:
> > My guess would be 8af25652489, as it's the only storage-related commit.
> >
> > I'm currently running tests to verify this.
> >
>
> Yup, the breakage starts with this commit. I haven't looked into the
> root cause, or whether the commit maybe just made some pre-existing
> issue easier to hit. Also, I haven't followed the discussion on the
> pgsql-bugs thread [1], maybe there are some interesting findings.

Adding Heikki.



Re: Failures in constraints regression test, "read only 0 of 8192 bytes"

От
Tomas Vondra
Дата:

On 3/8/24 21:29, Thomas Munro wrote:
> On Sat, Mar 9, 2024 at 2:36 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> On 3/8/24 13:21, Tomas Vondra wrote:
>>> My guess would be 8af25652489, as it's the only storage-related commit.
>>>
>>> I'm currently running tests to verify this.
>>>
>>
>> Yup, the breakage starts with this commit. I haven't looked into the
>> root cause, or whether the commit maybe just made some pre-existing
>> issue easier to hit. Also, I haven't followed the discussion on the
>> pgsql-bugs thread [1], maybe there are some interesting findings.
> 
> Adding Heikki.

I spent a bit of time investigating this today, but haven't made much
progress due to (a) my unfamiliarity with the smgr code in general and
the patch in particular, and (b) CLOBBER_CACHE_ALWAYS making it quite
time consuming to iterate and experiment.

However, the smallest schedule that still reproduces the issue is:

-------------------
test: test_setup

test: create_aggregate create_function_sql create_cast constraints
triggers select inherit typed_table vacuum drop_if_exists
updatable_views roleattributes create_am hash_func errors infinite_recurse
-------------------

I tried to reduce the second step to just "constraints", but that does
not fail. Clearly there's some concurrency at play, and having just a
single backend makes that go away.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Failures in constraints regression test, "read only 0 of 8192 bytes"

От
Thomas Munro
Дата:
On Sat, Mar 9, 2024 at 9:48 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> I spent a bit of time investigating this today, but haven't made much
> progress due to (a) my unfamiliarity with the smgr code in general and
> the patch in particular, and (b) CLOBBER_CACHE_ALWAYS making it quite
> time consuming to iterate and experiment.
>
> However, the smallest schedule that still reproduces the issue is:
>
> -------------------
> test: test_setup
>
> test: create_aggregate create_function_sql create_cast constraints
> triggers select inherit typed_table vacuum drop_if_exists
> updatable_views roleattributes create_am hash_func errors infinite_recurse
> -------------------

Thanks, reproduced here (painfully slowly).  Looking...

Huh, also look at these extra problems later in the logs of the latest
trilobite and avocet runs, further down after the short read errors:

TRAP: failed Assert("j > attnum"), File: "heaptuple.c", Line: 640, PID: 15753
postgres: autovacuum worker regression(ExceptionalCondition+0x67)[0x9c5f37]
postgres: autovacuum worker regression[0x4b60c8]
postgres: autovacuum worker regression[0x5ff735]
postgres: autovacuum worker regression[0x5fe468]
postgres: autovacuum worker regression(analyze_rel+0x133)[0x5fd5e3]
postgres: autovacuum worker regression(vacuum+0x6b6)[0x683926]
postgres: autovacuum worker regression[0x7ce5e3]
postgres: autovacuum worker regression[0x7cc4f0]
postgres: autovacuum worker regression(StartAutoVacWorker+0x22)[0x7cc152]
postgres: autovacuum worker regression[0x7d57d1]
postgres: autovacuum worker regression[0x7d37bf]
postgres: autovacuum worker regression[0x6f5a4f]

Then crash recovery fails, in one case with:

2024-03-07 20:28:18.632 CET [15860:4] WARNING:  will not overwrite a used ItemId
2024-03-07 20:28:18.632 CET [15860:5] CONTEXT:  WAL redo at 0/FB07A48
for Heap/INSERT: off: 9, flags: 0x00; blkref #0: rel 1663/16384/2610,
blk 12
2024-03-07 20:28:18.632 CET [15860:6] PANIC:  failed to add tuple
2024-03-07 20:28:18.632 CET [15860:7] CONTEXT:  WAL redo at 0/FB07A48
for Heap/INSERT: off: 9, flags: 0x00; blkref #0: rel 1663/16384/2610,
blk 12

... and in another with:

2024-03-05 11:51:27.992 CET [25510:4] PANIC:  invalid lp
2024-03-05 11:51:27.992 CET [25510:5] CONTEXT:  WAL redo at 0/F87A8D0
for Heap/INPLACE: off: 29; blkref #0: rel 1663/16384/20581, blk 0



Re: Failures in constraints regression test, "read only 0 of 8192 bytes"

От
Thomas Munro
Дата:
On Sun, Mar 10, 2024 at 5:02 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> Thanks, reproduced here (painfully slowly).  Looking...

I changed that ERROR to a PANIC and now I can see that
_bt_metaversion() is failing to read a meta page (block 0), and the
file is indeed of size 0 in my filesystem.  Which is not cool, for a
btree.  Looking at btbuildempty(), we have this sequence:

    bulkstate = smgr_bulk_start_rel(index, INIT_FORKNUM);

    /* Construct metapage. */
    metabuf = smgr_bulk_get_buf(bulkstate);
    _bt_initmetapage((Page) metabuf, P_NONE, 0, allequalimage);
    smgr_bulk_write(bulkstate, BTREE_METAPAGE, metabuf, true);

    smgr_bulk_finish(bulkstate);

Ooh.  One idea would be that the smgr lifetime stuff is b0rked,
introducing corruption.  Bulk write itself isn't pinning the smgr
relation, it's relying purely on the object not being invalidated,
which the theory of 21d9c3ee's commit message allowed for but ... here
it's destroyed (HASH_REMOVE'd) sooner under CACHE_CLOBBER_ALWAYS,
which I think we failed to grok.  If that's it, I'm surprised that
things don't implode more spectacularly.  Perhaps HASH_REMOVE should
clobber objects in debug builds, similar to pfree?

For that hypothesis, the corruption might not be happening in the
above-quoted code itself, because it doesn't seem to have an
invalidation acceptance point (unless I'm missing it).  Some other
bulk write got mixed up?  Not sure yet.

I won't be surprised if the answer is: if you're holding a reference,
you have to get a pin (referring to bulk_write.c).



Re: Failures in constraints regression test, "read only 0 of 8192 bytes"

От
Thomas Munro
Дата:
On Sun, Mar 10, 2024 at 6:48 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> I won't be surprised if the answer is: if you're holding a reference,
> you have to get a pin (referring to bulk_write.c).

Ahhh, on second thoughts, I take that back, I think the original
theory still actually works just fine.  It's just that somewhere in
our refactoring of that commit, when we were vacillating between
different semantics for 'destroy' and 'release', I think we made a
mistake: in RelationCacheInvalidate() I think we should now call
smgrreleaseall(), not smgrdestroyall().  That satisfies the
requirements for sinval queue overflow: we close md.c segments (and
most importantly virtual file descriptors), so our lack of sinval
records can't hurt us, we'll reopen all files as required.  That's
what CLOBBER_CACHE_ALWAYS is effectively testing (and more).  But the
smgr pointer remains valid, and retains only its "identity", eg hash
table key, and that's also fine.  It won't be destroyed until after
the end of the transaction.  Which was the point, and it allows things
like bulk_write.c (and streaming_read.c) to hold an smgr reference.
Right?



Re: Failures in constraints regression test, "read only 0 of 8192 bytes"

От
Heikki Linnakangas
Дата:
Thanks for diagnosing this!

On 10/03/2024 08:23, Thomas Munro wrote:
> On Sun, Mar 10, 2024 at 6:48 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>> I won't be surprised if the answer is: if you're holding a reference,
>> you have to get a pin (referring to bulk_write.c).
> 
> Ahhh, on second thoughts, I take that back, I think the original
> theory still actually works just fine.  It's just that somewhere in
> our refactoring of that commit, when we were vacillating between
> different semantics for 'destroy' and 'release', I think we made a
> mistake: in RelationCacheInvalidate() I think we should now call
> smgrreleaseall(), not smgrdestroyall().

Yes, I ran the reproducer with 'rr', and came to the same conclusion. 
That smgrdestroyall() call closes destroys the SmgrRelation, breaking 
the new assumption that an unpinned SmgrRelation is valid until end of 
transaction.

> That satisfies the
> requirements for sinval queue overflow: we close md.c segments (and
> most importantly virtual file descriptors), so our lack of sinval
> records can't hurt us, we'll reopen all files as required.  That's
> what CLOBBER_CACHE_ALWAYS is effectively testing (and more).  But the
> smgr pointer remains valid, and retains only its "identity", eg hash
> table key, and that's also fine.  It won't be destroyed until after
> the end of the transaction.  Which was the point, and it allows things
> like bulk_write.c (and streaming_read.c) to hold an smgr reference.
> Right

+1.

I wonder if we can now simplify RelationCacheInvalidate() further. In 
the loop above that smgrdestroyall():

>     while ((idhentry = (RelIdCacheEnt *) hash_seq_search(&status)) != NULL)
>     {
>         relation = idhentry->reldesc;
> 
>         /* Must close all smgr references to avoid leaving dangling ptrs */
>         RelationCloseSmgr(relation);
> 
>         /*
>          * Ignore new relations; no other backend will manipulate them before
>          * we commit.  Likewise, before replacing a relation's relfilelocator,
>          * we shall have acquired AccessExclusiveLock and drained any
>          * applicable pending invalidations.
>          */
>         if (relation->rd_createSubid != InvalidSubTransactionId ||
>             relation->rd_firstRelfilelocatorSubid != InvalidSubTransactionId)
>             continue;
> 

I don't think we need to call RelationCloseSmgr() for relations created 
in the same transaction anymore.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Failures in constraints regression test, "read only 0 of 8192 bytes"

От
Heikki Linnakangas
Дата:
On 10/03/2024 11:20, Heikki Linnakangas wrote:
> On 10/03/2024 08:23, Thomas Munro wrote:
>> On Sun, Mar 10, 2024 at 6:48 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>>> I won't be surprised if the answer is: if you're holding a reference,
>>> you have to get a pin (referring to bulk_write.c).
>>
>> Ahhh, on second thoughts, I take that back, I think the original
>> theory still actually works just fine.  It's just that somewhere in
>> our refactoring of that commit, when we were vacillating between
>> different semantics for 'destroy' and 'release', I think we made a
>> mistake: in RelationCacheInvalidate() I think we should now call
>> smgrreleaseall(), not smgrdestroyall().
> 
> Yes, I ran the reproducer with 'rr', and came to the same conclusion.
> That smgrdestroyall() call closes destroys the SmgrRelation, breaking
> the new assumption that an unpinned SmgrRelation is valid until end of
> transaction.
> 
>> That satisfies the
>> requirements for sinval queue overflow: we close md.c segments (and
>> most importantly virtual file descriptors), so our lack of sinval
>> records can't hurt us, we'll reopen all files as required.  That's
>> what CLOBBER_CACHE_ALWAYS is effectively testing (and more).  But the
>> smgr pointer remains valid, and retains only its "identity", eg hash
>> table key, and that's also fine.  It won't be destroyed until after
>> the end of the transaction.  Which was the point, and it allows things
>> like bulk_write.c (and streaming_read.c) to hold an smgr reference.
>> Right
> 
> +1.
> 
> I wonder if we can now simplify RelationCacheInvalidate() further. In
> the loop above that smgrdestroyall():
> 
>>     while ((idhentry = (RelIdCacheEnt *) hash_seq_search(&status)) != NULL)
>>     {
>>         relation = idhentry->reldesc;
>>
>>         /* Must close all smgr references to avoid leaving dangling ptrs */
>>         RelationCloseSmgr(relation);
>>
>>         /*
>>          * Ignore new relations; no other backend will manipulate them before
>>          * we commit.  Likewise, before replacing a relation's relfilelocator,
>>          * we shall have acquired AccessExclusiveLock and drained any
>>          * applicable pending invalidations.
>>          */
>>         if (relation->rd_createSubid != InvalidSubTransactionId ||
>>             relation->rd_firstRelfilelocatorSubid != InvalidSubTransactionId)
>>             continue;
>>
> 
> I don't think we need to call RelationCloseSmgr() for relations created
> in the same transaction anymore.

Barring objections, I'll commit the attached.

Hmm, I'm not sure if we need even smgrreleaseall() here anymore. It's 
not required for correctness AFAICS. We don't do it in single-rel 
invalidation in RelationCacheInvalidateEntry() either.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Вложения

Re: Failures in constraints regression test, "read only 0 of 8192 bytes"

От
Thomas Munro
Дата:
On Mon, Mar 11, 2024 at 9:30 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Barring objections, I'll commit the attached.

+1

I guess the comment for smgrreleaseall() could also be updated.  It
mentions only PROCSIGNAL_BARRIER_SMGRRELEASE, but I think sinval
overflow (InvalidateSystemCaches()) should also be mentioned?

> Hmm, I'm not sure if we need even smgrreleaseall() here anymore. It's
> not required for correctness AFAICS. We don't do it in single-rel
> invalidation in RelationCacheInvalidateEntry() either.

I think we do, because we have missed sinval messages.  It's unlikely
but a relfilenode might have been recycled, and we might have file
descriptors that point to the unlinked files.  That is, there are new
files with the same names and we need to open those ones.



Re: Failures in constraints regression test, "read only 0 of 8192 bytes"

От
Thomas Munro
Дата:
On Mon, Mar 11, 2024 at 9:59 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Mon, Mar 11, 2024 at 9:30 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > Hmm, I'm not sure if we need even smgrreleaseall() here anymore. It's
> > not required for correctness AFAICS. We don't do it in single-rel
> > invalidation in RelationCacheInvalidateEntry() either.
>
> I think we do, because we have missed sinval messages.  It's unlikely
> but a relfilenode might have been recycled, and we might have file
> descriptors that point to the unlinked files.  That is, there are new
> files with the same names and we need to open those ones.

... though I think you would be right if Dilip and Robert had
succeeded in their quest to introduce 56-bit non-cycling relfilenodes.
And for the record, we can also shoot ourselves in the foot in another
known case without sinval[1], so more work is needed here, but that
doesn't mean this sinval code path should also aim footwards.

[1]
https://www.postgresql.org/message-id/flat/CA%2BhUKGLs554tQFCUjv_vn7ft9Xv5LNjPoAd--3Df%2BJJKJ7A8kw%40mail.gmail.com#f099d68e95edcfe408818447d9da04a7



Re: Failures in constraints regression test, "read only 0 of 8192 bytes"

От
Heikki Linnakangas
Дата:
On 10/03/2024 22:59, Thomas Munro wrote:
> On Mon, Mar 11, 2024 at 9:30 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> Barring objections, I'll commit the attached.
> 
> +1

Pushed, thanks!

> I guess the comment for smgrreleaseall() could also be updated.  It
> mentions only PROCSIGNAL_BARRIER_SMGRRELEASE, but I think sinval
> overflow (InvalidateSystemCaches()) should also be mentioned?

I removed that comment; people can grep to find the callers.

>> Hmm, I'm not sure if we need even smgrreleaseall() here anymore. It's
>> not required for correctness AFAICS. We don't do it in single-rel
>> invalidation in RelationCacheInvalidateEntry() either.
> 
> I think we do, because we have missed sinval messages.  It's unlikely
> but a relfilenode might have been recycled, and we might have file
> descriptors that point to the unlinked files.  That is, there are new
> files with the same names and we need to open those ones.

Gotcha.

-- 
Heikki Linnakangas
Neon (https://neon.tech)