Обсуждение: Pg 15 devel crashes when fetching data from table using cursor

Поиск
Список
Период
Сортировка

Pg 15 devel crashes when fetching data from table using cursor

От
hubert depesz lubaczewski
Дата:
Hi,
I ran 15devel, compiled today, on Ryzen processor (amd64), with data for
explain.depesz.com.

In psql I ran:

#v+
=$ psql -U depesz_explain
psql (15devel)
Type "help" for help.

(depesz_explain@[local]:5430) 09:46:37 [depesz_explain] 
$ begin;
BEGIN
Time: 0.312 ms
(depesz_explain@[local]:5430) 09:46:40 [depesz_explain] 
*$ DECLARE csr CURSOR FOR select id, plan FROM plans_part_25 WHERE is_public and NOT is_deleted and NOT is_anonymized;
DECLARE CURSOR
Time: 1.486 ms
(depesz_explain@[local]:5430) 09:46:53 [depesz_explain] 
*$ fetch 100 from csr;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Time: 92.067 ms
(@:) 09:47:03 [] 
?$ 
#v-

Pg logs show:

#v+
2022-03-07 09:46:37.179 CET depesz_explain@depesz_explain 526595 [local] LOG:  connection authorized:
user=depesz_explaindatabase=depesz_explain application_name=psql
 
2022-03-07 09:46:40.082 CET depesz_explain@depesz_explain 526595 [local] LOG:  duration: 0.235 ms  statement: begin;
2022-03-07 09:46:53.933 CET depesz_explain@depesz_explain 526595 [local] LOG:  duration: 1.348 ms  statement: DECLARE
csrCURSOR FOR select id, plan FROM plans_part_25 WHERE is_public and NOT is_deleted and NOT is_anonymized;
 
TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 526595)
postgres: depesz_explain depesz_explain [local] FETCH(ExceptionalCondition+0x99)[0x557da5491dc9]
postgres: depesz_explain depesz_explain [local] FETCH(+0x174f64)[0x557da5010f64]
postgres: depesz_explain depesz_explain [local] FETCH(heap_fetch_toast_slice+0x15f)[0x557da50606cf]
postgres: depesz_explain depesz_explain [local] FETCH(+0x16c374)[0x557da5008374]
postgres: depesz_explain depesz_explain [local] FETCH(detoast_attr+0x18d)[0x557da500884d]
postgres: depesz_explain depesz_explain [local] FETCH(text_to_cstring+0x16)[0x557da5467796]
postgres: depesz_explain depesz_explain [local] FETCH(OutputFunctionCall+0x58)[0x557da549c7c8]
postgres: depesz_explain depesz_explain [local] FETCH(+0x170fc5)[0x557da500cfc5]
postgres: depesz_explain depesz_explain [local] FETCH(+0x4cc4ee)[0x557da53684ee]
postgres: depesz_explain depesz_explain [local] FETCH(+0x4cc626)[0x557da5368626]
postgres: depesz_explain depesz_explain [local] FETCH(PortalRun+0x191)[0x557da5369e71]
postgres: depesz_explain depesz_explain [local] FETCH(+0x4c9ceb)[0x557da5365ceb]
postgres: depesz_explain depesz_explain [local] FETCH(PostgresMain+0x1708)[0x557da5367828]
postgres: depesz_explain depesz_explain [local] FETCH(+0x432073)[0x557da52ce073]
postgres: depesz_explain depesz_explain [local] FETCH(PostmasterMain+0xd5d)[0x557da52cf0ad]
postgres: depesz_explain depesz_explain [local] FETCH(main+0x20f)[0x557da4ff99bf]
/lib/x86_64-linux-gnu/libc.so.6(+0x29fd0)[0x7f7ba79dcfd0]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x7d)[0x7f7ba79dd07d]
postgres: depesz_explain depesz_explain [local] FETCH(_start+0x25)[0x557da4ff9d15]
2022-03-07 09:47:03.298 CET @ 508510  LOG:  server process (PID 526595) was terminated by signal 6: Aborted
2022-03-07 09:47:03.298 CET @ 508510  DETAIL:  Failed process was running: fetch 100 from csr;
2022-03-07 09:47:03.298 CET @ 508510  LOG:  terminating any other active server processes
2022-03-07 09:47:03.299 CET [unknown]@[unknown] 527019 [local] LOG:  connection received: host=[local]
2022-03-07 09:47:03.299 CET depesz_explain@depesz_explain 527019 [local] FATAL:  the database system is in recovery
mode
2022-03-07 09:47:03.299 CET @ 508510  LOG:  all server processes terminated; reinitializing
2022-03-07 09:47:03.316 CET @ 527022  LOG:  database system was interrupted; last known up at 2022-03-07 09:45:57 CET
2022-03-07 09:47:03.558 CET @ 527022  LOG:  database system was not properly shut down; automatic recovery in progress
2022-03-07 09:47:03.566 CET @ 527022  LOG:  redo starts at 0/F61B64E0
2022-03-07 09:47:03.566 CET @ 527022  LOG:  invalid record length at 0/F61B6518: wanted 24, got 0
2022-03-07 09:47:03.566 CET @ 527022  LOG:  redo done at 0/F61B64E0 system usage: CPU: user: 0.00 s, system: 0.00 s,
elapsed:0.00 s
 
2022-03-07 09:47:03.578 CET @ 527023  LOG:  checkpoint starting: end-of-recovery immediate wait
2022-03-07 09:47:03.585 CET @ 527023  LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0
removed,0 recycled; write=0.002 s, sync=0.001 s, total=0.008 s; sync files=2, longest=0.001 s, average=0.001 s;
distance=0kB, estimate=0 kB
 
2022-03-07 09:47:03.587 CET @ 508510  LOG:  database system is ready to accept connections
#v-

Interestingly, I can select the data if I don't use cursor:

#v+
$ select id, plan FROM plans_part_25 WHERE is_public and NOT is_deleted and NOT is_anonymized \g | wc
1162074 11448274 1337357239722
Time: 213.767 ms
#v-

Details about the table:

#v+
$ \d+ plans_part_25
                                                    Table "public.plans_part_25"
      Column      │           Type           │ Collation │ Nullable │ Default │ Storage  │ Compression │ Stats target │
Description
 

──────────────────┼──────────────────────────┼───────────┼──────────┼─────────┼──────────┼─────────────┼──────────────┼─────────────
 id               │ text                     │           │ not null │         │ extended │             │              │

 plan             │ text                     │           │ not null │         │ extended │             │              │

 entered_on       │ timestamp with time zone │           │ not null │ now()   │ plain    │             │              │

 is_public        │ boolean                  │           │ not null │ true    │ plain    │             │              │

 is_anonymized    │ boolean                  │           │ not null │ false   │ plain    │             │              │

 title            │ text                     │           │          │         │ extended │             │              │

 delete_key       │ text                     │           │          │         │ extended │             │              │

 is_deleted       │ boolean                  │           │ not null │ false   │ plain    │             │              │

 added_by         │ text                     │           │          │         │ extended │             │              │

 optimization_for │ text                     │           │          │         │ extended │             │              │

 query            │ text                     │           │          │         │ extended │             │              │

Partition of: plans FOR VALUES WITH (modulus 50, remainder 25)
Partition constraint: satisfies_hash_partition('16523'::oid, 50, 25, id)
Indexes:
    "part_25_pkey" PRIMARY KEY, btree (id)
    "part_25_added_by_entered_on_id_idx" UNIQUE, btree (added_by, entered_on, id) WHERE is_deleted = false
    "part_25_added_by_idx" btree (added_by)
    "part_25_entered_on_idx" btree (entered_on)
    "part_25_entered_on_idx1" btree (entered_on) WHERE is_public AND NOT is_deleted
    "part_25_length_idx" btree (length(plan)) WHERE NOT is_anonymized AND is_public
    "part_25_optimization_for_idx" btree (optimization_for) WHERE NOT is_deleted
Access method: heap

$ select pg_column_compression(plan), count(*) FROM plans_part_25 WHERE is_public and NOT is_deleted and NOT
is_anonymizedgroup by 1;
 
 pg_column_compression │ count 
───────────────────────┼───────
 [null]                │  3272
 pglz                  │  6556
(2 rows)
#v-

Information about compilation:

#v+
=$ pg_config --configure
 '--prefix=/home/pgdba/work' '--with-pgport=5430' '--with-tcl' '--with-perl' '--with-python'
'--enable-integer-datetimes''--without-pam' '--without-bonjour' '--with-openssl' '--with-uuid=ossp' '--with-readline'
'--with-libxml''--with-systemd' '--with-zlib' '--with-lz4' '--with-llvm' '--with-gnu-ld' '--enable-debug'
'--enable-depend''--enable-cassert' 'PYTHON=/usr/bin/python3'
 

=$ dpkg -l | grep -iE '^ii[[:space:]]*(lib)?(gcc|clang|llvm|lz4|zlib)'
ii  clang                                         1:13.0-53~exp1                              amd64        C, C++ and
Objective-Ccompiler (LLVM based), clang binary
 
ii  clang-13                                      1:13.0.0-2                                  amd64        C, C++ and
Objective-Ccompiler
 
ii  gcc                                           4:11.2.0-1ubuntu1                           amd64        GNU C
compiler
ii  gcc-11                                        11.2.0-7ubuntu2                             amd64        GNU C
compiler
ii  gcc-11-aarch64-linux-gnu                      11.2.0-5ubuntu1cross1                       amd64        GNU C
compiler(cross compiler for arm64 architecture)
 
ii  gcc-11-aarch64-linux-gnu-base:amd64           11.2.0-5ubuntu1cross1                       amd64        GCC, the GNU
CompilerCollection (base package)
 
ii  gcc-11-base:amd64                             11.2.0-7ubuntu2                             amd64        GCC, the GNU
CompilerCollection (base package)
 
ii  gcc-11-cross-base                             11.2.0-5ubuntu1cross1                       all          GCC, the GNU
CompilerCollection (library base package)
 
ii  gcc-aarch64-linux-gnu                         4:11.2.0-1ubuntu1                           amd64        GNU C
compilerfor the arm64 architecture
 
ii  libclang-common-13-dev                        1:13.0.0-2                                  amd64        Clang
library- Common development package
 
ii  libclang-cpp13                                1:13.0.0-2                                  amd64        C++
interfaceto the Clang library
 
ii  libclang1-13                                  1:13.0.0-2                                  amd64        C interface
tothe Clang library
 
ii  libgcc-11-dev:amd64                           11.2.0-7ubuntu2                             amd64        GCC support
library(development files)
 
ii  libgcc-11-dev-arm64-cross                     11.2.0-5ubuntu1cross1                       all          GCC support
library(development files)
 
ii  libgcc-s1:amd64                               11.2.0-7ubuntu2                             amd64        GCC support
library
ii  libgcc-s1-arm64-cross                         11.2.0-5ubuntu1cross1                       all          GCC support
library(arm64)
 
ii  libllvm12:amd64                               1:12.0.1-8build1                            amd64        Modular
compilerand toolchain technologies, runtime library
 
ii  libllvm13:amd64                               1:13.0.0-2                                  amd64        Modular
compilerand toolchain technologies, runtime library
 
ii  liblz4-1:amd64                                1.9.3-2                                     amd64        Fast LZ
compressionalgorithm library - runtime
 
ii  liblz4-dev:amd64                              1.9.3-2                                     amd64        Fast LZ
compressionalgorithm library - development files
 
ii  llvm                                          1:13.0-53~exp1                              amd64        Low-Level
VirtualMachine (LLVM)
 
ii  llvm-13                                       1:13.0.0-2                                  amd64        Modular
compilerand toolchain technologies
 
ii  llvm-13-dev                                   1:13.0.0-2                                  amd64        Modular
compilerand toolchain technologies, libraries and headers
 
ii  llvm-13-linker-tools                          1:13.0.0-2                                  amd64        Modular
compilerand toolchain technologies - Plugins
 
ii  llvm-13-runtime                               1:13.0.0-2                                  amd64        Modular
compilerand toolchain technologies, IR interpreter
 
ii  llvm-13-tools                                 1:13.0.0-2                                  amd64        Modular
compilerand toolchain technologies, tools
 
ii  llvm-runtime                                  1:13.0-53~exp1                              amd64        Low-Level
VirtualMachine (LLVM), bytecode interpreter
 
ii  zlib1g:amd64                                  1:1.2.11.dfsg-2ubuntu7                      amd64        compression
library- runtime
 
ii  zlib1g-dev:amd64                              1:1.2.11.dfsg-2ubuntu7                      amd64        compression
library- development
 
#v-

depesz



Re: Pg 15 devel crashes when fetching data from table using cursor

От
Andres Freund
Дата:
Hi,

On 2022-03-07 10:44:23 +0100, hubert depesz lubaczewski wrote:
> I ran 15devel, compiled today, on Ryzen processor (amd64), with data for
> explain.depesz.com.

> TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 526595)
> postgres: depesz_explain depesz_explain [local] FETCH(ExceptionalCondition+0x99)[0x557da5491dc9]
> postgres: depesz_explain depesz_explain [local] FETCH(+0x174f64)[0x557da5010f64]
> postgres: depesz_explain depesz_explain [local] FETCH(heap_fetch_toast_slice+0x15f)[0x557da50606cf]

That's a new assertion, but I put it in to detect bugs like what's fixed in
7c38ef2a5d6cf6d8dc3834399d7a1c364d64ce64. Looks like something in the snapshot
mangement of cursors is busted...

Any chance you could try to come up with a reproducer?

Greetings,

Andres Freund



Re: Pg 15 devel crashes when fetching data from table using cursor

От
Peter Geoghegan
Дата:
On Thu, Mar 10, 2022 at 4:50 PM Andres Freund <andres@anarazel.de> wrote:
> On 2022-03-07 10:44:23 +0100, hubert depesz lubaczewski wrote:
> > TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 526595)

> That's a new assertion, but I put it in to detect bugs like what's fixed in
> 7c38ef2a5d6cf6d8dc3834399d7a1c364d64ce64. Looks like something in the snapshot
> mangement of cursors is busted...
>
> Any chance you could try to come up with a reproducer?

Although it isn't related to this bug, I thought I'd still ask: where
else we could add "Assert(HaveRegisteredOrActiveSnapshot())" like
this, to get similar protection? Isn't this exactly the kind of
assertion that is contemplated by comments in GlobalVisTestFor() that
you wrote (for the snapshot scalability work)?

Right now GlobalVisTestFor() only uses a weaker assertion:

GlobalVisState *
GlobalVisTestFor(Relation rel)
{
    GlobalVisState *state = NULL;

    /* XXX: we should assert that a snapshot is pushed or registered */
    Assert(RecentXmin);
    ....
}

I tried adding a similar assertion here myself just now, but that
breaks heap_page_prune_opt() (opportunistic pruning) that takes place
in the autovacuum launcher. I doubt that that means very much, though.

-- 
Peter Geoghegan



Re: Pg 15 devel crashes when fetching data from table using cursor

От
Andres Freund
Дата:
Hi,

On 2022-03-10 17:38:10 -0800, Peter Geoghegan wrote:
> On Thu, Mar 10, 2022 at 4:50 PM Andres Freund <andres@anarazel.de> wrote:
> > On 2022-03-07 10:44:23 +0100, hubert depesz lubaczewski wrote:
> > > TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 526595)
> 
> > That's a new assertion, but I put it in to detect bugs like what's fixed in
> > 7c38ef2a5d6cf6d8dc3834399d7a1c364d64ce64. Looks like something in the snapshot
> > mangement of cursors is busted...
> >
> > Any chance you could try to come up with a reproducer?
> 
> Although it isn't related to this bug, I thought I'd still ask: where
> else we could add "Assert(HaveRegisteredOrActiveSnapshot())" like
> this, to get similar protection?

Probably in a bunch of places, yes.


> Isn't this exactly the kind of
> assertion that is contemplated by comments in GlobalVisTestFor() that
> you wrote (for the snapshot scalability work)?

IIRC we can't easily there [yet], because we're incredibly sloppy around
snapshots for catalog accesses in a number of places. I started a thread and
it didn't seem anybody was too concerned. I probably should pick it up again.


Another example of complete brokenness in the wider area is catalog accesses
happening after PROC_IN_VACUUM is set. Afaict it's entirely incorrect to do
any catalog accesses once set, because rows that are still needed for the
catalog access can just be removed.

Greetings,

Andres Freund



Re: Pg 15 devel crashes when fetching data from table using cursor

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> Another example of complete brokenness in the wider area is catalog accesses
> happening after PROC_IN_VACUUM is set. Afaict it's entirely incorrect to do
> any catalog accesses once set, because rows that are still needed for the
> catalog access can just be removed.

That seems like a fairly untenable design rule.  Maybe we should rethink
what PROC_IN_VACUUM means/is used for.

            regards, tom lane



Re: Pg 15 devel crashes when fetching data from table using cursor

От
Andres Freund
Дата:
Hi,

On 2022-03-10 21:05:37 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Another example of complete brokenness in the wider area is catalog accesses
> > happening after PROC_IN_VACUUM is set. Afaict it's entirely incorrect to do
> > any catalog accesses once set, because rows that are still needed for the
> > catalog access can just be removed.
> 
> That seems like a fairly untenable design rule.  Maybe we should rethink
> what PROC_IN_VACUUM means/is used for.

Isn't it something fairly fundamental? The goal of PROC_IN_VACUUM is to
prevent vacuum from holding up the "horizon" so other vacuums can remove rows
that have become dead after a long vacuum started.

I think we basically have to move the setting of PROC_IN_VACUUM to the last
moment, when we've done everything else, and then force a new snapshot to be
computed after.

Greetings,

Andres Freund



Re: Pg 15 devel crashes when fetching data from table using cursor

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> I think we basically have to move the setting of PROC_IN_VACUUM to the last
> moment, when we've done everything else, and then force a new snapshot to be
> computed after.

Maybe we can add Assert(!PROC_IN_VACUUM) to catalog accesses?

But the main reason I'm concerned about this is that I fear people
will unthinkingly put catalog accesses into error paths, where
they might well not get exercised by any normal testing.
The poster child for this is something like

    if (unlikely-condition)
        elog(ERROR, "Houston, we have a problem with table %s",
             get_rel_name(relid));

Assertions are unlikely to help us catch that.

            regards, tom lane



Re: Pg 15 devel crashes when fetching data from table using cursor

От
Andres Freund
Дата:
On 2022-03-10 16:50:45 -0800, Andres Freund wrote:
> Any chance you could try to come up with a reproducer?

Found one myself:

BEGIN;
DECLARE foo CURSOR FOR SELECT ev_action FROM pg_rewrite;
FETCH FROM foo;

interestingly a single row isn't sufficient to trigger it, even if it contains
data to be detoasted...

- Andres



Re: Pg 15 devel crashes when fetching data from table using cursor

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> interestingly a single row isn't sufficient to trigger it, even if it contains
> data to be detoasted...

Hah, that explains why I failed to reproduce it a couple days ago.
IIRC, I set up a test case with a one-row table :-(

But ... how would we get through one row and then fail on the next?

            regards, tom lane



Re: Pg 15 devel crashes when fetching data from table using cursor

От
Andres Freund
Дата:
Hi,

On 2022-03-10 21:40:05 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > interestingly a single row isn't sufficient to trigger it, even if it contains
> > data to be detoasted...

Argh, no, that was me confusing a compressed with an externally toasted row...

BEGIN;
DECLARE foo CURSOR FOR SELECT ev_action FROM pg_rewrite WHERE ev_class = 'information_schema.columns'::regclass;
FETCH FROM foo;

triggers the problem just as well (backtrace below).


> Hah, that explains why I failed to reproduce it a couple days ago.
> IIRC, I set up a test case with a one-row table :-(

Maybe that was something similar?


This suggests that we should assert that we have a suitable snapshot in
pg_detoast_datum*, so that we don't need to have an externally toasted datum
to notice such bugs. The reason I put it in init_toast_snapshot() was that the
bug leading to the assert was from toast_delete_datum(). But even there it'd
be better to put the assert to before the early return?



#0  ExceptionalCondition (conditionName=0x7f051dbfbca8 "HaveRegisteredOrActiveSnapshot()", errorType=0x7f051dbfb8e8
"FailedAssertion",
    fileName=0x7f051dbfb8a0 "/home/andres/src/postgresql/src/backend/access/common/toast_internals.c", lineNumber=670)
    at /home/andres/src/postgresql/src/backend/utils/error/assert.c:36
#1  0x00007f051d49a64d in init_toast_snapshot (toast_snapshot=0x7ffcfa83fe70) at
/home/andres/src/postgresql/src/backend/access/common/toast_internals.c:670
#2  0x00007f051d50e7d7 in heap_fetch_toast_slice (toastrel=0x7f0494d940f8, valueid=12004, attrsize=2113, sliceoffset=0,
slicelength=2113,
    result=0x7f051feb8670) at /home/andres/src/postgresql/src/backend/access/heap/heaptoast.c:688
#3  0x00007f051d48ce9e in table_relation_fetch_toast_slice (toastrel=0x7f0494d940f8, valueid=12004, attrsize=2113,
sliceoffset=0,slicelength=2113,
 
    result=0x7f051feb8670) at /home/andres/src/postgresql/src/include/access/tableam.h:1892
#4  0x00007f051d48ddad in toast_fetch_datum (attr=0x7f051ff57b40) at
/home/andres/src/postgresql/src/backend/access/common/detoast.c:375
#5  0x00007f051d48d26d in detoast_attr (attr=0x7f051ff57b40) at
/home/andres/src/postgresql/src/backend/access/common/detoast.c:123
#6  0x00007f051db83d86 in pg_detoast_datum_packed (datum=0x7f051ff57b40) at
/home/andres/src/postgresql/src/backend/utils/fmgr/fmgr.c:1757
#7  0x00007f051db35489 in text_to_cstring (t=0x7f051ff57b40) at
/home/andres/src/postgresql/src/backend/utils/adt/varlena.c:225
#8  0x00007f051db36487 in textout (fcinfo=0x7ffcfa8401f0) at
/home/andres/src/postgresql/src/backend/utils/adt/varlena.c:574
#9  0x00007f051dac0abc in pg_node_tree_out (fcinfo=0x7ffcfa8401f0) at
/home/andres/src/postgresql/src/backend/utils/adt/pseudotypes.c:354
#10 0x00007f051db828c8 in FunctionCall1Coll (flinfo=0x7f051feb2018, collation=0, arg1=139659987745600)
    at /home/andres/src/postgresql/src/backend/utils/fmgr/fmgr.c:1138
#11 0x00007f051db839d9 in OutputFunctionCall (flinfo=0x7f051feb2018, val=139659987745600) at
/home/andres/src/postgresql/src/backend/utils/fmgr/fmgr.c:1575
#12 0x00007f051d494557 in printtup (slot=0x7f051feacfa8, self=0x7f051fe8b388) at
/home/andres/src/postgresql/src/backend/access/common/printtup.c:357
#13 0x00007f051d9cf51f in RunFromStore (portal=0x7f051fef3ff0, direction=ForwardScanDirection, count=0,
dest=0x7f051fe8b388)
    at /home/andres/src/postgresql/src/backend/tcop/pquery.c:1096
#14 0x00007f051d9cf03b in PortalRunSelect (portal=0x7f051fef3ff0, forward=true, count=0, dest=0x7f051fe8b388)
    at /home/andres/src/postgresql/src/backend/tcop/pquery.c:917
#15 0x00007f051d9ced02 in PortalRun (portal=0x7f051fef3ff0, count=9223372036854775807, isTopLevel=true, run_once=true,
dest=0x7f051fe8b388,
    altdest=0x7f051fe8b388, qc=0x7ffcfa840510) at /home/andres/src/postgresql/src/backend/tcop/pquery.c:765
#16 0x00007f051d9c8245 in exec_simple_query (query_string=0x7f051fe8a530 "FETCH FROM foo ;") at
/home/andres/src/postgresql/src/backend/tcop/postgres.c:1250


Which indeed looks like a problem - there's no snapshot handling around
RunFromStore() (compare with ExecutorRun() invocation just after), but
FillPortalStore passes detoast=false to SetTuplestoreDestReceiverParams().

Greetings,

Andres Freund



Re: Pg 15 devel crashes when fetching data from table using cursor

От
Andres Freund
Дата:
Hi,

On 2022-03-10 19:07:21 -0800, Andres Freund wrote:
> This suggests that we should assert that we have a suitable snapshot in
> pg_detoast_datum*, so that we don't need to have an externally toasted datum
> to notice such bugs. The reason I put it in init_toast_snapshot() was that the
> bug leading to the assert was from toast_delete_datum(). But even there it'd
> be better to put the assert to before the early return?

I added an assert like that, but it'd be a lot of stuff to clean up to make it
work :(. There's some easy to ignore cases like bootstrap (safe),
PostgresInit() (probably not). But after that there's just countless unsafe
accesses. Including the one Peter just mentioned in do_autovacuum(). And of
course the one triggering this report.

In a lot of places we fetch a tuple (using a snapshot of course) and then
later after the snapshot is released, take actions that cause datums to be
detoasted (if they were toasted).  Afaict that's plainly not safe - nothing
prevents the toasted datum to be pruned away and replaced with something else,
potentially of a different datatype.

It kinda works for relations using catalog snapshots, because we do hold the
catalog snapshot for longer - but only until the next invalidations are
processed.


I wonder what a debugging mode forcing everything toastable to be toasted
would bring to light *shudder*.

Greetings,

Andres Freund