Обсуждение: can't drop table due to reference from orphaned temp function

Поиск
Список
Период
Сортировка

can't drop table due to reference from orphaned temp function

От
Miles Delahunty
Дата:
Hi all,

We're facing a very strange situation at my organisation where a scheduled pg_restore task will consistently fail because postgres thinks there are functions that reference some of the tables to be dropped. Thing is, these functions are temporary functions whose creators have (long) since disconnected.

I have tried to boil down the real situation to a minimal example below. The idea is we create a table and a temporary function that references the backing type by way of its return type. Then we disconnect, and when we reconnect (even much later), and try to drop the table, we get:

ERROR: cannot drop table mytable because other objects depend on it

DETAIL: function pg_temp_3.mytempfunc() depends on type mytableHINT: Use DROP ... CASCADE to drop the dependent objects too.

(Obviously in the toy example I could just follow the CASCADE hint, but in practice the table drop is being done by pg_restore, which offers no such option as far as I know.)

You will note that in my example below, I've appended a thousand commented-out lines to the end of the function definition. If I remove these, the problem doesn't appear. So somehow the length of the function is a factor. Similarly, creating a temp table is also seemingly required to elicit the bug.

Tested with Postgres 14.2 on Ubuntu 20.04 and 13.2 on CentOS 7.

Cheers,
Miles

---

#!/bin/bash

dropdb mydb
createdb mydb
(    cat <<'EOF'
create table mytable ();

create function pg_temp.mytempfunc()
returns mytable language plpgsql as $$
begin    return null;
end;
EOF    for i in {1..1000}       do echo '--____________________________________________________________________________________________________________________'    done    cat <<'EOF'
$$;

create temp table mytemptable();
select pg_temp.mytempfunc();
EOF
) | psql mydb;
sleep 100
psql mydb -c "drop table mytable";

Re: can't drop table due to reference from orphaned temp function

От
Andres Freund
Дата:
Hi,

On 2022-02-18 18:27:37 +1100, Miles Delahunty wrote:
> You will note that in my example below, I've appended a thousand
> commented-out lines to the end of the function definition. If I remove
> these, the problem doesn't appear. So somehow the length of the function is
> a factor. Similarly, creating a temp table is also seemingly required to
> elicit the bug.

Huh. There's a very relevant error message in the log:

2022-02-19 09:39:12.628 PST [3731015][client backend][2/11:0] LOG:  statement: select pg_temp.mytempfunc();
2022-02-19 09:39:12.642 PST [3731015][client backend][2/12:730] FATAL:  cannot fetch toast data without an active
snapshot
2022-02-19 09:39:12.642 PST [3731015][client backend][:0] LOG:  disconnection: session time: 0:00:00.165 user=andres
database=mydbhost=[local]
 
(middle line)

See backtrace below [1]. The same problem does *not* exist when starting to
use the same temp schema in a new session (which drops the old contents
first), which kind of explains why we've not previously noticed this.

But even so, I'm surprised we haven't noticed this before.

It's pretty easy to fix, we just need a
PushActiveSnapshot(GetTransactionSnapshot());
...
PopActiveSnapshot();

around the RemoveTempRelations().


A naive test for this wouldn't be reliable, because the client doesn't wait
for the disconnect to finish. Thus verifying that the temp catalog entries are
gone would race against the removal.

But I think we can make it reliable by using a session level advisory lock?
RemoveTempRelationsCallback() is registered after ShutdownPostgres(), so
session level advisory locks aren't released before the temp relations cleanup
is complete.

Greetings,

Andres Freund

[1]
#0  init_toast_snapshot (toast_snapshot=0x7fffce35c9a0) at
/home/andres/src/postgresql/src/backend/access/common/toast_internals.c:661
#1  0x00007fdbfaef6ce5 in toast_delete_datum (rel=0x7fdbf768f768, value=140582582660640, is_speculative=false)
    at /home/andres/src/postgresql/src/backend/access/common/toast_internals.c:429
#2  0x00007fdbfafcd7b3 in toast_delete_external (rel=0x7fdbf768f768, values=0x7fffce35d170, isnull=0x7fffce35cb30,
is_speculative=false)
    at /home/andres/src/postgresql/src/backend/access/table/toast_helper.c:334
#3  0x00007fdbfaf6c312 in heap_toast_delete (rel=0x7fdbf768f768, oldtup=0x7fffce3603e0, is_speculative=false)
    at /home/andres/src/postgresql/src/backend/access/heap/heaptoast.c:73
#4  0x00007fdbfaf52f3d in heap_delete (relation=0x7fdbf768f768, tid=0x7fdbee3942c4, cid=3, crosscheck=0x0, wait=true,
tmfd=0x7fffce360480,changingPart=false)
 
    at /home/andres/src/postgresql/src/backend/access/heap/heapam.c:3074
#5  0x00007fdbfaf53002 in simple_heap_delete (relation=0x7fdbf768f768, tid=0x7fdbee3942c4)
    at /home/andres/src/postgresql/src/backend/access/heap/heapam.c:3114
#6  0x00007fdbfb034236 in CatalogTupleDelete (heapRel=0x7fdbf768f768, tid=0x7fdbee3942c4) at
/home/andres/src/postgresql/src/backend/catalog/indexing.c:352
#7  0x00007fdbfb123a0b in RemoveFunctionById (funcOid=16401) at
/home/andres/src/postgresql/src/backend/commands/functioncmds.c:1322
#8  0x00007fdbfb022248 in doDeletion (object=0x7fdbfbe3223c, flags=29) at
/home/andres/src/postgresql/src/backend/catalog/dependency.c:1423
#9  0x00007fdbfb021f5e in deleteOneObject (object=0x7fdbfbe3223c, depRel=0x7fffce3606d0, flags=29)
    at /home/andres/src/postgresql/src/backend/catalog/dependency.c:1311
#10 0x00007fdbfb0207e2 in deleteObjectsInList (targetObjects=0x7fdbfbe321e0, depRel=0x7fffce3606d0, flags=29)
    at /home/andres/src/postgresql/src/backend/catalog/dependency.c:271
#11 0x00007fdbfb020897 in performDeletion (object=0x7fffce360704, behavior=DROP_CASCADE, flags=29)
    at /home/andres/src/postgresql/src/backend/catalog/dependency.c:356
#12 0x00007fdbfb03a7b8 in RemoveTempRelations (tempNamespaceId=16399) at
/home/andres/src/postgresql/src/backend/catalog/namespace.c:4277
#13 0x00007fdbfb03a7eb in RemoveTempRelationsCallback (code=0, arg=0) at
/home/andres/src/postgresql/src/backend/catalog/namespace.c:4296
#14 0x00007fdbfb3ee16c in shmem_exit (code=0) at /home/andres/src/postgresql/src/backend/storage/ipc/ipc.c:239



Re: can't drop table due to reference from orphaned temp function

От
Andres Freund
Дата:
Hi,

On 2022-02-19 10:00:02 -0800, Andres Freund wrote:
> See backtrace below [1]. The same problem does *not* exist when starting to
> use the same temp schema in a new session (which drops the old contents
> first), which kind of explains why we've not previously noticed this.
>
> But even so, I'm surprised we haven't noticed this before.

Ah, there's a reason for that. In many cases we'll have a catalog snapshot
registered, which is enough for init_toast_snapshot(). But in Miles' example,
the object dropped just prior ends with catalog invalidations.

Proposed bugfix and test attached.

I think it's ok to backpatch the test. There might be a slight change in
output due to 618c16707a6d6e8f5c83ede2092975e4670201ad not being backpatched,
but that's OK I think.


I think it is dangerous that we return a cached catalog snapshot for things
like GetOldestSnapshot() unless they're also registered or active - we can't
rely on catalog snapshots to be present. Indeed, if we didn't, this bug would
have been found before, as some added assertions confirm.

I don't think we can just ignore the catalog snapshot though, it can be
registered in the future, so it actually is the oldest snapshot. But at least
we should assert that there's some snapshot registered/active. In the attached
patch I've added HaveRegisteredOrActiveSnapshot() and used that in
init_toast_snapshot().

Any better ideas?

Greetings,

Andres Freund

Вложения

Re: can't drop table due to reference from orphaned temp function

От
Andres Freund
Дата:
Hi,

On 2022-02-19 13:31:39 -0800, Andres Freund wrote:
> Proposed bugfix and test attached.

Pushed test + fix to all branches, assertion to HEAD.

Miles, thanks for the report!

Greetings,

Andres Freund



Re: can't drop table due to reference from orphaned temp function

От
Yura Sokolov
Дата:
Good day, Andres.

В Сб, 19/02/2022 в 13:31 -0800, Andres Freund пишет:
> Hi,
> 
> On 2022-02-19 10:00:02 -0800, Andres Freund wrote:
> > See backtrace below [1]. The same problem does *not* exist when starting to
> > use the same temp schema in a new session (which drops the old contents
> > first), which kind of explains why we've not previously noticed this.
> > 
> > But even so, I'm surprised we haven't noticed this before.
> 
> Ah, there's a reason for that. In many cases we'll have a catalog snapshot
> registered, which is enough for init_toast_snapshot(). But in Miles' example,
> the object dropped just prior ends with catalog invalidations.
> 
> Proposed bugfix and test attached.
> 
> I think it's ok to backpatch the test. There might be a slight change in
> output due to 618c16707a6d6e8f5c83ede2092975e4670201ad not being backpatched,
> but that's OK I think.
> 
> 
> I think it is dangerous that we return a cached catalog snapshot for things
> like GetOldestSnapshot() unless they're also registered or active - we can't
> rely on catalog snapshots to be present. Indeed, if we didn't, this bug would
> have been found before, as some added assertions confirm.
> 
> I don't think we can just ignore the catalog snapshot though, it can be
> registered in the future, so it actually is the oldest snapshot. But at least
> we should assert that there's some snapshot registered/active. In the attached
> patch I've added HaveRegisteredOrActiveSnapshot() and used that in
> init_toast_snapshot().

Reading your message and HaveRegisteredOrActiveSnapshot's body, I can't get its logic:
- if it is dangerous to have CatalogSnapshot alone in RegisteredSnapshots,
  then why we return 'false' if RegisteredSnapshots is NOT singular?

I believe, body of HaveRegisteredOrActiveSnapshot should look like:

    if (ActiveSnapshot != NULL)
        return true;

    if (pairingheap_is_empty(&RegisteredSnapshots))
        return false;

    /*
     * The catalog snapshot is in RegisteredSnapshots when valid, but can be
     * removed at any time due to invalidation processing. If explicitly
     * registered more than one snapshot has to be in RegisteredSnapshots.
     */    
    if (pairingheap_is_singular(&RegisteredSnapshots))
        return CatalogSnapshot == NULL;
    
    return true;

Am I wrong?

regards,

Yura




Re: can't drop table due to reference from orphaned temp function

От
Andres Freund
Дата:
Hi,

On 2022-07-12 17:13:28 +0300, Yura Sokolov wrote:
> В Сб, 19/02/2022 в 13:31 -0800, Andres Freund пишет:
> > Hi,
> > 
> > On 2022-02-19 10:00:02 -0800, Andres Freund wrote:
> > > See backtrace below [1]. The same problem does *not* exist when starting to
> > > use the same temp schema in a new session (which drops the old contents
> > > first), which kind of explains why we've not previously noticed this.
> > > 
> > > But even so, I'm surprised we haven't noticed this before.
> > 
> > Ah, there's a reason for that. In many cases we'll have a catalog snapshot
> > registered, which is enough for init_toast_snapshot(). But in Miles' example,
> > the object dropped just prior ends with catalog invalidations.
> > 
> > Proposed bugfix and test attached.
> > 
> > I think it's ok to backpatch the test. There might be a slight change in
> > output due to 618c16707a6d6e8f5c83ede2092975e4670201ad not being backpatched,
> > but that's OK I think.
> > 
> > 
> > I think it is dangerous that we return a cached catalog snapshot for things
> > like GetOldestSnapshot() unless they're also registered or active - we can't
> > rely on catalog snapshots to be present. Indeed, if we didn't, this bug would
> > have been found before, as some added assertions confirm.
> > 
> > I don't think we can just ignore the catalog snapshot though, it can be
> > registered in the future, so it actually is the oldest snapshot. But at least
> > we should assert that there's some snapshot registered/active. In the attached
> > patch I've added HaveRegisteredOrActiveSnapshot() and used that in
> > init_toast_snapshot().
> 
> Reading your message and HaveRegisteredOrActiveSnapshot's body, I can't get its logic:
> - if it is dangerous to have CatalogSnapshot alone in RegisteredSnapshots,
>   then why we return 'false' if RegisteredSnapshots is NOT singular?

IIRC that was a bug that since was fixed. Did you check the current
definition?

Greetings,

Andres Freund



Re: can't drop table due to reference from orphaned temp function

От
Yura Sokolov
Дата:
В Вт, 12/07/2022 в 12:24 -0700, Andres Freund пишет:
> Hi,
> 
> On 2022-07-12 17:13:28 +0300, Yura Sokolov wrote:
> > В Сб, 19/02/2022 в 13:31 -0800, Andres Freund пишет:
> > > Hi,
> > > 
> > > On 2022-02-19 10:00:02 -0800, Andres Freund wrote:
> > > > See backtrace below [1]. The same problem does *not* exist when starting to
> > > > use the same temp schema in a new session (which drops the old contents
> > > > first), which kind of explains why we've not previously noticed this.
> > > > 
> > > > But even so, I'm surprised we haven't noticed this before.
> > > 
> > > Ah, there's a reason for that. In many cases we'll have a catalog snapshot
> > > registered, which is enough for init_toast_snapshot(). But in Miles' example,
> > > the object dropped just prior ends with catalog invalidations.
> > > 
> > > Proposed bugfix and test attached.
> > > 
> > > I think it's ok to backpatch the test. There might be a slight change in
> > > output due to 618c16707a6d6e8f5c83ede2092975e4670201ad not being backpatched,
> > > but that's OK I think.
> > > 
> > > 
> > > I think it is dangerous that we return a cached catalog snapshot for things
> > > like GetOldestSnapshot() unless they're also registered or active - we can't
> > > rely on catalog snapshots to be present. Indeed, if we didn't, this bug would
> > > have been found before, as some added assertions confirm.
> > > 
> > > I don't think we can just ignore the catalog snapshot though, it can be
> > > registered in the future, so it actually is the oldest snapshot. But at least
> > > we should assert that there's some snapshot registered/active. In the attached
> > > patch I've added HaveRegisteredOrActiveSnapshot() and used that in
> > > init_toast_snapshot().
> > 
> > Reading your message and HaveRegisteredOrActiveSnapshot's body, I can't get its logic:
> > - if it is dangerous to have CatalogSnapshot alone in RegisteredSnapshots,
> >   then why we return 'false' if RegisteredSnapshots is NOT singular?
> 
> IIRC that was a bug that since was fixed. Did you check the current
> definition?

Ah, sorry. I really looked at old definition. New one is correct.