Обсуждение: Avoid orphaned objects dependencies, take 3

Поиск
Список
Период
Сортировка

Avoid orphaned objects dependencies, take 3

От
Bertrand Drouvot
Дата:
Hi,

This new thread is a follow-up of [1] and [2].

Problem description:

We have occasionally observed objects having an orphaned dependency, the 
most common case we have seen is functions not linked to any namespaces.

Examples to produce such orphaned dependencies:

Scenario 1:

session 1: begin; drop schema schem;
session 2: create a function in the schema schem
session 1: commit;

With the above, the function created in session 2 would be linked to a non
existing schema.

Scenario 2:

session 1: begin; create a function in the schema schem
session 2: drop schema schem;
session 1: commit;

With the above, the function created in session 1 would be linked to a non
existing schema.

A patch has been initially proposed to fix this particular 
(function-to-namespace) dependency (see [1]), but there could be much 
more scenarios (like the function-to-datatype one highlighted by Gilles 
in [1] that could lead to a function having an invalid parameter datatype).

As Tom said there are dozens more cases that would need to be 
considered, and a global approach to avoid those race conditions should 
be considered instead.

A first global approach attempt has been proposed in [2] making use of a dirty
snapshot when recording the dependency. But this approach appeared to be "scary"
and it was still failing to close some race conditions (see [2] for details).

Then, Tom proposed another approach in [2] which is that "creation DDL will have
to take a lock on each referenced object that'd conflict with a lock taken by
DROP".

This is what the attached patch is trying to achieve.

It does the following:

1) A new lock (that conflicts with a lock taken by DROP) has been put in place
when the dependencies are being recorded.

Thanks to it, the drop schema in scenario 2 would be locked (resulting in an
error should session 1 committs).

2) After locking the object while recording the dependency, the patch checks
that the object still exists.

Thanks to it, session 2 in scenario 1 would be locked and would report an error
once session 1 committs (that would not be the case should session 1 abort the
transaction).

The patch also adds a few tests for some dependency cases (that would currently
produce orphaned objects):

- schema and function (as the above scenarios)
- function and type
- table and type (which is I think problematic enough, as involving a table into
the game, to fix this stuff as a whole).

[1]:
https://www.postgresql.org/message-id/flat/a4f55089-7cbd-fe5d-a9bb-19adc6418ae9@darold.net#9af5cdaa9e80879beb1def3604c976e8
[2]: https://www.postgresql.org/message-id/flat/8369ff70-0e31-f194-2954-787f4d9e21dd%40amazon.com

Please note that I'm not used to with this area of the code so that the patch
might not take the option proposed by Tom the "right" way.

Adding the patch to the July CF.

Looking forward to your feedback,

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

Re: Avoid orphaned objects dependencies, take 3

От
Alexander Lakhin
Дата:
Hi Bertrand,

22.04.2024 11:45, Bertrand Drouvot wrote:
> Hi,
>
> This new thread is a follow-up of [1] and [2].
>
> Problem description:
>
> We have occasionally observed objects having an orphaned dependency, the
> most common case we have seen is functions not linked to any namespaces.
>
> ...
>
> Looking forward to your feedback,

This have reminded me of bug #17182 [1].
Unfortunately, with the patch applied, the following script:

for ((i=1;i<=100;i++)); do
   ( { for ((n=1;n<=20;n++)); do echo "DROP SCHEMA s;"; done } | psql ) >psql1.log 2>&1 &
   echo "
CREATE SCHEMA s;
CREATE FUNCTION s.func1() RETURNS int LANGUAGE SQL AS 'SELECT 1;';
CREATE FUNCTION s.func2() RETURNS int LANGUAGE SQL AS 'SELECT 2;';
CREATE FUNCTION s.func3() RETURNS int LANGUAGE SQL AS 'SELECT 3;';
CREATE FUNCTION s.func4() RETURNS int LANGUAGE SQL AS 'SELECT 4;';
CREATE FUNCTION s.func5() RETURNS int LANGUAGE SQL AS 'SELECT 5;';
   "  | psql >psql2.log 2>&1 &
   wait
   psql -c "DROP SCHEMA s CASCADE" >psql3.log
done
echo "
SELECT pg_identify_object('pg_proc'::regclass, pp.oid, 0), pp.oid FROM pg_proc pp
   LEFT JOIN pg_namespace pn ON pp.pronamespace = pn.oid WHERE pn.oid IS NULL" | psql

still ends with:
server closed the connection unexpectedly
         This probably means the server terminated abnormally
         before or while processing the request.

2024-04-22 09:54:39.171 UTC|||662633dc.152bbc|LOG:  server process (PID 1388378) was terminated by signal 11: 
Segmentation fault
2024-04-22 09:54:39.171 UTC|||662633dc.152bbc|DETAIL:  Failed process was running: SELECT 
pg_identify_object('pg_proc'::regclass, pp.oid, 0), pp.oid FROM pg_proc pp
       LEFT JOIN pg_namespace pn ON pp.pronamespace = pn.oid WHERE pn.oid IS NULL

[1] https://www.postgresql.org/message-id/flat/17182-a6baa001dd1784be%40postgresql.org

Best regards,
Alexander



Re: Avoid orphaned objects dependencies, take 3

От
Bertrand Drouvot
Дата:
Hi,

On Mon, Apr 22, 2024 at 01:00:00PM +0300, Alexander Lakhin wrote:
> Hi Bertrand,
> 
> 22.04.2024 11:45, Bertrand Drouvot wrote:
> > Hi,
> > 
> > This new thread is a follow-up of [1] and [2].
> > 
> > Problem description:
> > 
> > We have occasionally observed objects having an orphaned dependency, the
> > most common case we have seen is functions not linked to any namespaces.
> > 
> > ...
> > 
> > Looking forward to your feedback,
> 
> This have reminded me of bug #17182 [1].

Thanks for the link to the bug!

> Unfortunately, with the patch applied, the following script:
> 
> for ((i=1;i<=100;i++)); do
>   ( { for ((n=1;n<=20;n++)); do echo "DROP SCHEMA s;"; done } | psql ) >psql1.log 2>&1 &
>   echo "
> CREATE SCHEMA s;
> CREATE FUNCTION s.func1() RETURNS int LANGUAGE SQL AS 'SELECT 1;';
> CREATE FUNCTION s.func2() RETURNS int LANGUAGE SQL AS 'SELECT 2;';
> CREATE FUNCTION s.func3() RETURNS int LANGUAGE SQL AS 'SELECT 3;';
> CREATE FUNCTION s.func4() RETURNS int LANGUAGE SQL AS 'SELECT 4;';
> CREATE FUNCTION s.func5() RETURNS int LANGUAGE SQL AS 'SELECT 5;';
>   "  | psql >psql2.log 2>&1 &
>   wait
>   psql -c "DROP SCHEMA s CASCADE" >psql3.log
> done
> echo "
> SELECT pg_identify_object('pg_proc'::regclass, pp.oid, 0), pp.oid FROM pg_proc pp
>   LEFT JOIN pg_namespace pn ON pp.pronamespace = pn.oid WHERE pn.oid IS NULL" | psql
> 
> still ends with:
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> 
> 2024-04-22 09:54:39.171 UTC|||662633dc.152bbc|LOG:  server process (PID
> 1388378) was terminated by signal 11: Segmentation fault
> 2024-04-22 09:54:39.171 UTC|||662633dc.152bbc|DETAIL:  Failed process was
> running: SELECT pg_identify_object('pg_proc'::regclass, pp.oid, 0), pp.oid
> FROM pg_proc pp
>       LEFT JOIN pg_namespace pn ON pp.pronamespace = pn.oid WHERE pn.oid IS NULL
> 

Thanks for sharing the script.

That's weird, I just launched it several times with the patch applied and I'm not
able to see the seg fault (while I can see it constently failing on the master
branch).

Are you 100% sure you tested it against a binary with the patch applied?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Avoid orphaned objects dependencies, take 3

От
Alexander Lakhin
Дата:
22.04.2024 13:52, Bertrand Drouvot wrote:
>
> That's weird, I just launched it several times with the patch applied and I'm not
> able to see the seg fault (while I can see it constently failing on the master
> branch).
>
> Are you 100% sure you tested it against a binary with the patch applied?
>

Yes, at least I can't see what I'm doing wrong. Please try my
self-contained script attached.

Best regards,
Alexander
Вложения

Re: Avoid orphaned objects dependencies, take 3

От
Bertrand Drouvot
Дата:
Hi,

On Mon, Apr 22, 2024 at 03:00:00PM +0300, Alexander Lakhin wrote:
> 22.04.2024 13:52, Bertrand Drouvot wrote:
> > 
> > That's weird, I just launched it several times with the patch applied and I'm not
> > able to see the seg fault (while I can see it constently failing on the master
> > branch).
> > 
> > Are you 100% sure you tested it against a binary with the patch applied?
> > 
> 
> Yes, at least I can't see what I'm doing wrong. Please try my
> self-contained script attached.

Thanks for sharing your script!

Yeah your script ensures the patch is applied before the repro is executed.

I do confirm that I can also see the issue with the patch applied (but I had to
launch multiple attempts, while on master one attempt is enough).

I'll have a look.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Avoid orphaned objects dependencies, take 3

От
Bertrand Drouvot
Дата:
Hi,

On Tue, Apr 23, 2024 at 04:59:09AM +0000, Bertrand Drouvot wrote:
> Hi,
> 
> On Mon, Apr 22, 2024 at 03:00:00PM +0300, Alexander Lakhin wrote:
> > 22.04.2024 13:52, Bertrand Drouvot wrote:
> > > 
> > > That's weird, I just launched it several times with the patch applied and I'm not
> > > able to see the seg fault (while I can see it constently failing on the master
> > > branch).
> > > 
> > > Are you 100% sure you tested it against a binary with the patch applied?
> > > 
> > 
> > Yes, at least I can't see what I'm doing wrong. Please try my
> > self-contained script attached.
> 
> Thanks for sharing your script!
> 
> Yeah your script ensures the patch is applied before the repro is executed.
> 
> I do confirm that I can also see the issue with the patch applied (but I had to
> launch multiple attempts, while on master one attempt is enough).
> 
> I'll have a look.

Please find attached v2 that should not produce the issue anymore (I launched a
lot of attempts without any issues). v1 was not strong enough as it was not
always checking for the dependent object existence. v2 now always checks if the
object still exists after the additional lock acquisition attempt while recording
the dependency.

I still need to think about v2 but in the meantime could you please also give
v2 a try on you side?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

Re: Avoid orphaned objects dependencies, take 3

От
Bertrand Drouvot
Дата:
Hi,

On Tue, Apr 23, 2024 at 04:20:46PM +0000, Bertrand Drouvot wrote:
> Please find attached v2 that should not produce the issue anymore (I launched a
> lot of attempts without any issues). v1 was not strong enough as it was not
> always checking for the dependent object existence. v2 now always checks if the
> object still exists after the additional lock acquisition attempt while recording
> the dependency.
> 
> I still need to think about v2 but in the meantime could you please also give
> v2 a try on you side?

I gave more thought to v2 and the approach seems reasonable to me. Basically what
it does is that in case the object is already dropped before we take the new lock
(while recording the dependency) then the error message is a "generic" one (means
it does not provide the description of the "already" dropped object). I think it
makes sense to write the patch that way by abandoning the patch's ambition to
tell the description of the dropped object in all the cases.

Of course, I would be happy to hear others thought about it.

Please find v3 attached (which is v2 with more comments).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

Re: Avoid orphaned objects dependencies, take 3

От
Alexander Lakhin
Дата:
Hi Bertrand,

24.04.2024 11:38, Bertrand Drouvot wrote:
>> Please find attached v2 that should not produce the issue anymore (I launched a
>> lot of attempts without any issues). v1 was not strong enough as it was not
>> always checking for the dependent object existence. v2 now always checks if the
>> object still exists after the additional lock acquisition attempt while recording
>> the dependency.
>>
>> I still need to think about v2 but in the meantime could you please also give
>> v2 a try on you side?
> I gave more thought to v2 and the approach seems reasonable to me. Basically what
> it does is that in case the object is already dropped before we take the new lock
> (while recording the dependency) then the error message is a "generic" one (means
> it does not provide the description of the "already" dropped object). I think it
> makes sense to write the patch that way by abandoning the patch's ambition to
> tell the description of the dropped object in all the cases.
>
> Of course, I would be happy to hear others thought about it.
>
> Please find v3 attached (which is v2 with more comments).

Thank you for the improved version!

I can confirm that it fixes that case.
I've also tested other cases that fail on master (most of them also fail
with v1), please try/look at the attached script. (There could be other
broken-dependency cases, of course, but I think I've covered all the
representative ones.)

All tested cases work correctly with v3 applied — I couldn't get broken
dependencies, though concurrent create/drop operations can still produce
the "cache lookup failed" error, which is probably okay, except that it is
an INTERNAL_ERROR, which assumed to be not easily triggered by users.

Best regards,
Alexander
Вложения

Re: Avoid orphaned objects dependencies, take 3

От
Bertrand Drouvot
Дата:
Hi,

On Wed, Apr 24, 2024 at 03:00:00PM +0300, Alexander Lakhin wrote:
> 24.04.2024 11:38, Bertrand Drouvot wrote:
> > I gave more thought to v2 and the approach seems reasonable to me. Basically what
> > it does is that in case the object is already dropped before we take the new lock
> > (while recording the dependency) then the error message is a "generic" one (means
> > it does not provide the description of the "already" dropped object). I think it
> > makes sense to write the patch that way by abandoning the patch's ambition to
> > tell the description of the dropped object in all the cases.
> > 
> > Of course, I would be happy to hear others thought about it.
> > 
> > Please find v3 attached (which is v2 with more comments).
> 
> Thank you for the improved version!
> 
> I can confirm that it fixes that case.

Great, thanks for the test!

> I've also tested other cases that fail on master (most of them also fail
> with v1), please try/look at the attached script.

Thanks for all those tests!

> (There could be other broken-dependency cases, of course, but I think I've
> covered all the representative ones.)

Agree. Furthermore the way the patch is written should be agnostic to the
object's kind that are part of the dependency. Having said that, that does not
hurt to add more tests in this patch, so v4 attached adds some of your tests (
that would fail on the master branch without the patch applied).

The way the tests are written in the patch are less "racy" that when triggered
with your script. Indeed, I think that in the patch the dependent object can not
be removed before the new lock is taken when recording the dependency. We may
want to add injection points in the game if we feel the need.

> All tested cases work correctly with v3 applied —

Yeah, same on my side, I did run them too and did not find any issues.

> I couldn't get broken
> dependencies,

Same here.

> though concurrent create/drop operations can still produce
> the "cache lookup failed" error, which is probably okay, except that it is
> an INTERNAL_ERROR, which assumed to be not easily triggered by users.

I did not see any of those "cache lookup failed" during my testing with/without
your script. During which test(s) did you see those with v3 applied?

Attached v4, simply adding more tests to v3.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

Re: Avoid orphaned objects dependencies, take 3

От
Alexander Lakhin
Дата:
Hi,

25.04.2024 08:00, Bertrand Drouvot wrote:

though concurrent create/drop operations can still produce
the "cache lookup failed" error, which is probably okay, except that it is
an INTERNAL_ERROR, which assumed to be not easily triggered by users.
I did not see any of those "cache lookup failed" during my testing with/without
your script. During which test(s) did you see those with v3 applied?

You can try, for example, table-trigger, or other tests that check for
"cache lookup failed" psql output only (maybe you'll need to increase the
iteration count). For instance, I've got (with v4 applied):
2024-04-25 05:48:08.102 UTC [3638763] ERROR:  cache lookup failed for function 20893
2024-04-25 05:48:08.102 UTC [3638763] STATEMENT:  CREATE TRIGGER modified_c1 BEFORE UPDATE OF c ON t
        FOR EACH ROW WHEN (OLD.c <> NEW.c) EXECUTE PROCEDURE trigger_func('modified_c');

Or with function-function:
2024-04-25 05:52:31.390 UTC [3711897] ERROR:  cache lookup failed for function 32190 at character 54
2024-04-25 05:52:31.390 UTC [3711897] STATEMENT:  CREATE FUNCTION f1() RETURNS int LANGUAGE SQL RETURN f() + 1;
--
2024-04-25 05:52:37.639 UTC [3720011] ERROR:  cache lookup failed for function 34465 at character 54
2024-04-25 05:52:37.639 UTC [3720011] STATEMENT:  CREATE FUNCTION f1() RETURNS int LANGUAGE SQL RETURN f() + 1;

Attached v4, simply adding more tests to v3.

Thank you for working on this!

Best regards,
Alexander

Re: Avoid orphaned objects dependencies, take 3

От
Bertrand Drouvot
Дата:
Hi,

On Thu, Apr 25, 2024 at 09:00:00AM +0300, Alexander Lakhin wrote:
> Hi,
> 
> 25.04.2024 08:00, Bertrand Drouvot wrote:
> > 
> > > though concurrent create/drop operations can still produce
> > > the "cache lookup failed" error, which is probably okay, except that it is
> > > an INTERNAL_ERROR, which assumed to be not easily triggered by users.
> > I did not see any of those "cache lookup failed" during my testing with/without
> > your script. During which test(s) did you see those with v3 applied?
> 
> You can try, for example, table-trigger, or other tests that check for
> "cache lookup failed" psql output only (maybe you'll need to increase the
> iteration count). For instance, I've got (with v4 applied):
> 2024-04-25 05:48:08.102 UTC [3638763] ERROR:  cache lookup failed for function 20893
> 2024-04-25 05:48:08.102 UTC [3638763] STATEMENT:  CREATE TRIGGER modified_c1 BEFORE UPDATE OF c ON t
>         FOR EACH ROW WHEN (OLD.c <> NEW.c) EXECUTE PROCEDURE trigger_func('modified_c');
> 
> Or with function-function:
> 2024-04-25 05:52:31.390 UTC [3711897] ERROR:  cache lookup failed for function 32190 at character 54
> 2024-04-25 05:52:31.390 UTC [3711897] STATEMENT:  CREATE FUNCTION f1() RETURNS int LANGUAGE SQL RETURN f() + 1;
> --
> 2024-04-25 05:52:37.639 UTC [3720011] ERROR:  cache lookup failed for function 34465 at character 54
> 2024-04-25 05:52:37.639 UTC [3720011] STATEMENT:  CREATE FUNCTION f1() RETURNS int LANGUAGE SQL RETURN f() + 1;

I see, so this is during object creation.

It's easy to reproduce this kind of issue with gdb. For example set a breakpoint
on SearchSysCache1() and during the create function f1() once it breaks on:

#0  SearchSysCache1 (cacheId=45, key1=16400) at syscache.c:221
#1  0x00005ad305beacd6 in func_get_detail (funcname=0x5ad308204d50, fargs=0x0, fargnames=0x0, nargs=0,
argtypes=0x7ffff2ff9cc0,expand_variadic=true, expand_defaults=true, include_out_arguments=false, funcid=0x7ffff2ff9ba0,
rettype=0x7ffff2ff9b9c,retset=0x7ffff2ff9b94, nvargs=0x7ffff2ff9ba4,
 
    vatype=0x7ffff2ff9ba8, true_typeids=0x7ffff2ff9bd8, argdefaults=0x7ffff2ff9be0) at parse_func.c:1622
#2  0x00005ad305be7dd0 in ParseFuncOrColumn (pstate=0x5ad30823be98, funcname=0x5ad308204d50, fargs=0x0, last_srf=0x0,
fn=0x5ad308204da0,proc_call=false, location=55) at parse_func.c:266
 
#3  0x00005ad305bdffb0 in transformFuncCall (pstate=0x5ad30823be98, fn=0x5ad308204da0) at parse_expr.c:1474
#4  0x00005ad305bdd2ee in transformExprRecurse (pstate=0x5ad30823be98, expr=0x5ad308204da0) at parse_expr.c:230
#5  0x00005ad305bdec34 in transformAExprOp (pstate=0x5ad30823be98, a=0x5ad308204e20) at parse_expr.c:990
#6  0x00005ad305bdd1a0 in transformExprRecurse (pstate=0x5ad30823be98, expr=0x5ad308204e20) at parse_expr.c:187
#7  0x00005ad305bdd00b in transformExpr (pstate=0x5ad30823be98, expr=0x5ad308204e20, exprKind=EXPR_KIND_SELECT_TARGET)
atparse_expr.c:131
 
#8  0x00005ad305b96b7e in transformReturnStmt (pstate=0x5ad30823be98, stmt=0x5ad308204ee0) at analyze.c:2395
#9  0x00005ad305b92213 in transformStmt (pstate=0x5ad30823be98, parseTree=0x5ad308204ee0) at analyze.c:375
#10 0x00005ad305c6321a in interpret_AS_clause (languageOid=14, languageName=0x5ad308204c40 "sql",
funcname=0x5ad308204ad8"f100", as=0x0, sql_body_in=0x5ad308204ee0, parameterTypes=0x0, inParameterNames=0x0,
prosrc_str_p=0x7ffff2ffa208,probin_str_p=0x7ffff2ffa200, sql_body_out=0x7ffff2ffa210,
 
    queryString=0x5ad3082040b0 "CREATE FUNCTION f100() RETURNS int LANGUAGE SQL RETURN f() + 1;") at
functioncmds.c:953
#11 0x00005ad305c63c93 in CreateFunction (pstate=0x5ad308186310, stmt=0x5ad308204f00) at functioncmds.c:1221

then drop function f() in another session. Then the create function f1() would:

postgres=# CREATE FUNCTION f() RETURNS int LANGUAGE SQL RETURN f() + 1;
ERROR:  cache lookup failed for function 16400

This stuff does appear before we get a chance to call the new depLockAndCheckObject()
function.

I think this is what Tom was referring to in [1]:

"
So the only real fix for this would be to make every object lookup in the entire
system do the sort of dance that's done in RangeVarGetRelidExtended.
"

The fact that those kind of errors appear also somehow ensure that no orphaned
dependencies can be created.

The patch does ensure that no orphaned depencies can occur after those "initial"
look up are done (should the dependent object be dropped).

I'm tempted to not add extra complexity to avoid those kind of errors and keep the
patch as it is. All of those servicing the same goal: no orphaned depencies are
created.

[1]: https://www.postgresql.org/message-id/2872252.1630851337%40sss.pgh.pa.us

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Avoid orphaned objects dependencies, take 3

От
Alexander Lakhin
Дата:
Hi Bertrand,

25.04.2024 10:20, Bertrand Drouvot wrote:
> postgres=# CREATE FUNCTION f() RETURNS int LANGUAGE SQL RETURN f() + 1;
> ERROR:  cache lookup failed for function 16400
>
> This stuff does appear before we get a chance to call the new depLockAndCheckObject()
> function.
>
> I think this is what Tom was referring to in [1]:
>
> "
> So the only real fix for this would be to make every object lookup in the entire
> system do the sort of dance that's done in RangeVarGetRelidExtended.
> "
>
> The fact that those kind of errors appear also somehow ensure that no orphaned
> dependencies can be created.

I agree; the only thing that I'd change here, is the error code.

But I've discovered yet another possibility to get a broken dependency.
Please try this script:
res=0
numclients=20
for ((i=1;i<=100;i++)); do
for ((c=1;c<=numclients;c++)); do
   echo "
CREATE SCHEMA s_$c;
CREATE CONVERSION myconv_$c FOR 'LATIN1' TO 'UTF8' FROM iso8859_1_to_utf8;
ALTER CONVERSION myconv_$c SET SCHEMA s_$c;
   " | psql >psql1-$c.log 2>&1 &
   echo "DROP SCHEMA s_$c RESTRICT;" | psql >psql2-$c.log 2>&1 &
done
wait
pg_dump -f db.dump || { echo "on iteration $i"; res=1; break; }
for ((c=1;c<=numclients;c++)); do
   echo "DROP SCHEMA s_$c CASCADE;" | psql >psql3-$c.log 2>&1
done
done
psql -c "SELECT * FROM pg_conversion WHERE connamespace NOT IN (SELECT oid FROM pg_namespace);"

It fails for me (with the v4 patch applied) as follows:
pg_dump: error: schema with OID 16392 does not exist
on iteration 1
   oid  | conname  | connamespace | conowner | conforencoding | contoencoding |      conproc      | condefault
-------+----------+--------------+----------+----------------+---------------+-------------------+------------
  16396 | myconv_6 |        16392 |       10 |              8 |             6 | iso8859_1_to_utf8 | f

Best regards,
Alexander



Re: Avoid orphaned objects dependencies, take 3

От
Bertrand Drouvot
Дата:
Hi,

On Tue, Apr 30, 2024 at 08:00:00PM +0300, Alexander Lakhin wrote:
> Hi Bertrand,
> 
> But I've discovered yet another possibility to get a broken dependency.

Thanks for the testing and the report!

> Please try this script:
> res=0
> numclients=20
> for ((i=1;i<=100;i++)); do
> for ((c=1;c<=numclients;c++)); do
>   echo "
> CREATE SCHEMA s_$c;
> CREATE CONVERSION myconv_$c FOR 'LATIN1' TO 'UTF8' FROM iso8859_1_to_utf8;
> ALTER CONVERSION myconv_$c SET SCHEMA s_$c;
>   " | psql >psql1-$c.log 2>&1 &
>   echo "DROP SCHEMA s_$c RESTRICT;" | psql >psql2-$c.log 2>&1 &
> done
> wait
> pg_dump -f db.dump || { echo "on iteration $i"; res=1; break; }
> for ((c=1;c<=numclients;c++)); do
>   echo "DROP SCHEMA s_$c CASCADE;" | psql >psql3-$c.log 2>&1
> done
> done
> psql -c "SELECT * FROM pg_conversion WHERE connamespace NOT IN (SELECT oid FROM pg_namespace);"
> 
> It fails for me (with the v4 patch applied) as follows:
> pg_dump: error: schema with OID 16392 does not exist
> on iteration 1
>   oid  | conname  | connamespace | conowner | conforencoding | contoencoding |      conproc      | condefault
> -------+----------+--------------+----------+----------------+---------------+-------------------+------------
>  16396 | myconv_6 |        16392 |       10 |              8 |             6 | iso8859_1_to_utf8 | f
> 

Thanks for sharing the test, I'm able to reproduce the issue with v4.

Oh I see, your test updates an existing dependency. v4 took care about brand new 
dependencies creation (recordMultipleDependencies()) but forgot to take care
about changing an existing dependency (which is done in another code path:
changeDependencyFor()).

Please find attached v5 that adds:

- a call to the new depLockAndCheckObject() function in changeDependencyFor().
- a test when altering an existing dependency.

With v5 applied, I don't see the issue anymore.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

Re: Avoid orphaned objects dependencies, take 3

От
Alexander Lakhin
Дата:
Hi Bertrand,

09.05.2024 15:20, Bertrand Drouvot wrote:
> Oh I see, your test updates an existing dependency. v4 took care about brand new
> dependencies creation (recordMultipleDependencies()) but forgot to take care
> about changing an existing dependency (which is done in another code path:
> changeDependencyFor()).
>
> Please find attached v5 that adds:
>
> - a call to the new depLockAndCheckObject() function in changeDependencyFor().
> - a test when altering an existing dependency.
>
> With v5 applied, I don't see the issue anymore.

Me too. Thank you for the improved version!
I will test the patch in the background, but for now I see no other
issues with it.

Best regards,
Alexander



Re: Avoid orphaned objects dependencies, take 3

От
Michael Paquier
Дата:
On Thu, May 09, 2024 at 12:20:51PM +0000, Bertrand Drouvot wrote:
> Oh I see, your test updates an existing dependency. v4 took care about brand new
> dependencies creation (recordMultipleDependencies()) but forgot to take care
> about changing an existing dependency (which is done in another code path:
> changeDependencyFor()).
>
> Please find attached v5 that adds:
>
> - a call to the new depLockAndCheckObject() function in changeDependencyFor().
> - a test when altering an existing dependency.
>
> With v5 applied, I don't see the issue anymore.

+            if (object_description)
+                ereport(ERROR, errmsg("%s does not exist", object_description));
+            else
+                ereport(ERROR, errmsg("a dependent object does not ex

This generates an internal error code.  Is that intended?

--- /dev/null
+++ b/src/test/modules/test_dependencies_locks/specs/test_dependencies_locks.spec

This introduces a module with only one single spec.  I could get
behind an extra module if splitting the tests into more specs makes
sense or if there is a restriction on installcheck.  However, for
one spec file filed with a bunch of objects, and note that I'm OK to
let this single spec grow more for this range of tests, it seems to me
that this would be better under src/test/isolation/.

+        if (use_dirty_snapshot)
+        {
+            InitDirtySnapshot(DirtySnapshot);
+            snapshot = &DirtySnapshot;
+        }
+        else
+            snapshot = NULL;

I'm wondering if Robert has a comment about that.  It looks backwards
in a world where we try to encourage MVCC snapshots for catalog
scans (aka 568d4138c646), especially for the part related to
dependency.c and ObjectAddresses.
--
Michael

Вложения

Re: Avoid orphaned objects dependencies, take 3

От
Bertrand Drouvot
Дата:
Hi,

On Wed, May 15, 2024 at 10:14:09AM +0900, Michael Paquier wrote:
> On Thu, May 09, 2024 at 12:20:51PM +0000, Bertrand Drouvot wrote:
> > Oh I see, your test updates an existing dependency. v4 took care about brand new 
> > dependencies creation (recordMultipleDependencies()) but forgot to take care
> > about changing an existing dependency (which is done in another code path:
> > changeDependencyFor()).
> > 
> > Please find attached v5 that adds:
> > 
> > - a call to the new depLockAndCheckObject() function in changeDependencyFor().
> > - a test when altering an existing dependency.
> > 
> > With v5 applied, I don't see the issue anymore.
> 
> +            if (object_description)
> +                ereport(ERROR, errmsg("%s does not exist", object_description));
> +            else
> +                ereport(ERROR, errmsg("a dependent object does not ex
> 
> This generates an internal error code.  Is that intended?

Thanks for looking at it!

Yes, it's like when say you want to create an object in a schema that does not
exist (see get_namespace_oid()).

> --- /dev/null
> +++ b/src/test/modules/test_dependencies_locks/specs/test_dependencies_locks.spec 
> 
> This introduces a module with only one single spec.  I could get
> behind an extra module if splitting the tests into more specs makes
> sense or if there is a restriction on installcheck.  However, for 
> one spec file filed with a bunch of objects, and note that I'm OK to
> let this single spec grow more for this range of tests, it seems to me
> that this would be better under src/test/isolation/.

Yeah, I was not sure about this one (the location is from take 2 mentioned
up-thread). I'll look at moving the tests to src/test/isolation/.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Avoid orphaned objects dependencies, take 3

От
Bertrand Drouvot
Дата:
Hi,

On Tue, May 14, 2024 at 03:00:00PM +0300, Alexander Lakhin wrote:
> Hi Bertrand,
> 
> 09.05.2024 15:20, Bertrand Drouvot wrote:
> > Oh I see, your test updates an existing dependency. v4 took care about brand new
> > dependencies creation (recordMultipleDependencies()) but forgot to take care
> > about changing an existing dependency (which is done in another code path:
> > changeDependencyFor()).
> > 
> > Please find attached v5 that adds:
> > 
> > - a call to the new depLockAndCheckObject() function in changeDependencyFor().
> > - a test when altering an existing dependency.
> > 
> > With v5 applied, I don't see the issue anymore.
> 
> Me too. Thank you for the improved version!
> I will test the patch in the background, but for now I see no other
> issues with it.

Thanks for confirming!

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Avoid orphaned objects dependencies, take 3

От
Bertrand Drouvot
Дата:
Hi,

On Wed, May 15, 2024 at 08:31:43AM +0000, Bertrand Drouvot wrote:
> Hi,
> 
> On Wed, May 15, 2024 at 10:14:09AM +0900, Michael Paquier wrote:
> > +++ b/src/test/modules/test_dependencies_locks/specs/test_dependencies_locks.spec 
> > 
> > This introduces a module with only one single spec.  I could get
> > behind an extra module if splitting the tests into more specs makes
> > sense or if there is a restriction on installcheck.  However, for 
> > one spec file filed with a bunch of objects, and note that I'm OK to
> > let this single spec grow more for this range of tests, it seems to me
> > that this would be better under src/test/isolation/.
> 
> Yeah, I was not sure about this one (the location is from take 2 mentioned
> up-thread). I'll look at moving the tests to src/test/isolation/.

Please find attached v6 (only diff with v5 is moving the tests as suggested
above).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

Re: Avoid orphaned objects dependencies, take 3

От
Alexander Lakhin
Дата:
Hello Bertrand,

15.05.2024 11:31, Bertrand Drouvot wrote:
> On Wed, May 15, 2024 at 10:14:09AM +0900, Michael Paquier wrote:
>
>> +            if (object_description)
>> +                ereport(ERROR, errmsg("%s does not exist", object_description));
>> +            else
>> +                ereport(ERROR, errmsg("a dependent object does not ex
>>
>> This generates an internal error code.  Is that intended?
> Yes, it's like when say you want to create an object in a schema that does not
> exist (see get_namespace_oid()).

AFAICS, get_namespace_oid() throws not ERRCODE_INTERNAL_ERROR,
but ERRCODE_UNDEFINED_SCHEMA:

# SELECT regtype('unknown_schema.type');
ERROR:  schema "unknown_schema" does not exist
LINE 1: SELECT regtype('unknown_schema.type');
                        ^
# \echo :LAST_ERROR_SQLSTATE
3F000

Probably, it's worth to avoid ERRCODE_INTERNAL_ERROR here in light of [1]
and [2], as these errors are not that abnormal (not Assert-like).

[1] https://www.postgresql.org/message-id/Zic_GNgos5sMxKoa%40paquier.xyz
[2] https://commitfest.postgresql.org/48/4735/

Best regards,
Alexander



Re: Avoid orphaned objects dependencies, take 3

От
Bertrand Drouvot
Дата:
Hi,

On Sun, May 19, 2024 at 11:00:00AM +0300, Alexander Lakhin wrote:
> Hello Bertrand,
> 
> Probably, it's worth to avoid ERRCODE_INTERNAL_ERROR here in light of [1]
> and [2], as these errors are not that abnormal (not Assert-like).
> 
> [1] https://www.postgresql.org/message-id/Zic_GNgos5sMxKoa%40paquier.xyz
> [2] https://commitfest.postgresql.org/48/4735/
> 

Thanks for mentioning the above examples, I agree that it's worth to avoid
ERRCODE_INTERNAL_ERROR here: please find attached v7 that makes use of a new
ERRCODE: ERRCODE_DEPENDENT_OBJECTS_DOES_NOT_EXIST 

I thought about this name as it is close enough to the already existing 
"ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST" but I'm open to suggestion too.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

Re: Avoid orphaned objects dependencies, take 3

От
Robert Haas
Дата:
On Wed, May 15, 2024 at 6:31 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> Please find attached v6 (only diff with v5 is moving the tests as suggested
> above).

I don't immediately know what to think about this patch. I've known
about this issue for a long time, but I didn't think this was how we
would fix it.

I think the problem here is basically that we don't lock namespaces
(schemas) when we're adding and removing things from the schema. So I
assumed that if we ever did something about this, what we would do
would be add a bunch of calls to lock schemas to the appropriate parts
of the code. What you've done here instead is add locking at a much
lower level - whenever we are adding a dependency on an object, we
lock the object. The advantage of that approach is that we definitely
won't miss anything. The disadvantage of that approach is that it
means we have some very low-level code taking locks, which means it's
not obvious which operations are taking what locks. Maybe it could
even result in some redundancy, like the higher-level code taking a
lock also (possibly in a different mode) and then this code taking
another one.

I haven't gone through the previous threads; it sounds like there's
already been some discussion of this, but I'm just telling you how it
strikes me on first look.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Avoid orphaned objects dependencies, take 3

От
Bertrand Drouvot
Дата:
Hi,

On Tue, May 21, 2024 at 08:53:06AM -0400, Robert Haas wrote:
> On Wed, May 15, 2024 at 6:31 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > Please find attached v6 (only diff with v5 is moving the tests as suggested
> > above).
> 
> I don't immediately know what to think about this patch.

Thanks for looking at it!

> I've known about this issue for a long time, but I didn't think this was how we
> would fix it.

I started initially with [1] but it was focusing on function-schema only.

Then I proposed [2] making use of a dirty snapshot when recording the dependency.
But this approach appeared to be "scary" and it was still failing to close
some race conditions.

Then, Tom proposed another approach in [2] which is that "creation DDL will have
to take a lock on each referenced object that'd conflict with a lock taken by DROP".
This is the one the current patch is trying to implement.

> What you've done here instead is add locking at a much
> lower level - whenever we are adding a dependency on an object, we
> lock the object. The advantage of that approach is that we definitely
> won't miss anything.

Right, as there is much more than the ones related to schemas, for example:

- function and arg type
- function and return type
- function and function
- domain and domain
- table and type
- server and foreign data wrapper

to name a few.

> The disadvantage of that approach is that it
> means we have some very low-level code taking locks, which means it's
> not obvious which operations are taking what locks.

Right, but the new operations are "only" the ones leading to recording or altering
a dependency.

> Maybe it could
> even result in some redundancy, like the higher-level code taking a
> lock also (possibly in a different mode) and then this code taking
> another one.

The one that is added here is in AccessShareLock mode. It could conflict with
the ones in AccessExclusiveLock means (If I'm not missing any):

- AcquireDeletionLock(): which is exactly what we want
- get_object_address()
    - get_object_address_rv()
        - ExecAlterObjectDependsStmt()
    - ExecRenameStmt()
    - ExecAlterObjectDependsStmt()
    - ExecAlterOwnerStmt()
    - RemoveObjects()
- AlterPublication()

I think there is 2 cases here:

First case: the "conflicting" lock mode is for one of our own lock then LockCheckConflicts()
would report this as a NON conflict.

Second case: the "conflicting" lock mode is NOT for one of our own lock then LockCheckConflicts()
would report a conflict. But I've the feeling that the existing code would
already lock those sessions.

One example where it would be the case:

Session 1: doing "BEGIN; ALTER FUNCTION noschemas.foo2() SET SCHEMA alterschema" would
acquire the lock in AccessExclusiveLock during ExecAlterObjectSchemaStmt()->get_object_address()->LockDatabaseObject()
(in the existing code and before the new call that would occur through changeDependencyFor()->depLockAndCheckObject()
with the patch in place).

Then, session 2: doing "alter function noschemas.foo2() owner to newrole;"
would be locked in the existing code while doing ExecAlterOwnerStmt()->get_object_address()->LockDatabaseObject()).

Means that in this example, the new lock this patch is introducing would not be
responsible of session 2 beging locked.

Was your concern about "First case" or "Second case" or both?

[1]: https://www.postgresql.org/message-id/flat/5a9daaae-5538-b209-6279-e903c3ea2157%40amazon.com
[2]: https://www.postgresql.org/message-id/flat/8369ff70-0e31-f194-2954-787f4d9e21dd%40amazon.com

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Avoid orphaned objects dependencies, take 3

От
Robert Haas
Дата:
On Wed, May 22, 2024 at 6:21 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> I started initially with [1] but it was focusing on function-schema only.

Yeah, that's what I thought we would want to do. And then just extend
that to the other cases.

> Then I proposed [2] making use of a dirty snapshot when recording the dependency.
> But this approach appeared to be "scary" and it was still failing to close
> some race conditions.

The current patch still seems to be using dirty snapshots for some
reason, which struck me as a bit odd. My intuition is that if we're
relying on dirty snapshots to solve problems, we likely haven't solved
the problems correctly, which seems consistent with your statement
about "failing to close some race conditions". But I don't think I
understand the situation well enough to be sure just yet.

> Then, Tom proposed another approach in [2] which is that "creation DDL will have
> to take a lock on each referenced object that'd conflict with a lock taken by DROP".
> This is the one the current patch is trying to implement.

It's a clever idea, but I'm not sure that I like it.

> I think there is 2 cases here:
>
> First case: the "conflicting" lock mode is for one of our own lock then LockCheckConflicts()
> would report this as a NON conflict.
>
> Second case: the "conflicting" lock mode is NOT for one of our own lock then LockCheckConflicts()
> would report a conflict. But I've the feeling that the existing code would
> already lock those sessions.
>
> Was your concern about "First case" or "Second case" or both?

The second case, I guess. It's bad to take a weaker lock first and
then a stronger lock on the same object later, because it can lead to
deadlocks that would have been avoided if the stronger lock had been
taken at the outset. Here, it seems like things would happen in the
other order: if we took two locks, we'd probably take the stronger
lock in the higher-level code and then the weaker lock in the
dependency code. That shouldn't break anything; it's just a bit
inefficient. My concern was really more about the maintainability of
the code. I fear that if we add code that takes heavyweight locks in
surprising places, we might later find the behavior difficult to
reason about.

Tom, what is your thought about that concern?

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Avoid orphaned objects dependencies, take 3

От
Bertrand Drouvot
Дата:
Hi,

On Wed, May 22, 2024 at 10:48:12AM -0400, Robert Haas wrote:
> On Wed, May 22, 2024 at 6:21 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > I started initially with [1] but it was focusing on function-schema only.
> 
> Yeah, that's what I thought we would want to do. And then just extend
> that to the other cases.
> 
> > Then I proposed [2] making use of a dirty snapshot when recording the dependency.
> > But this approach appeared to be "scary" and it was still failing to close
> > some race conditions.
> 
> The current patch still seems to be using dirty snapshots for some
> reason, which struck me as a bit odd. My intuition is that if we're
> relying on dirty snapshots to solve problems, we likely haven't solved
> the problems correctly, which seems consistent with your statement
> about "failing to close some race conditions". But I don't think I
> understand the situation well enough to be sure just yet.

The reason why we are using a dirty snapshot here is for the cases where we are
recording a dependency on a referenced object that we are creating at the same
time behind the scene (for example, creating a composite type while creating
a relation). Without the dirty snapshot, then the object we are creating behind
the scene (the composite type) would not be visible and we would wrongly assume
that it has been dropped.

Note that the usage of the dirty snapshot is only when the object is first
reported as "non existing" by the new ObjectByIdExist() function.

> > I think there is 2 cases here:
> >
> > First case: the "conflicting" lock mode is for one of our own lock then LockCheckConflicts()
> > would report this as a NON conflict.
> >
> > Second case: the "conflicting" lock mode is NOT for one of our own lock then LockCheckConflicts()
> > would report a conflict. But I've the feeling that the existing code would
> > already lock those sessions.
> >
> > Was your concern about "First case" or "Second case" or both?
> 
> The second case, I guess. It's bad to take a weaker lock first and
> then a stronger lock on the same object later, because it can lead to
> deadlocks that would have been avoided if the stronger lock had been
> taken at the outset.

In the example I shared up-thread that would be the opposite: the Session 1 would
take an AccessExclusiveLock lock on the object before taking an AccessShareLock
during changeDependencyFor().

> Here, it seems like things would happen in the
> other order: if we took two locks, we'd probably take the stronger
> lock in the higher-level code and then the weaker lock in the
> dependency code.

Yeah, I agree.

> That shouldn't break anything; it's just a bit
> inefficient.

Yeah, the second lock is useless in that case (like in the example up-thread).

> My concern was really more about the maintainability of
> the code. I fear that if we add code that takes heavyweight locks in
> surprising places, we might later find the behavior difficult to
> reason about.
>

I think I understand your concern about code maintainability but I'm not sure
that adding locks while recording a dependency is that surprising.

> Tom, what is your thought about that concern?

+1

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Avoid orphaned objects dependencies, take 3

От
Robert Haas
Дата:
On Thu, May 23, 2024 at 12:19 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> The reason why we are using a dirty snapshot here is for the cases where we are
> recording a dependency on a referenced object that we are creating at the same
> time behind the scene (for example, creating a composite type while creating
> a relation). Without the dirty snapshot, then the object we are creating behind
> the scene (the composite type) would not be visible and we would wrongly assume
> that it has been dropped.

The usual reason for using a dirty snapshot is that you want to see
uncommitted work by other transactions. It sounds like you're saying
you just need to see uncommitted work by the same transaction. If
that's true, I think using HeapTupleSatisfiesSelf would be clearer. Or
maybe we just need to put CommandCounterIncrement() calls in the right
places to avoid having the problem in the first place. Or maybe this
is another sign that we're doing the work at the wrong level.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Avoid orphaned objects dependencies, take 3

От
Bertrand Drouvot
Дата:
Hi,

On Thu, May 23, 2024 at 02:10:54PM -0400, Robert Haas wrote:
> On Thu, May 23, 2024 at 12:19 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > The reason why we are using a dirty snapshot here is for the cases where we are
> > recording a dependency on a referenced object that we are creating at the same
> > time behind the scene (for example, creating a composite type while creating
> > a relation). Without the dirty snapshot, then the object we are creating behind
> > the scene (the composite type) would not be visible and we would wrongly assume
> > that it has been dropped.
> 
> The usual reason for using a dirty snapshot is that you want to see
> uncommitted work by other transactions. It sounds like you're saying
> you just need to see uncommitted work by the same transaction.

Right.

> If that's true, I think using HeapTupleSatisfiesSelf would be clearer.

Oh thanks! I did not know about the SNAPSHOT_SELF snapshot type (I should have
check all the snapshot types first though) and that's exactly what is needed here.

Please find attached v8 making use of SnapshotSelf instead of a dirty snapshot.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения