Обсуждение: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }
The attached patch implements a new SEARCH clause for CREATE FUNCTION. The SEARCH clause controls the search_path used when executing functions that were created without a SET clause. Background: Controlling search_path is critical for the correctness and security of functions. Right now, the author of a function without a SET clause has little ability to control the function's behavior, because even basic operators like "+" involve search_path. This is a big problem for, e.g. functions used in expression indexes which are called by any user with write privileges on the table. Motivation: I'd like to (eventually) get to safe-by-default behavior. In other words, the simplest function declaration should be safe for the most common use cases. To get there, we need some way to explicitly specify the less common cases. Right now there's no way for the function author to indicate that a function intends to use the session's search path. We also need an easier way to specify that the user wants a safe search_path ("SET search_path = pg_catalog, pg_temp" is arcane). And when we know more about the user's actual intent, then it will be easier to either form a transition plan to push users into the safer behavior, or at least warn strongly when the user is doing something dangerous (i.e. using a function that depends on the session's search path as part of an expression index). Today, the only information we have about the user's intent is the presence or absence of a "SET search_path" clause, which is not a strong signal. Proposal: Add SEARCH { DEFAULT | SYSTEM | SESSION } clause to CREATE/ALTER function. * SEARCH DEFAULT is the same as no SEARCH clause at all, and ends up stored in the catalog as prosearch='d'. * SEARCH SYSTEM means that we switch to the safe search path of "pg_catalog, pg_temp" when executing the function. Stored as prosearch='y'. * SEARCH SESSION means that we don't switch the search_path when executing the function, and it's inherited from the session. Stored as prosearch='e'. Regardless of the SEARCH clause, a "SET search_path" clause will override it. The SEARCH clause only matters when "SET search_path" is not there. Additionally provide a GUC, defaulting to false for compatibility, that can interpret prosearch='d' as if it were prosearch='y'. It could help provide a transition path. I know there's a strong reluctance to adding these kinds of GUCs; I can remove it and I think the patch will still be worthwhile. Perhaps there are alternatives that could help with migration at pg_dump time instead? Benefits: 1. The user can be more explicit about their actual intent. Do they want safety and consistency? Or the flexibility of using the session's search_path? 2. We can more accurately serve the user's intent. For instance, the safe search_path of "pg_catalog, pg_temp" is arcane and seems to be there just because we don't have a way to specify that pg_temp be excluded entirely. But perhaps in the future we *do* want to exclude pg_temp entirely. Knowing that the user just wants "SEARCH SYSTEM" allows us some freedom to do that. 3. Users can be forward-compatible by specifying the functions that really do need to use the session's search path as SEARCH SESSION, so that they will never be broken in the future. That gives us a cleaner path toward making the default behavior safe. -- Jeff Davis PostgreSQL Contributor Team - AWS
Вложения
On 8/11/23 22:35, Jeff Davis wrote: > The attached patch implements a new SEARCH clause for CREATE FUNCTION. > The SEARCH clause controls the search_path used when executing > functions that were created without a SET clause. > > Background: > > Controlling search_path is critical for the correctness and security of > functions. Right now, the author of a function without a SET clause has > little ability to control the function's behavior, because even basic > operators like "+" involve search_path. This is a big problem for, e.g. > functions used in expression indexes which are called by any user with > write privileges on the table. > > Motivation: > > I'd like to (eventually) get to safe-by-default behavior. In other > words, the simplest function declaration should be safe for the most > common use cases. I agree with the general need. > Add SEARCH { DEFAULT | SYSTEM | SESSION } clause to CREATE/ALTER > function. > > * SEARCH DEFAULT is the same as no SEARCH clause at all, and ends up > stored in the catalog as prosearch='d'. > * SEARCH SYSTEM means that we switch to the safe search path of > "pg_catalog, pg_temp" when executing the function. Stored as > prosearch='y'. > * SEARCH SESSION means that we don't switch the search_path when > executing the function, and it's inherited from the session. Stored as > prosearch='e'. It isn't clear to me what is the precise difference between DEFAULT and SESSION > 2. We can more accurately serve the user's intent. For instance, the > safe search_path of "pg_catalog, pg_temp" is arcane and seems to be > there just because we don't have a way to specify that pg_temp be > excluded entirely. But perhaps in the future we *do* want to exclude > pg_temp entirely. Knowing that the user just wants "SEARCH SYSTEM" > allows us some freedom to do that. Personally I think having pg_temp in the SYSTEM search path makes sense for temp tables, but I find it easy to forget that functions can be created by unprivileged users in pg_temp, and therefore having pg_temp in the search path for functions is dangerous. -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On 8/12/23 09:15, Joe Conway wrote: > On 8/11/23 22:35, Jeff Davis wrote: >> 2. We can more accurately serve the user's intent. For instance, the >> safe search_path of "pg_catalog, pg_temp" is arcane and seems to be >> there just because we don't have a way to specify that pg_temp be >> excluded entirely. But perhaps in the future we *do* want to exclude >> pg_temp entirely. Knowing that the user just wants "SEARCH SYSTEM" >> allows us some freedom to do that. > > Personally I think having pg_temp in the SYSTEM search path makes sense > for temp tables, but I find it easy to forget that functions can be > created by unprivileged users in pg_temp, and therefore having pg_temp > in the search path for functions is dangerous. Hmm, I guess I was too hasty -- seems we have some magic related to this already. -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Sat, 2023-08-12 at 09:15 -0400, Joe Conway wrote: > It isn't clear to me what is the precise difference between DEFAULT > and > SESSION The the current patch, SESSION always gets the search path from the session, while DEFAULT is controlled by the GUC safe_function_search_path. If the GUC is false (the default) then DEFAULT and SESSION are the same. If the GUC is true, then DEFAULT and SYSTEM are the same. There are alternatives to using a GUC to differentiate them. The main point of this patch is to capture what the user intends in a convenient and forward-compatible way. If the user specifies nothing at all, they get DEFAULT, and we could treat that specially in various ways to move toward safety while minimizing breakage. > > Personally I think having pg_temp in the SYSTEM search path makes > sense > for temp tables The patch doesn't change this behavior -- SYSTEM (without any other SET) gives you "pg_catalog, pg_temp" and there's no way to exclude pg_temp entirely. My point was that by capturing the user's intent with SEARCH SYSTEM, it gives us a bit more freedom to have these kinds of discussions later. And it's certainly easier for the user to specify SEARCH SYSTEM than "SET search_path = pg_catalog, pg_temp". Regards, Jeff Davis
On Sat, 2023-08-12 at 09:50 -0400, Joe Conway wrote: > Hmm, I guess I was too hasty -- seems we have some magic related to > this > already. I was worried after your first email. But yes, the magic is in FuncnameGetCandidates(), which simply ignores functions in the temp namespace. It would be better if we were obviously safe rather than magically safe, though. Regards, Jeff Davis
Hi, On 2023-08-11 19:35:22 -0700, Jeff Davis wrote: > Controlling search_path is critical for the correctness and security of > functions. Right now, the author of a function without a SET clause has > little ability to control the function's behavior, because even basic > operators like "+" involve search_path. This is a big problem for, e.g. > functions used in expression indexes which are called by any user with > write privileges on the table. > Motivation: > > I'd like to (eventually) get to safe-by-default behavior. In other > words, the simplest function declaration should be safe for the most > common use cases. I'm not sure that anything based, directly or indirectly, on search_path really is a realistic way to get there. > To get there, we need some way to explicitly specify the less common > cases. Right now there's no way for the function author to indicate > that a function intends to use the session's search path. We also need > an easier way to specify that the user wants a safe search_path ("SET > search_path = pg_catalog, pg_temp" is arcane). No disagreement with that. Even if I don't yet agree that your proposal is a convincing path to "easy security for PLs" - just making the search path stuff less arcane is good. > And when we know more about the user's actual intent, then it will be > easier to either form a transition plan to push users into the safer > behavior, or at least warn strongly when the user is doing something > dangerous (i.e. using a function that depends on the session's search > path as part of an expression index). I think that'd be pretty painful from a UX perspective. Having to write e.g. operators as operator(schema, op) just sucks as an experience. And with extensions plenty of operators will live outside of pg_catalog, so there is plenty things that will need qualifying. And because of things like type coercion search, which prefers "bettering fitting" coercions over search path order, you can't just put "less important" things later in search path. I wonder if we ought to work more on "fossilizing" the result of search path resolutions at the time functions are created, rather than requiring the user to do so explicitly. Most of the problem here comes down to the fact that if a user creates a function like 'a + b' we'll not resolve the operator, the potential type coercions etc, when the function is created - we do so when the function is executed. We can't just store the oids at the time, because that'd end up very fragile - tables/functions/... might be dropped and recreated etc and thus change their oid. But we could change the core PLs to rewrite all the queries (*) so that they schema qualify absolutely everything, including operators and implicit type casts. That way objects referenced by functions can still be replaced, but search path can't be used to "inject" objects in different schemas. Obviously it could lead to errors on some schema changes - e.g. changing a column type might mean that a relevant cast lives in a different place than with the old type - but I think that'll be quite rare. Perhaps we could offer a ALTER FUNCTION ... REFRESH REFERENCES; or such? One obvious downside of such an approach is that it requires some work with each PL. I'm not sure that's avoidable - and I suspect that most "security sensitive" functions are written in just a few languages. (*) Obviously the one thing that doesn't work for is use of EXECUTE in plpgsql and similar constructs elsewhere. I'm not sure there's much that can be done to make that safe, but it's worth thinking about more. Greetings, Andres Freund
On Sat, 2023-08-12 at 11:25 -0700, Andres Freund wrote: > > I'm not sure that anything based, directly or indirectly, on > search_path > really is a realistic way to get there. Can you explain a little more? I see what you mean generally, that search_path is an imprecise thing, and that it leaves room for ambiguity and mistakes. But I also think we can do a lot better than we're doing today and still retain the basic concept of search_path, which is good because it's deeply integrated into postgres, and it's not clear that we're going to get away from it any time soon. > > > I think that'd be pretty painful from a UX perspective. Having to > write > e.g. operators as operator(schema, op) just sucks as an experience. I'm not suggesting that the user fully-qualify everything; I'm suggesting that the include a "SET search_path" clause if they depend on anything other than pg_catalog. > And with > extensions plenty of operators will live outside of pg_catalog, so > there is > plenty things that will need qualifying. In my proposal, that would still involve a "SET search_path TO myextension, pg_catalog, pg_temp". The main reason that's bad is that adding pg_temp at the end is painful UX -- just something that the user needs to remember to do with little obvious reason or observable impact; but it has important security implications. Perhaps we should just not implicitly include pg_temp for a function's search_path (at least for the case of CREATE FUNCTION ... SEARCH SYSTEM)? > And because of things like type > coercion search, which prefers "bettering fitting" coercions over > search path > order, you can't just put "less important" things later in search > path. I understand this introduces some ambiguity, but you just can't include schemas in the search_path that you don't trust, for similar reasons as $PATH. If you have a few objects you'd like to access in another user's schema, fully-qualify them. > We can't just store the oids at the time, because that'd end up very > fragile - > tables/functions/... might be dropped and recreated etc and thus > change their > oid. Robert suggested something along those lines[1]. I won't rule it out, but I agree that there are quite a few things left to figure out. > But we could change the core PLs to rewrite all the queries (*) so > that > they schema qualify absolutely everything, including operators and > implicit > type casts. So not quite like "SET search_path FROM CURRENT": you resolve it to a specific "schemaname.objectname", but stop just short of resolving to a specific OID? An interesting compromise, but I'm not sure what the benefit is vs. SET search_path FROM CURRENT (or some defined search_path). > That way objects referenced by functions can still be replaced, but > search > path can't be used to "inject" objects in different schemas. > Obviously it > could lead to errors on some schema changes - e.g. changing a column > type > might mean that a relevant cast lives in a different place than with > the old > type - but I think that'll be quite rare. Perhaps we could offer a > ALTER > FUNCTION ... REFRESH REFERENCES; or such? Hmm. I feel like that's making things more complicated. I'd find it more straightforward to use something like Robert's approach of fully parsing something, and then have the REFRESH command reparse it when something needs updating. Or perhaps just create all of the dependency entries more like a view query and then auto-refresh. > (*) Obviously the one thing that doesn't work for is use of EXECUTE > in plpgsql > and similar constructs elsewhere. I'm not sure there's much that can > be done > to make that safe, but it's worth thinking about more. I think it would be really nice to have some better control over the search_path regardless, because it still helps with cases like this. A lot of C functions build queries, and I don't think it's reasonable to constantly worry about the ambiguity of the schema for "=". Regards, Jeff Davis [1] https://www.postgresql.org/message-id/CA%2BTgmobd%3DeFRGWHhfG4mG2cA%2BdsVuA4jpBvD8N1NS%3DVc9eHFQg%40mail.gmail.com
On 12.08.23 04:35, Jeff Davis wrote: > The attached patch implements a new SEARCH clause for CREATE FUNCTION. > The SEARCH clause controls the search_path used when executing > functions that were created without a SET clause. I don't understand this. This adds a new option for cases where the existing option wasn't specified. Why not specify the existing option then? Is it not good enough? Can we improve it?
On Wed, 2023-08-16 at 08:51 +0200, Peter Eisentraut wrote: > On 12.08.23 04:35, Jeff Davis wrote: > > The attached patch implements a new SEARCH clause for CREATE > > FUNCTION. > > The SEARCH clause controls the search_path used when executing > > functions that were created without a SET clause. > > I don't understand this. This adds a new option for cases where the > existing option wasn't specified. Why not specify the existing > option > then? Is it not good enough? Can we improve it? SET search_path = '...' not good enough in my opinion. 1. Not specifying a SET clause falls back to the session's search_path, which is a bad default because it leads to all kinds of inconsistent behavior and security concerns. 2. There's no way to explicitly request that you'd actually like to use the session's search_path, so it makes it very hard to ever change the default. 3. It's user-unfriendly. A safe search_path that would be suitable for most functions is "SET search_path = pg_catalog, pg_temp", which is arcane, and requires some explanation. 4. search_path for the session is conceptually different than for a function. A session should be context-sensitive and the same query should (quite reasonably) behave differently for different sessions and users to sort out things like object name conflicts, etc. A function should (ordinarily) be context-insensitive, especially when used in something like an index expression or constraint. Having different syntax helps separate those concepts. 5. There's no way to prevent pg_temp from being included in the search_path. This is separately fixable, but having the proposed SEARCH syntax is likely to make for a better user experience in the common cases. I'm open to suggestion about other ways to improve it, but SEARCH is what I came up with. Regards, Jeff Davis
On 16.08.23 19:44, Jeff Davis wrote: > On Wed, 2023-08-16 at 08:51 +0200, Peter Eisentraut wrote: >> On 12.08.23 04:35, Jeff Davis wrote: >>> The attached patch implements a new SEARCH clause for CREATE >>> FUNCTION. >>> The SEARCH clause controls the search_path used when executing >>> functions that were created without a SET clause. >> >> I don't understand this. This adds a new option for cases where the >> existing option wasn't specified. Why not specify the existing >> option >> then? Is it not good enough? Can we improve it? > > SET search_path = '...' not good enough in my opinion. > > 1. Not specifying a SET clause falls back to the session's search_path, > which is a bad default because it leads to all kinds of inconsistent > behavior and security concerns. Not specifying SEARCH would have the same issue? > 2. There's no way to explicitly request that you'd actually like to use > the session's search_path, so it makes it very hard to ever change the > default. That sounds like something that should be fixed independently. I could see this being useful for other GUC settings, like I want to run a function explicitly with the session's work_mem. > 3. It's user-unfriendly. A safe search_path that would be suitable for > most functions is "SET search_path = pg_catalog, pg_temp", which is > arcane, and requires some explanation. True, but is that specific to functions? Maybe I want a safe search_path just in general, for a session or something. > 4. search_path for the session is conceptually different than for a > function. A session should be context-sensitive and the same query > should (quite reasonably) behave differently for different sessions and > users to sort out things like object name conflicts, etc. A function > should (ordinarily) be context-insensitive, especially when used in > something like an index expression or constraint. Having different > syntax helps separate those concepts. I'm not sure I follow that. When you say a function should be context-insensitive, you could also say, a function should be context-sensitive, but have a separate context. Which is kind of how it works now. Maybe not well enough. > 5. There's no way to prevent pg_temp from being included in the > search_path. This is separately fixable, but having the proposed SEARCH > syntax is likely to make for a better user experience in the common > cases. seems related to #3 > I'm open to suggestion about other ways to improve it, but SEARCH is > what I came up with. Some extensions of the current mechanism, like search_path = safe, search_path = session, search_path = inherit, etc. might work.
On Fri, 2023-08-18 at 14:25 +0200, Peter Eisentraut wrote: > > Not specifying SEARCH would have the same issue? Not specifying SEARCH is equivalent to SEARCH DEFAULT, and that gives us some control over what happens. In the proposed patch, a GUC determines whether it behaves like SEARCH SESSION (the default for compatibility reasons) or SEARCH SYSTEM (safer). > > 2. There's no way to explicitly request that you'd actually like to > > use > > the session's search_path, so it makes it very hard to ever change > > the > > default. > > That sounds like something that should be fixed independently. I > could > see this being useful for other GUC settings, like I want to run a > function explicitly with the session's work_mem. I'm confused about how this would work. It doesn't make sense to set a GUC to be the session value in postgresql.conf, because there's no session yet. And it doesn't really make sense in a top-level session, because it would just be a no-op (right?). It maybe makes sense in a function, but I'm still not totally clear on what that would mean. > > True, but is that specific to functions? Maybe I want a safe > search_path just in general, for a session or something. I agree this is a somewhat orthogonal problem and we should have a way to keep pg_temp out of the search_path entirely. We just need to agree on a string representation of a search path that omits pg_temp. One idea would be to have special identifiers "!pg_temp" and "!pg_catalog" that would cause those to be excluded entirely. > > I'm not sure I follow that. When you say a function should be > context-insensitive, you could also say, a function should be > context-sensitive, but have a separate context. Which is kind of how > it > works now. Maybe not well enough. For functions called from index expressions or constraints, you want the function's result to only depend on its arguments; otherwise you can easily violate a constraint or cause an index to return wrong results. You're right that there is some other context, like the database default collation, but (a) that's mostly nailed down; and (b) if it changes unexpectedly that also causes problems. > > I'm open to suggestion about other ways to improve it, but SEARCH > > is > > what I came up with. > > Some extensions of the current mechanism, like search_path = safe, > search_path = session, search_path = inherit, etc. might work. I had considered some new special names like this in search path, but I didn't come up with a specific proposal that I liked. Do you have some more details about how this would help get us to a safe default? Regards, Jeff Davis
Hi, On 2023-08-14 12:25:30 -0700, Jeff Davis wrote: > On Sat, 2023-08-12 at 11:25 -0700, Andres Freund wrote: > > > > I'm not sure that anything based, directly or indirectly, on > > search_path > > really is a realistic way to get there. > > Can you explain a little more? I see what you mean generally, that > search_path is an imprecise thing, and that it leaves room for > ambiguity and mistakes. It just doesn't seem to provide enough control and it's really painful for users to manage. If you install a bunch of extensions into public - very very common from what I have seen - you can't really remove public from the search path. Which then basically makes all approaches of resolving any of the security issues via search path pretty toothless. > > I think that'd be pretty painful from a UX perspective. Having to > > write > > e.g. operators as operator(schema, op) just sucks as an experience. > > I'm not suggesting that the user fully-qualify everything; I'm > suggesting that the include a "SET search_path" clause if they depend > on anything other than pg_catalog. I don't think that really works in practice, due to the very common practice of installing extensions into the same schema as the application. Then that schema needs to be in search path (if you don't want to schema qualify everything), which leaves you wide open. > > And with > > extensions plenty of operators will live outside of pg_catalog, so > > there is > > plenty things that will need qualifying. > > In my proposal, that would still involve a "SET search_path TO > myextension, pg_catalog, pg_temp". myextension is typically public. Which means that there's zero protection due to such a search path. > > And because of things like type > > coercion search, which prefers "bettering fitting" coercions over > > search path > > order, you can't just put "less important" things later in search > > path. > > I understand this introduces some ambiguity, but you just can't include > schemas in the search_path that you don't trust, for similar reasons as > $PATH. If you have a few objects you'd like to access in another user's > schema, fully-qualify them. I think the more common attack paths are things like tricking extension scripts into evaluating arbitrary code, to gain "real superuser" privileges. > > We can't just store the oids at the time, because that'd end up very > > fragile - > > tables/functions/... might be dropped and recreated etc and thus > > change their > > oid. > > Robert suggested something along those lines[1]. I won't rule it out, > but I agree that there are quite a few things left to figure out. > > > But we could change the core PLs to rewrite all the queries (*) so > > that > > they schema qualify absolutely everything, including operators and > > implicit > > type casts. > > So not quite like "SET search_path FROM CURRENT": you resolve it to a > specific "schemaname.objectname", but stop just short of resolving to a > specific OID? > > An interesting compromise, but I'm not sure what the benefit is vs. SET > search_path FROM CURRENT (or some defined search_path). Search path does not reliably protect things involving "type matching". If you have a better fitting cast, or a function call with parameters that won't need coercion, later in search path, they'll win, even if there's another fit earlier on. IOW, search path is a bandaid for this kind of thing, at best. If we instead store something that avoids the need for such search, the "better fitting cast" logic wouldn't add these kind of security issues anymore. > > That way objects referenced by functions can still be replaced, but > > search > > path can't be used to "inject" objects in different schemas. > > Obviously it > > could lead to errors on some schema changes - e.g. changing a column > > type > > might mean that a relevant cast lives in a different place than with > > the old > > type - but I think that'll be quite rare. Perhaps we could offer a > > ALTER > > FUNCTION ... REFRESH REFERENCES; or such? > > Hmm. I feel like that's making things more complicated. I'd find it > more straightforward to use something like Robert's approach of fully > parsing something, and then have the REFRESH command reparse it when > something needs updating. Or perhaps just create all of the dependency > entries more like a view query and then auto-refresh. Hm, I'm not quite sure I follow on what exactly you see as different here. Greetings, Andres Freund
On Wed, Aug 16, 2023 at 1:45 PM Jeff Davis <pgsql@j-davis.com> wrote: > On Wed, 2023-08-16 at 08:51 +0200, Peter Eisentraut wrote: > > On 12.08.23 04:35, Jeff Davis wrote: > > > The attached patch implements a new SEARCH clause for CREATE > > > FUNCTION. > > > The SEARCH clause controls the search_path used when executing > > > functions that were created without a SET clause. > > > > I don't understand this. This adds a new option for cases where the > > existing option wasn't specified. Why not specify the existing > > option > > then? Is it not good enough? Can we improve it? > > SET search_path = '...' not good enough in my opinion. > > 1. Not specifying a SET clause falls back to the session's search_path, > which is a bad default because it leads to all kinds of inconsistent > behavior and security concerns. > > 2. There's no way to explicitly request that you'd actually like to use > the session's search_path, so it makes it very hard to ever change the > default. > > 3. It's user-unfriendly. A safe search_path that would be suitable for > most functions is "SET search_path = pg_catalog, pg_temp", which is > arcane, and requires some explanation. > > 4. search_path for the session is conceptually different than for a > function. A session should be context-sensitive and the same query > should (quite reasonably) behave differently for different sessions and > users to sort out things like object name conflicts, etc. A function > should (ordinarily) be context-insensitive, especially when used in > something like an index expression or constraint. Having different > syntax helps separate those concepts. > > 5. There's no way to prevent pg_temp from being included in the > search_path. This is separately fixable, but having the proposed SEARCH > syntax is likely to make for a better user experience in the common > cases. > > I'm open to suggestion about other ways to improve it, but SEARCH is > what I came up with. The one thing that I really like about your proposal is that you explicitly included a way of specifying that the prevailing search_path should be used. If we move to any kind of a system where the default behavior is something other than that, then we need that as an option. Another, related thing that I recently discovered would be useful is a way to say "I'd like to switch the search_path to X, but I'd also like to discover what the prevailing search_path was just before entering this function." For example, if I have a function that is SECURITY DEFINER which takes some executable code as an input, I might want to arrange to eventually execute that code with the caller's user ID and search_path, but I can't discover the caller's search_path unless I don't set it, and that's a dangerous thing to do. However, my overall concern here is that this feels like it's reinventing the wheel. We already have a way of setting search_path; this gives us a second one. If we had no existing mechanism for that, I think this would definitely be an improvement, and quite possibly better than the current mechanism. But given that we had a mechanism already, if we added this, we'd then have two, which seems like the wrong number. I'm inclined to think that if there are semantics that we currently lack, we should think of extending the current syntax to support them. Right now you can SET search_path = 'specific value' or SET search_path FROM CURRENT or leave it out. We could introduce a new way of spelling "leave it out," like RESET search_path or whatever. We could introduce a new setting that doesn't set the search_path at all but reverts to the old value on function exit, like SET search_path USING CALL or whatever. And we could think of making SET search_path FROM CURRENT or any new semantics we introduce the default in a future release, or even make the default behavior depend on an evil behavior-changing GUC as you proposed. I'm not quite sure what we should do here conceptually, but I don't see why having a completely new syntax for doing it really helps. -- Robert Haas EDB: http://www.enterprisedb.com
On Sat, 2023-08-19 at 11:59 -0700, Andres Freund wrote: > If you install a bunch of extensions into public - very very > common from what I have seen - you can't really remove public from > the search > path. Which then basically makes all approaches of resolving any of > the > security issues via search path pretty toothless. Toothless only if (a) untrusted users have CREATE privileges in the public schema, which is no longer the default; and (b) you're writing a function that accesses extension objects installed in the public schema. While those may be normal things to do, there are a lot of times when those things aren't true. I speculate that it's far more common to write functions that only use pg_catalog objects (e.g. the "+" operator, some string manipulation, etc.) and basic control flow. There's a lot of value in making those simple cases secure-by-default. We are already moving users towards a readable-but-not-writable public schema as a best practice, so if we also move to something like SEARCH SYSTEM as a best practice, then that will help a LOT of users. > > > I don't think that really works in practice, due to the very common > practice > of installing extensions into the same schema as the application. > Then that > schema needs to be in search path (if you don't want to schema > qualify > everything), which leaves you wide open. ... > > > myextension is typically public. Which means that there's zero > protection due > to such a search path. You mentioned this three times so I must be missing something. Why is it "wide open" and "zero protection"? If the schema is not world- writable, then aren't attacks a lot harder to pull off? > > > I think the more common attack paths are things like tricking > extension > scripts into evaluating arbitrary code, to gain "real superuser" > privileges. Extension scripts are a separate beast. I do see some potential avenues of attack, but I don't see how your approach of resolving schemas early would help. > Search path does not reliably protect things involving "type > matching". If you > have a better fitting cast, or a function call with parameters that > won't need > coercion, later in search path, they'll win, even if there's another > fit > earlier on. You need to trust the schemas in your search_path. > If we instead store something that avoids the need for such search, > the > "better fitting cast" logic wouldn't add these kind of security > issues > anymore. I don't disagree, but I don't understand the approach in detail (i.e. I couldn't write it up as a proposal). For instance, what would the pg_dump output look like? And even if we had that in place, I think we'd still want a better way to control the search_path. > > > Hm, I'm not quite sure I follow on what exactly you see as different > here. From what I understand, Robert's approach is to fully parse the commands and resolve to specific OIDs (necessitating dependencies, etc.); while your approach resolves to fully-qualified names but not OIDs (and needing no dependencies). I don't understand either proposal entirely, so perhaps I'm on the wrong track here, but I feel like Robert's approach is more "normal" and easy to document whereas your approach is more "creative" and perhaps hard to document. Both approaches (resolving to names and resolving to OIDs) seem pretty far away, so I'm still very much inclined to nudge users toward safer best practices with search_path. I think SEARCH SYSTEM is a good start there and doable for 17. Regards, Jeff Davis
On Mon, 2023-08-21 at 15:14 -0400, Robert Haas wrote: > Another, related thing that I recently discovered would > be useful is a way to say "I'd like to switch the search_path to X, > but I'd also like to discover what the prevailing search_path was > just > before entering this function." Interesting, that could probably be accommodated one way or another. > However, my overall concern here is that this feels like it's > reinventing the wheel. We already have a way of setting search_path; > this gives us a second one. In one sense, you are obviously right. We have a way to set search_path for a function already, just like any other GUC. But I don't look at the search_path as "just another GUC" when it comes to executing a function. The source of the initial value of search_path is more like the IMMUTABLE marker. We can also do something with the knowledge the SEARCH marker gives us. For instance, issue WARNINGs or ERRORs when someone uses a SEARCH SESSION function in an index expression or constraint, or perhaps when they try to declare a function IMMUTABLE in the first place. In other words, the SEARCH clause tells us where search_path comes from, not so much what it is specifically. I believe that tells us something fundamental about the kind of function it is. If I tell you nothing about a function except whether the search path comes from the system or the session, you can imagine how it should be used (or not used, as the case may be). > I'm inclined to think that if there are semantics that we currently > lack, we should think of extending the current syntax to support > them. > Right now you can SET search_path = 'specific value' or SET > search_path FROM CURRENT or leave it out. We could introduce a new > way > of spelling "leave it out," like RESET search_path or whatever. The thought occurred to me but any way I looked at it was messier and less user-friendly. It feels like generalizing from search_path to all GUCs, and then needing to specialize for search_path anyway. For instance, if we want the default search_path to be the safe value 'pg_catalog, pg_temp', where would that default value come from? Or instead, we could say that the default would be FROM CURRENT, which would seem to generalize; but then we immediately run into the problem that we don't want most GUCs to default to FROM CURRENT (because that would capture the entire GUC state, which seems bad for several reasons), and again we'd need to specialize for search_path. In other words, search_path really *is* special. I don't think it's great to generalize from it as though it were just like every other GUC. I do recognize that "SEARCH SYSTEM ... SET search_path = '...'" is redundant, and that's not great. I just see the other options as worse, but if I've misunderstood your approach then please clarify. Regards, Jeff Davis
On Mon, Aug 21, 2023 at 5:32 PM Jeff Davis <pgsql@j-davis.com> wrote: > But I don't look at the search_path as "just another GUC" when it comes > to executing a function. The source of the initial value of search_path > is more like the IMMUTABLE marker. I mean I agree and I disagree. Philosophically, I agree. Most functions are written with some particular search_path in mind; the author imagines that the function will be executed with, well, probably whatever search path the author typically uses themselves. Now and then, someone may write a function that's intended to run with various different search paths, e.g. anything of the form customer_XXXX, pg_catalog, pg_temp. I think that is a real thing that people actually do, intentionally varying the search_path with the idea of rebinding some references. However, cases where somebody sincerely intends for the caller to be able to make + or || mean something different from normal probably do not exist in practice. So, if we were designing a system from scratch, then I would recommend against making search_path a GUC, because it's clearly shouldn't behave in the same way as a session property like debug_print_plan or enable_seqscan, where you could want to run the same code with various values. But practically, I disagree. As things stand today, search_path *is* a GUC that dynamically changes the run-time properties of a session, and your proposed patch wouldn't change that. What it would do is layer another mechanism on top of that which, IMHO, makes something that is already complicated and error-prone even more complicated. If we wanted to really make seach_path behave like a property of the code rather than the session, I think we'd need to change quite a bit more stuff, and the price of that in terms of backward-compatibility might be higher than we'd be willing to pay, but if, hypothetically, we decided to pay that price, then at the end of it search_path as a GUC would be gone, and we'd have one way of managing sarch_path that is different from the one we have now. But with the patch as you have proposed it that's not what happens. We just end up with two interconnected mechanisms for managing what, right now, is managed by a single mechanism. That mechanism is (and I think we probably mostly all agree on this) bad. Like really really bad. But having more than one mechanism, to me, still seems worse. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, 2023-09-18 at 12:01 -0400, Robert Haas wrote: > But with the patch as you have proposed it that's not what happens. > We > just end up with two interconnected mechanisms for managing what, > right now, is managed by a single mechanism. That mechanism is (and I > think we probably mostly all agree on this) bad. Like really really > bad. But having more than one mechanism, to me, still seems worse. I don't want to make an argument of the form "the status quo is really bad, and therefore my proposal is good". That line of argument is suspect for good reason. But if my proposal isn't good enough, and we don't have a clear alternative, we need to think seriously about how much we've collectively over-promised and under-delivered on the concept of privilege separation. Absent a better idea, we need to figure out a way to un-promise what we can't do and somehow guide users towards safe practices. For instance, don't grant the INSERT or UPDATE privilege if the table uses functions in index expressions or constraints. Also don't touch any table unless the onwer has SET ROLE privileges on your role already, or the operation is part of a special carve out (logical replication or a maintenance command). And don't use the predefined role pg_write_all_data, because that's unsafe for most imaginable use cases. Regards, Jeff Davis
On Mon, Sep 18, 2023 at 4:51 PM Jeff Davis <pgsql@j-davis.com> wrote: > I don't want to make an argument of the form "the status quo is really > bad, and therefore my proposal is good". That line of argument is > suspect for good reason. +1. > But if my proposal isn't good enough, and we don't have a clear > alternative, we need to think seriously about how much we've > collectively over-promised and under-delivered on the concept of > privilege separation. > > Absent a better idea, we need to figure out a way to un-promise what we > can't do and somehow guide users towards safe practices. For instance, > don't grant the INSERT or UPDATE privilege if the table uses functions > in index expressions or constraints. Also don't touch any table unless > the onwer has SET ROLE privileges on your role already, or the > operation is part of a special carve out (logical replication or a > maintenance command). And don't use the predefined role > pg_write_all_data, because that's unsafe for most imaginable use cases. I agree this is a mess, and that documenting the mess better would be good. But instead of saying not to do something, we need to say what will happen if you do the thing. I'm regularly annoyed when somebody reports that "I tried to do X and it didn't work," instead of saying what happened when they tried, and this situation is another form of the same thing. "If you do X, then Y will or can occur" is much better than "do not do X". And I think better documentation of this area would be useful regardless of any other improvements that we may or may not make. Indeed, really good documentation of this area might facilitate making further improvements by highlighting some of the problems so that they can more easily be understood by a wider audience. I fear it will be hard to come up with something that is clear, that highlights the severity of the problems, and that does not veer off into useless vitriol against the status quo, but if we can get there, that would be good. But, leaving that to one side, what technical options do we have on the table, supposing that we want to do something that is useful but not this exact thing? I think one option is to somehow change the behavior around search_path but in a different way than you've proposed. The most radical option would be to make it not be a GUC any more. I think the backward-compatibility implications of that would likely be unpalatable to many, and the details of what we'd actually do are also not clear, at least to me. For a function, I think there is a reasonable argument that you just make it a function property, like IMMUTABLE, as you said before. But for code that goes directly into the session, where's the search_path supposed to come from? It's got to be configured somewhere, and somehow that somewhere feels a lot like a GUC. That leads to a second idea, which is having it continue to be a GUC but only affect directly-entered SQL, with all indirectly-entered SQL either being stored as a node tree or having a search_path property attached somewhere. Or, as a third idea, suppose we leave it a GUC but start breaking semantics around where and how that GUC gets set, e.g. by changing CREATE FUNCTION to capture the prevailing search_path by default unless instructed otherwise. Personally I feel like we'd need pretty broad consensus for any of these kinds of changes because it would break a lot of stuff for a lot of people, but if we could get that then I think we could maybe emerge in a better spot once the pain of the compatibility break receded. Another option is something around sandboxing and/or function trust. The idea here is to not care too much about the search_path behavior itself, and instead focus on the consequences, namely what code is getting executed as which user and perhaps what kinds of operations it's performing. To me, this seems like a possibly easier answer way forward at least in the short to medium term, because I think it will break fewer things for fewer people, and if somebody doesn't like the new behavior they can just say "well I trust everyone completely" and it all goes back to the way it was. That said, I think there are problems with my previous proposals on the other thread so I believe some adjustments would be needed there, and then there's the problem of actually implementing anything. I'll try to respond to your comments on that thread soon. Are there any other categories of things we can do? More specific kinds of things we can do in each category? I don't really see an option other than (1) "change something in the system design so that people use search_path wrongly less often" or (2) "make it so that it doesn't matter as much if people using the wrong search_path" but maybe I'm missing a clever idea. -- Robert Haas EDB: http://www.enterprisedb.com
On Tue, 2023-09-19 at 11:41 -0400, Robert Haas wrote: > I agree this is a mess, and that documenting the mess better would be > good. But instead of saying not to do something, we need to say what > will happen if you do the thing. I'm regularly annoyed when somebody > reports that "I tried to do X and it didn't work," instead of saying > what happened when they tried, and this situation is another form of > the same thing. "If you do X, then Y will or can occur" is much > better > than "do not do X". Good documentation includes some guidance. Sure, it should describe the system behavior, but without anchoring it to some kind of expected use case, it can be equally frustrating. Allow me to pick on this example which came up in a recent thread: "[password_required] Specifies whether connections to the publisher made as a result of this subscription must use password authentication. This setting is ignored when the subscription is owned by a superuser. The default is true. Only superusers can set this value to false." -- https://www.postgresql.org/docs/16/sql-createsubscription.html Only superusers can set it, and it's ignored for superusers. That does a good job of describing the actual behavior, but is a bit puzzling. I guess what the user is supposed to do is one of: 1. Create a subscription as a superuser with the right connection string (without a password) and password_required=false, then reassign it to a non-superuser; or 2. Create a subscription as a non-superuser member of pg_create_subscription using a bogus connection string, then a superuser can alter it to set password_required=false, then alter the connection string; or 3. Create a superuser, let the new superuser create a subscription with password_required=false, and then remove their superuser status. so why not just document one of those things as the expected thing to do? Not a whole section or anything, but a sentence to suggest what they should do or where else they should look. I don't mean to set some major new standard in the documentation that should apply to everything; but for the privilege system, even hackers are having trouble keeping up (myself included). A bit of guidance toward supported use cases helps a lot. > I fear it will be hard to come up with something that is > clear, that highlights the severity of the problems, and that does > not > veer off into useless vitriol against the status quo, but if we can > get there, that would be good. I hope what I'm saying is not useless vitriol. I am offering the best solutions I see in a bad situation. And I believe I've uncovered some emergent behaviors that are not well-understood even among prominent hackers. > That leads to a second idea, which is having it continue > to be a GUC but only affect directly-entered SQL, with all > indirectly-entered SQL either being stored as a node tree or having a > search_path property attached somewhere. That's not too far from the proposed patch and I'd certainly be interested to hear more and/or adapt my patch towards this idea. > Or, as a third idea, suppose > we leave it a GUC but start breaking semantics around where and how > that GUC gets set, e.g. by changing CREATE FUNCTION to capture the > prevailing search_path by default unless instructed otherwise. How would one instruct otherwise? > Personally I feel like we'd need pretty broad consensus for any of > these kinds of changes +1 > because it would break a lot of stuff for a lot > of people, but if we could get that then I think we could maybe > emerge > in a better spot once the pain of the compatibility break receded. Are there ways we can soften this a bit? I know compatibility GUCs are not to be added lightly, but perhaps one is justified here? > Another option is something around sandboxing and/or function trust. > The idea here is to not care too much about the search_path behavior > itself, and instead focus on the consequences, namely what code is > getting executed as which user and perhaps what kinds of operations > it's performing. I'm open to discussing that further, and it certainly may solve some problems, but it does not seem to solve the fundamental problem with search_path: that the caller can (intentionally or unintentionally) cause a function to do unexpected things. Sometimes an unexpected thing is not a the kind of thing that would be caught by a sandbox, e.g. just an unexpected function result. But if that function is used in a constraint or expression index, that unexpected result can lead to a violated constraint or a bad index (that will later cause wrong results). The owner of the table might reasonably consider that a privilege problem, if the user who causes the trouble had only INSERT privileges. > Are there any other categories of things we can do? More specific > kinds of things we can do in each category? I don't really see an > option other than (1) "change something in the system design so that > people use search_path wrongly less often" or (2) "make it so that it > doesn't matter as much if people using the wrong search_path" but > maybe I'm missing a clever idea. Perhaps there are some clever ideas about maintaining compatibility within the approaches (1) or (2), which might make one of them more appealing. Regards, Jeff Davis
On Tue, Sep 19, 2023 at 5:56 PM Jeff Davis <pgsql@j-davis.com> wrote: >... > On Tue, 2023-09-19 at 11:41 -0400, Robert Haas wrote: > > That leads to a second idea, which is having it continue > > to be a GUC but only affect directly-entered SQL, with all > > indirectly-entered SQL either being stored as a node tree or having a > > search_path property attached somewhere. > > That's not too far from the proposed patch and I'd certainly be > interested to hear more and/or adapt my patch towards this idea. As an interested bystander, that's the same thing I was thinking when reading this. I reread your original e-mail, Jeff, and I still think that. I wonder if something like CURRENT (i.e., the search path at function creation time) might be a useful keyword addition. I can see some uses (more forgiving than SYSTEM but not as loose as SESSION), but I don't know if it would justify its presence. Thanks for working on this. Thanks, Maciek
On Tue, Sep 19, 2023, 20:23 Maciek Sakrejda <m.sakrejda@gmail.com> wrote:
I wonder if something like CURRENT (i.e., the search path at function
creation time) might be a useful keyword addition. I can see some uses
(more forgiving than SYSTEM but not as loose as SESSION), but I don't
know if it would justify its presence.
I realize now this is exactly what SET search_path FROM CURRENT does. Sorry for the noise.
Regarding extensions installed in the public schema throwing a wrench in the works, is that still a problem if the public schema is not writable? I know that that's a new default, but it won't be forever.
Hi
st 20. 9. 2023 v 9:34 odesílatel Maciek Sakrejda <m.sakrejda@gmail.com> napsal:
On Tue, Sep 19, 2023 at 5:56 PM Jeff Davis <pgsql@j-davis.com> wrote:
>...
> On Tue, 2023-09-19 at 11:41 -0400, Robert Haas wrote:
> > That leads to a second idea, which is having it continue
> > to be a GUC but only affect directly-entered SQL, with all
> > indirectly-entered SQL either being stored as a node tree or having a
> > search_path property attached somewhere.
>
> That's not too far from the proposed patch and I'd certainly be
> interested to hear more and/or adapt my patch towards this idea.
As an interested bystander, that's the same thing I was thinking when
reading this. I reread your original e-mail, Jeff, and I still think
that.
I wonder if something like CURRENT (i.e., the search path at function
creation time) might be a useful keyword addition. I can see some uses
(more forgiving than SYSTEM but not as loose as SESSION), but I don't
know if it would justify its presence.
Personally, I dislike this - because the value of the search path is hidden in this case.
I agree so it can be comfortable, but it can be confusing for review, migration, ...
Regards
Pavel
Thanks for working on this.
Thanks,
Maciek
On Tue, Sep 19, 2023 at 7:56 PM Jeff Davis <pgsql@j-davis.com> wrote: > Good documentation includes some guidance. Sure, it should describe the > system behavior, but without anchoring it to some kind of expected use > case, it can be equally frustrating. Fair. > I don't mean to set some major new standard in the documentation that > should apply to everything; but for the privilege system, even hackers > are having trouble keeping up (myself included). A bit of guidance > toward supported use cases helps a lot. Yeah, this stuff is complicated, and I agree that it's hard even for hackers to keep up with. I don't really have a strong view on the concrete case you mentioned involving password_required. I always worry that if there are three cases and we suggest one of them then the others will be viewed negatively when really they're all equally fine. On the other hand, that can often be addressed by starting the sentence with "For example, you could...." or similar, so perhaps there's no problem here at all. I generally agree with the idea that examples can be useful for clarifying points that may otherwise be too theoretical. > I hope what I'm saying is not useless vitriol. I am offering the best > solutions I see in a bad situation. And I believe I've uncovered some > emergent behaviors that are not well-understood even among prominent > hackers. Yeah, I wasn't really intending to say that you were. I just get nervous about statements like "don't ever do X!" because I find it completely unconvincing. In my experience, when you tell people stuff like that, some of them just go off and do it anyway and, especially in a case like this, chances are very good that nothing bad will ever happen to them, simply because most PostgreSQL installations don't have malicious local users. When you tell them "but that's really bad" they say "why?" and if the documentation doesn't have an answer to the question well then that sucks. > > because it would break a lot of stuff for a lot > > of people, but if we could get that then I think we could maybe > > emerge > > in a better spot once the pain of the compatibility break receded. > > Are there ways we can soften this a bit? I know compatibility GUCs are > not to be added lightly, but perhaps one is justified here? I don't know. I'm skeptical. This behavior is so complicated and hard to get right. Having it GUC-dependent makes it even more confusing than it is already. But I guess it also depends on what the GUC does. Let's say we make a rule that every function or procedure has to have a search_path attached to it as a function property. That is, CREATE FUNCTION .. SEARCH something sets pg_proc.prosearch = 'something'. If you omit the SEARCH clause, one is implicitly supplied for you. If you say SEARCH NULL, then the function is executed with the search_path taken from the GUC; SEARCH 'anything_else' specified a literal search_path to be used. In such a world, I can imagine having a GUC that determines whether the implicitly supplied SEARCH clause is SEARCH ${WHATEVER_THE_SEARCH_PATH_IS_RIGHT_NOW} or SEARCH NULL. Such a GUC only takes effect at CREATE FUNCTION time. However, I cannot imagine having a GUC that causes the SEARCH clauses attached to all functions to be globally ignored at execution time. That seems like a choice we would likely regret bitterly. The first thing is already painful, but the second one is exponentially worse, because in the first world, you have to be careful to get your functions defined correctly, but if you do, you know they'll run OK on any PostgreSQL cluster anywhere, whereas in the second world, there's no way to define a function that behaves the same way on every PostgreSQL instance. Imagine being an extension author, for example. I am a little worried that this kind of design might end up reversing the direction of some security problems that we have now. For instance, right now, if you call a function with a SET search_path clause, you know that it can't make any changes to search_path that survive function exit. You'll get your old search_path back. With this kind of design, it seems like it would be a lot easier to get back to the SQL toplevel and find the search_path surprisingly changed under you. I think we have that problem in some cases already, though. I'm unclear how much worse this makes it. > > Another option is something around sandboxing and/or function trust. > > The idea here is to not care too much about the search_path behavior > > itself, and instead focus on the consequences, namely what code is > > getting executed as which user and perhaps what kinds of operations > > it's performing. > > I'm open to discussing that further, and it certainly may solve some > problems, but it does not seem to solve the fundamental problem with > search_path: that the caller can (intentionally or unintentionally) > cause a function to do unexpected things. Well, I think it's meant to solve that problem. How effectively it does so is a point worth debating. > Sometimes an unexpected thing is not a the kind of thing that would be > caught by a sandbox, e.g. just an unexpected function result. But if > that function is used in a constraint or expression index, that > unexpected result can lead to a violated constraint or a bad index > (that will later cause wrong results). The owner of the table might > reasonably consider that a privilege problem, if the user who causes > the trouble had only INSERT privileges. That's an interesting example. Earlier versions of the function trust proposal proposed to block *any* execution of code belonging to an untrusted party. That could potentially block this attack. However, it would also block a lot of other things. For instance, if Alice tries to insert into Bob's table and Bob's table has a CHECK constraint or an index expression, Alice has to trust Bob or she can't insert anything at all. By trusting Bob just enough to allow him do things like CHECK(LENGTH(foo) < 10) or whatever, Alice can operate on Bob's table without a problem in normal cases, but is still protected if Bob suddenly starts doing something sneaky. I think that's a significant improvement, because a system that is so stringent that it blocks even completely harmless things is likely to get disabled, at which point it protects nobody from anything. However, that analysis presumes that what we're trying to do is protect Alice from Bob, and I think you're raising the question of how we protect Bob from Alice. Suppose Bob has got a trigger function but has failed to control search_path for that function. Alice can set search_path so that Bob's trigger calls some function or operator that she owns instead of the intended call to, say, a system function or operator. Some sufficiently-rigid function trust system could catch this: Bob doesn't trust Alice, and so the fact that his code is trying to call some a function or operator owned by Alice is a red flag. On the basis of the fact that Bob doesn't trust Alice, we should error out to protect Bob. Had the search_path been set in the expected way, Bob would have been trying to call a superuser-owned function, and Bob must trust the superuser, so the operation is permitted. I wouldn't have a problem with a function-trust proposal that incorporated a mode that rigid as a configuration option. I find this a convincing example of how that could be useful. But such a mode has pretty serious downsides, too. It makes it very difficult for one user to interact with another user's objects in any way without triggering security errors. Also, in a case like this, I don't think it's unreasonable to ask whether, perhaps, Bob just needs to be a little more careful about setting search_path. I think that there is a big difference between (a) defining a SQL-language function that is accessible to multiple users and (b) inserting a row into a table you don't own. When you define a function, you know people are potentially going to call it. Asking you, as the function author, to take some care to secure your function against a malicious search_path doesn't seem like an unsupportable burden. After all, you control the definition of that function. The problem with inserting a row into a table you don't own is that all of the objects involved -- the table itself, its indexes, its triggers, its defaults, its constraints -- are owned by somebody else, and that user controls those objects and can change any of them at any time. You can't really be expected to verify that all code reachable as a result of an INSERT into the table is safe enough before every INSERT into that table. You can, I think, be expected to check that functions you define have SET search_path attached. > > Are there any other categories of things we can do? More specific > > kinds of things we can do in each category? I don't really see an > > option other than (1) "change something in the system design so that > > people use search_path wrongly less often" or (2) "make it so that it > > doesn't matter as much if people using the wrong search_path" but > > maybe I'm missing a clever idea. > > Perhaps there are some clever ideas about maintaining compatibility > within the approaches (1) or (2), which might make one of them more > appealing. Indeed. -- Robert Haas EDB: http://www.enterprisedb.com
On Tue, 2023-09-19 at 20:23 -0700, Maciek Sakrejda wrote: > On Tue, Sep 19, 2023 at 5:56 PM Jeff Davis <pgsql@j-davis.com> wrote: > > ... > > On Tue, 2023-09-19 at 11:41 -0400, Robert Haas wrote: > > > That leads to a second idea, which is having it continue > > > to be a GUC but only affect directly-entered SQL, with all > > > indirectly-entered SQL either being stored as a node tree or > > > having a > > > search_path property attached somewhere. > > > > That's not too far from the proposed patch and I'd certainly be > > interested to hear more and/or adapt my patch towards this idea. > > As an interested bystander, that's the same thing I was thinking when > reading this. I reread your original e-mail, Jeff, and I still think > that. I have attached an updated patch. Changes: * Syntax is now: SEARCH FROM { DEFAULT | TRUSTED | SESSION } - added "FROM" to suggest that it's the source, and only a starting place, rather than a specific and final setting. I don't feel strongly about the FROM one way or another, so I can take it out if it's not helpful. - changed "SYSTEM" to "TRUSTED", which better reflects the purpose, and doesn't suggest any connection to ALTER SYSTEM. * Removed GUC -- we can reconsider this kind of thing later. * ERROR if IMMUTABLE is combined with SEARCH FROM SESSION * pg_dump support. Emits "SEARCH FROM SESSION" or "SEARCH FROM TRUSTED" only if explicitly specified; otherwise emits no SEARCH clause. Differentiating the unspecified cases may be useful for migration purposes later. * psql support. * Updated docs to try to better present the concept, and document CREATE PROCEDURE as well. The SEARCH clause declares a new property that will be useful to both enforce safety and also to guide users to migrate in a safe direction over time. For instance, the current patch prohibits the combination of IMMUTABLE and SEARCH FROM SESSION; but allows IMMUTABLE if no SEARCH clause is specified at all (to avoid breaking anything). We could extend that slowly over several releases ratchet up the pressure (with warnings or changing defaults) until all IMMUTABLE functions require SEARCH FROM TRUSTED. Perhaps IMMUTABLE would even imply SEARCH FROM TRUSTED. The search property is consistent with other properties, like IMMUTABLE, which is both a marker and also enforces some restrictions (e.g. you can't CREATE TABLE). It's also a lot nicer to use than a SET clause, and provides a nice place to document certain behaviors. (Aside: the concept of IMMUTABLE is basically broken today, due to search_path problems.) SEARCH FROM DEFAULT is just a way to get an object back to the "unspecified search clause" state. It has the same behavior as SEARCH FROM SESSION, except that the former will cause a hard error when combined with IMMUTABLE. I think it's worth differentiating the unspecified search clause from the explicit SEARCH FROM SESSION clause for the purposes of migration. There were three main complaints: Comaplaint A: That it creates a new mechanism[1]. The patch doesn't create a new internal mechanism, it almost entirely reuses the existing SET clause mechanism. I think complaint A is really about the user-facing mechanics, which is essentially the same as the complaint B. Complaint B: That it's overlapping in functionality with the SET clause[2][3]. In other words: CREATE FUNCTION ... SEARCH FROM TRUSTED ...; CREATE FUNCTION ... SET search_path = pg_catalog, pg_temp ...; do similar things. But the latter is much worse: * it's user-unfriendly (requiring pg_temp is highly unintuitive) * it doesn't allow Postgres to warn if the function is being used in an unsafe context * there's no way to explicitly declare that you want the search path to come from the session instead (either to be more clear about your intentions, or to be forward-compatible) In my opinion, the "SET search_path = ..." form should be used when you actually want the search_path to contain some specific schema, not in cases where you're just using built-in objects. Complaint C: search_path is hopeless[4]. I think we can make meaningful improvements to the status quo, like with the attached patch, that will reduce a lot of the surface area for security risks. Right now our privilege model breaks down very quickly even with trivial and typical use cases and we can do better. -- Jeff Davis PostgreSQL Contributor Team - AWS [1] https://www.postgresql.org/message-id/CA%2BTgmoaRPJJN%3DAOqC4b9t90vFQX81hKiXNPNhbxR0-Sm8F8nCA%40mail.gmail.com [2] https://www.postgresql.org/message-id/CA%2BTgmoah_bTjUFng-vZnivPQs0kQWUaSwAu49SU5M%2BzTxA%2B3Qw%40mail.gmail.com [3] https://www.postgresql.org/message-id/15464811-18fb-c7d4-4620-873366d367d6%40eisentraut.org [4] https://www.postgresql.org/message-id/20230812182559.d7plqwx3p65ys4i7%40awork3.anarazel.de
Вложения
On Thu, 2023-09-21 at 14:06 -0400, Robert Haas wrote: > Also, in a case like this, I don't think it's unreasonable to ask > whether, perhaps, Bob just needs to be a little more careful about > setting search_path. That's what this whole thread is about: I wish it was reasonable, but I don't think the tools we provide today make it reasonable. You expect Bob to do something like: CREATE FUNCTION ... SET search_path = pg_catalog, pg_temp ... for all functions, not just SECURITY DEFINER functions, is that right? Up until now, we've mostly treated search_path as a problem for SECURITY DEFINER, and specifying something like that might be reasonable for a small number of SECURITY DEFINER functions. But as my example showed, search_path is actually a problem for SECURITY INVOKER too: an index expression relies on the function producing the correct results, and it's hard to control that without controlling the search_path. > I think that there is a big difference between > (a) defining a SQL-language function that is accessible to multiple > users and (b) inserting a row into a table you don't own. When you > define a function, you know people are potentially going to call it. It's a bit problematic that (a) is the default: CREATE FUNCTION f(INT) RETURNS INT IMMUTABLE LANGUAGE plpgsql AS $$ BEGIN RETURN 42+$1; END; $$; CREATE TABLE x(i INT); CREATE INDEX x_idx ON x(f(i)); GRANT INSERT ON TABLE x TO u2; It's not obvious that f() is directly callable by u2 (though it is documented). I'm not disagreeing with the principle behind what you say above. My point is that "accessible to multiple users" is the ordinary default case, so there's no cue for the user that they need to do something special to secure function f(). > Asking you, as the function author, to take some care to secure your > function against a malicious search_path doesn't seem like an > unsupportable burden. What you are suggesting has been possible for quite some time. Do you think users are taking care to do this today? If not, how can we encourage them to do so? > You can, I think, be expected to > check that functions you define have SET search_path attached. We've already established that even postgres hackers are having difficulty keeping up with these nuances. Even though the SET clause has been there for a long time, our documentation on the subject is insufficient and misleading. And on top of that, it's extra typing and noise for every schema file. Until we make some changes I don't think we can expect users to do as you suggest. Regards, Jeff Davis
On Fri, Sep 22, 2023 at 4:05 PM Jeff Davis <pgsql@j-davis.com> wrote: > On Thu, 2023-09-21 at 14:06 -0400, Robert Haas wrote: > > Also, in a case like this, I don't think it's unreasonable to ask > > whether, perhaps, Bob just needs to be a little more careful about > > setting search_path. > > That's what this whole thread is about: I wish it was reasonable, but I > don't think the tools we provide today make it reasonable. You expect > Bob to do something like: > > CREATE FUNCTION ... SET search_path = pg_catalog, pg_temp ... > > for all functions, not just SECURITY DEFINER functions, is that right? Yes, I do. I think it's self-evident that a SQL function's behavior is not guaranteed to be invariant under all possible values of search_path. If you care about your function behaving the same way all the time, you have to set the search_path. TBH, I don't see any reasonable way around that requirement. We can perhaps provide some safeguards that will make it less likely that you will get completely hosed if your forget, and we could decide to make SET search_path or some mostly-equivalent thing the default at the price of pretty large compatibility break, but you can't have functions that both resolve object references using the caller's search path and also reliably do what the author intended. > > You can, I think, be expected to > > check that functions you define have SET search_path attached. > > We've already established that even postgres hackers are having > difficulty keeping up with these nuances. Even though the SET clause > has been there for a long time, our documentation on the subject is > insufficient and misleading. And on top of that, it's extra typing and > noise for every schema file. Until we make some changes I don't think > we can expect users to do as you suggest. Respectfully, I find this position unreasonable, to the point of finding it difficult to take seriously. You said in another part of your email that I didn't quote here that it's a problem that it's a problem that functions and procedures are created with public execute access by default -- but you can work around this by using a schema to which other users don't have access, or by changing the default permissions for functions on the schema where you are creating them, or by adjusting permissions on the individual objects. If you don't do any of that but don't trust the other users on your system then you at least need to set search_path. If you neither understand how function permissions work nor understand the importance of controlling search_path, you cannot expect to have a secure system with multiple, mutually untrusting users. That's just never going to work, regardless of what the server behavior is. I also disagree with the idea that setting the search_path should be regarded as noise. I think it's quite the opposite. I don't believe that people want to run their functions under a sanitized search_path that only includes system schemas. That might work for some people, but I think most people will define functions that call other functions that they themselves defined, or access tables that they themselves created. They will therefore need the search_path to include the schemas in which they created those objects. There's no way for the system to magically figure out what the user wants here. *Perhaps* if the function is defined interactively the then-current value could be captured, but in a pg_dump for example that won't work, and the configured value, wherever it came from initially, is going to have to be recorded so that it can be recreated when the dump is restored. Most of the problems that we're dealing with here have analogues in the world of shell scripts. A sql or plpgsql function is like a shell script. If it's setuid, i.e. SECURITY DEFINER, you have to worry about the caller hijacking it by setting PATH or IFS or LD_something. Even if it isn't, you have to either trust that the caller has set a reasonable PATH, or set one yourself, else your script isn't always going to work reliably. Nobody really expects to be able to make a setuid shell script secure at all -- that typically requires a wrapper executable -- but it definitely can't be done by someone who doesn't understand the importance of setting their PATH and has no idea how to use chmod. One thing that is quite different between the shell script situation and what we do inside PostgreSQL is that there's a lot more security by default. Every user gets a home directory which by default is accessible only to them, or at the very least writable only by them, and system directories have tightly-controlled permissions. I think UNIX had analogues of a lot of the problems we have today 40 years ago, but they've tightened things up. We've started to move in that direction by, for example, removing public execute access by default. If we want to move further in the direction that UNIX has taken, we should probably get rid of the public schema altogether, and auto-create per-user schemas with permissions that allow only that user to access them. But that's only making it easier to not accidentally have users accessing each other's stuff. The core problem that, if people do want to access each other's stuff, they either need to trust each other or really be on point with all the security-related stuff. That's equally true in the shell script case, and I think that problem is fundamental. It's just really not possible for people to call other people's code frequently without everyone involved either being super-careful about security or just not caring. -- Robert Haas EDB: http://www.enterprisedb.com
On 9/25/23 11:30, Robert Haas wrote: > I don't believe that people want to run their functions under a > sanitized search_path that only includes system schemas. That might > work for some people, but I think most people will define functions > that call other functions that they themselves defined, or access > tables that they themselves created. They will therefore need the > search_path to include the schemas in which they created those > objects. Without diving into all the detailed nuances of this discussion, this particular paragraph made me wonder if at least part of the problem here is that the same search_path is used to find "things that I want to execute" (functions and operators) and "things I want to access" (tables, etc). I think many folks would be well served by only having system schemas in the search_path for the former (augmented by explicit schema qualifying of one's own functions), but agree that almost no one wants that for the latter (needing to schema qualify every table reference). Should there be a way to have a separate "execution" search_path? -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Sep 25, 2023 at 12:00 PM Joe Conway <mail@joeconway.com> wrote: > Should there be a way to have a separate "execution" search_path? I have heard that idea proposed before, and I don't think it's a terrible idea, but I also don't think it fixes anything terribly fundamental. I think it's pretty normal for people to define functions and procedures and then call them from other functions and procedures, and if you do that, then you need that schema in your execution search path. Of course, if somebody doesn't do that, or schema-qualifies all such references, then this becomes useful for defense in depth. But I find it hard to see it as anything more than defense in depth because I think a lot of people will need to have use cases where they need to put non-system schemas into the execution search path, and such people wouldn't really benefit from the existence of this feature. Slightly off-topic, but I wonder whether, if we do this, we ought to do it by adding some kind of a marker to the existing search_path, rather than by creating a new GUC. For example, maybe putting & before a schema name means that it can be searched, but only for non-executable things. Then you could set search_path = &jconway, pg_catalog or something of that kind. It could potentially be more powerful to have it be a completely separate setting, but if we do that, everyone who currently needs to secure search_path properly starts needing to also secure execution_search_path properly. This is one of those cases where two is not better than one. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, 2023-09-25 at 11:30 -0400, Robert Haas wrote: > On Fri, Sep 22, 2023 at 4:05 PM Jeff Davis <pgsql@j-davis.com> wrote: > > You expect > > Bob to do something like: > > > > CREATE FUNCTION ... SET search_path = pg_catalog, pg_temp ... > > > > for all functions, not just SECURITY DEFINER functions, is that > > right? > > Yes, I do. Do users like Bob do that today? If not, what causes you to expect them to do so in the future? > I think it's self-evident that a SQL function's behavior is > not guaranteed to be invariant under all possible values of > search_path. It's certainly not self-evident in a literal sense. I think you mean that it's "obvious" or something, and perhaps that narrow question is, but it's also not terribly helpful. If the important behaviors here were so obvious, how did we end up in this mess in the first place? > > We've already established that even postgres hackers are having > > difficulty keeping up with these nuances. Even though the SET > > clause > > has been there for a long time, our documentation on the subject is > > insufficient and misleading. And on top of that, it's extra typing > > and > > noise for every schema file. Until we make some changes I don't > > think > > we can expect users to do as you suggest. > > Respectfully, I find this position unreasonable, to the point of > finding it difficult to take seriously. Which part exactly is unreasonable? * Hackers are having trouble keeping up with the nuances. * Our documentation on the subject *is* insufficient and misleading. * "pg_temp" is noise. It seems like you think that users are already doing "SET search_path = pg_catalog, pg_temp" in all the necessary places, and therefore no change is required? > Most of the problems that we're dealing with here have analogues in > the world of shell scripts. I think analogies to unix are what caused a lot of the problems we have today, because postgres is a very different world. In unix-like environments, a file is just a file; in postgres, we have all kinds of code attached in interesting ways. Regards, Jeff Davis
On Mon, 2023-09-25 at 12:00 -0400, Joe Conway wrote: > Should there be a way to have a separate "execution" search_path? I hadn't considered that and I like that idea for a few reasons: * a lot of the problem cases are for functions that don't need to access tables at all, e.g., in an index expression. * it avoids annoyances with pg_temp, because that's not searched for functions/operators anyway * perhaps we could force the object search_path to be empty for IMMUTABLE functions? I haven't thought it through in detail, but it seems like a promising approach. Regards, Jeff Davis
On 9/25/23 14:03, Jeff Davis wrote: > On Mon, 2023-09-25 at 12:00 -0400, Joe Conway wrote: >> Should there be a way to have a separate "execution" search_path? > > I hadn't considered that and I like that idea for a few reasons: > > * a lot of the problem cases are for functions that don't need to > access tables at all, e.g., in an index expression. > * it avoids annoyances with pg_temp, because that's not searched for > functions/operators anyway > * perhaps we could force the object search_path to be empty for > IMMUTABLE functions? > > I haven't thought it through in detail, but it seems like a promising > approach. Related to this, it would be useful if you could grant create on schema for only non-executable objects. You may want to allow a user to create their own tables but not allow them to create their own functions, for example. Right now "GRANT CREATE ON SCHEMA foo" gives the grantee the ability to create "all the things". -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Sep 25, 2023 at 1:56 PM Jeff Davis <pgsql@j-davis.com> wrote: > Do users like Bob do that today? If not, what causes you to expect them > to do so in the future? What I would say is that if there's a reasonable way of securing your stuff and you don't make use of it, that's your problem. If securing your stuff is unreasonably difficult, that's a product problem. I think that setting the search_path on your own functions is a basic precaution that you should take if you are worried about multi-user security. I do not believe it is realistic to eliminate that requirement, and if people like Bob don't do that today and can't be made to do that in the future, then I think it's just hopeless. In contrast, I think that the precautions that you need to take when doing anything to a table owned by another user are unreasonably complex and not very realistic for anyone to take on a routine basis. Even if you validate that there's nothing malicious before you access the table, the table owner can change that at any time, so it's very hard to reliably protect yourself. In terms of whether people like Bob actually DO do that today, I'd say probably some do and others don't. I think that the overwhelming majority of PostgreSQL users simply aren't concerned about multi-user security. They either have a single user account that is used for everything, or say one account for the application and another for interactive access, or they have a bunch of users but basically all of those people are freely accessing each other's stuff and they're not really concerned with firewalling them from each other. Among the small percentage of users who are really concerned with making sure that users can't get into each others accounts, I would expect that knowing that you need to control search_path is fairly common, but it's hard to say. I haven't actually met many such users. > > I think it's self-evident that a SQL function's behavior is > > not guaranteed to be invariant under all possible values of > > search_path. > > It's certainly not self-evident in a literal sense. I think you mean > that it's "obvious" or something, and perhaps that narrow question is, > but it's also not terribly helpful. > > If the important behaviors here were so obvious, how did we end up in > this mess in the first place? I feel like this isn't really responsive to the argument that I was and am making, and I'm worried that we're going down a rat-hole here. I wondered after reading this whether I had misused the term self-evident, but when I did a Google search for "self-evident" the definition that comes up is "not needing to be demonstrated or explained; obvious." I am not saying that everyone is going to realize that you probably ought to be setting search_path on all of your functions in any kind of multi-user environment, and maybe even in a single-user environment just to avoid weird failures if you ever change your default search_path. What I am saying is that if you stop to think about what search_path does while looking at any SQL function you've ever written, you should probably realize pretty quickly that the behavior of your function in search_path-dependent, and indeed that the behavior of every other SQL function you've ever written is probably search_path-dependent, too. I think the problem here isn't really that this is hard to understand, but that many people have not stopped to think about it. Maybe it is obvious to you what we ought to do about that, but it is not obvious to me. As I have said, I think that changing the behavior of CREATE FUNCTION or CREATE PROCEDURE so that some search_path control is the default is worth considering. However, I think that such a change inevitably breaks backward compatibility, and I don't think we have enough people weighing in on this thread to think that we can just go do that even if everyone agreed on precisely what was to be done, and I think it is pretty clear that we do not have unanimous agreement. > > Respectfully, I find this position unreasonable, to the point of > > finding it difficult to take seriously. > > Which part exactly is unreasonable? I interpreted you to be saying that we can't expect people to set search_path on their functions. And I just don't buy that. We have made mistakes in that area in PostgreSQL itself and had to fix them later, and we may make more mistakes again in the future, so if you think we need better documentation or better defaults, I think you might be right. But if you think it's a crazy idea for people running PostgreSQL in multi-user environments to understand that setting search_path on all of their functions and procedures is essential, I disagree. They've got to understand that, because it's not that complicated, and there's no real alternative. > I think analogies to unix are what caused a lot of the problems we have > today, because postgres is a very different world. In unix-like > environments, a file is just a file; in postgres, we have all kinds of > code attached in interesting ways. Yeah. That's another area where I find it very unclear how to do better. From a security point of view, I think that the fact that there are so many interesting places to attach code is completely insane. It makes running a secure multi-user environment very difficult, bordering on impossible. But is there any way we can really fix that without just removing a whole bunch of functionality? I think that some of the ideas that have been proposed here could help, but I'm extremely doubtful that they or anything else are a complete solution, and I'm pretty sure that there is no "easy button" here -- given the number of "interesting" ways to execute code, I think security will always be tough to get right, regardless of what we change. My emails on this thread seem to have made you frustrated. For that, I am sorry. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, 2023-09-21 at 14:33 -0700, Jeff Davis wrote: > I have attached an updated patch. Changes: Withdrawing this from CF due to lack of consensus. I'm happy to resume this discussion if someone sees a path forward to make it easier to secure the search_path; or at least help warn users when a function without a secured search_path is being used unsafely. Regards, Jeff Davis
On Mon, 2023-09-25 at 11:30 -0400, Robert Haas wrote: > > That's what this whole thread is about: I wish it was reasonable, > > but I > > don't think the tools we provide today make it reasonable. You > > expect > > Bob to do something like: > > > > CREATE FUNCTION ... SET search_path = pg_catalog, pg_temp ... > > > > for all functions, not just SECURITY DEFINER functions, is that > > right? > > Yes, I do. I think it's self-evident that a SQL function's behavior > is > not guaranteed to be invariant under all possible values of > search_path. If you care about your function behaving the same way > all > the time, you have to set the search_path. After adding the search path cache (recent commit f26c2368dc) hopefully that helps to make the above suggestion more reasonable performance- wise. I think we can call that progress. Regards, Jeff Davis
On Tue, Nov 14, 2023 at 11:21 PM Jeff Davis <pgsql@j-davis.com> wrote: > After adding the search path cache (recent commit f26c2368dc) hopefully > that helps to make the above suggestion more reasonable performance- > wise. I think we can call that progress. I agree. Not to burden you, but do you know what the overhead is now, and do you have any plans to further reduce it? I don't believe that's the only thing we ought to be doing here, necessarily, but it is one thing that we definitely should be doing and probably the least controversial. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, 2023-11-20 at 15:52 -0500, Robert Haas wrote: > I agree. Not to burden you, but do you know what the overhead is now, > and do you have any plans to further reduce it? I don't believe > that's > the only thing we ought to be doing here, necessarily, but it is one > thing that we definitely should be doing and probably the least > controversial. Running the simple test described here: https://www.postgresql.org/message-id/04c8592dbd694e4114a3ed87139a7a04e4363030.camel%40j-davis.com The baseline (no "SET search_path" clause on the function) is around 3800ms, and with the clause it shoots up to 8000ms. That's not good, but it is down from about 12000ms before. There are a few patches in the queue to bring it down further. Andres and I were discussing some GUC hashtable optimizations here: https://www.postgresql.org/message-id/27a7a289d5b8f42e1b1e79b1bcaeef3a40583bd2.camel@j-davis.com which will (if committed) bring it down into the mid 7s. There are also a couple other patches I have here (and intend to commit soon): https://www.postgresql.org/message-id/e6fded24cb8a2c53d4ef069d9f69cc7baaafe9ef.camel%40j-davis.com and those I think will get it into the mid 6s. I think a bit lower combined with the GUC hash table optimizations above. So we are still looking at around 50% overhead for a simple function if all this stuff gets committed. Not great, but a lot better than before. Of course I welcome others to profile and see what they can do. There's a setjmp() call, and a couple allocations, and maybe some other stuff to look at. There are also higher-level ideas, like avoiding calling into guc.c in some cases, but that starts to get tricky as you pointed out: https://www.postgresql.org/message-id/CA%2BTgmoa8uKQgak5wH0%3D7sL-ukqbwnCPMXA2ZW7Ccdt7tdNGkzg%40mail.gmail.com It seems others are also interested in this problem, so I can put some more effort in after this round of patches goes in. I don't have a specific target other than "low enough overhead that we can reasonably recommend it as a best practice in multi-user environments". Regards, Jeff Davis
On Mon, Nov 20, 2023 at 5:27 PM Jeff Davis <pgsql@j-davis.com> wrote: > Of course I welcome others to profile and see what they can do. There's > a setjmp() call, and a couple allocations, and maybe some other stuff > to look at. There are also higher-level ideas, like avoiding calling > into guc.c in some cases, but that starts to get tricky as you pointed > out: > > https://www.postgresql.org/message-id/CA%2BTgmoa8uKQgak5wH0%3D7sL-ukqbwnCPMXA2ZW7Ccdt7tdNGkzg%40mail.gmail.com > > It seems others are also interested in this problem, so I can put some > more effort in after this round of patches goes in. I don't have a > specific target other than "low enough overhead that we can reasonably > recommend it as a best practice in multi-user environments". The two things that jump out at me are the setjmp() and the hash_search() call inside find_option(). As to the first, could we remove the setjmp() and instead have abort-time processing know something about this? For example, imagine we just push something onto a stack like we do for ErrorContextCallback, do whatever, and then pop it off. But if an error is thrown then the abort path knows to look at that variable and do whatever. As to the second, could we somehow come up with an API for guc.c where you can ask for an opaque handle that will later allow you to do a fast-SET of a GUC? The opaque handle would basically be the hashtable entry, perhaps with some kind of wrapping or decoration. Then fmgr_security_definer() could obtain the opaque handles and cache them in fn_extra. (I'm just spitballing here.) -- Robert Haas EDB: http://www.enterprisedb.com
On Tue, 2023-11-21 at 09:24 -0500, Robert Haas wrote: > As to the second, could we somehow come > up with an API for guc.c where you can ask for an opaque handle that > will later allow you to do a fast-SET of a GUC? Yes, attached. That provides a significant speedup: my test goes from around ~7300ms to ~6800ms. This doesn't seem very controversial or complex, so I'll probably commit this soon unless someone else has a comment. -- Jeff Davis PostgreSQL Contributor Team - AWS
Вложения
On Tue, Dec 5, 2023 at 7:55 AM Jeff Davis <pgsql@j-davis.com> wrote: > > On Tue, 2023-11-21 at 09:24 -0500, Robert Haas wrote: > > As to the second, could we somehow come > > up with an API for guc.c where you can ask for an opaque handle that > > will later allow you to do a fast-SET of a GUC? > > Yes, attached. That provides a significant speedup: my test goes from > around ~7300ms to ~6800ms. > > This doesn't seem very controversial or complex, so I'll probably > commit this soon unless someone else has a comment. + * set_config_option_ext: sets option with the given handle to the given + * value. Copy-paste-o of the other function name. +config_handle * +get_config_handle(const char *name) +{ + struct config_generic *record; + + record = find_option(name, false, false, 0); + if (record == NULL) + return 0; Part of this code this was copied from a function that returned int, but this one returns a pointer.
On Tue, 2023-12-05 at 23:22 +0700, John Naylor wrote: > Copy-paste-o of the other function name. ... > Part of this code this was copied from a function that returned int, > but this one returns a pointer. Thank you, fixed. Also, I forward-declared config_generic in guc.h to eliminate the cast. Regards, Jeff Davis
Вложения
On Tue, 2023-11-21 at 09:24 -0500, Robert Haas wrote: > As to the first, could we > remove the setjmp() and instead have abort-time processing know > something about this? For example, imagine we just push something > onto > a stack like we do for ErrorContextCallback, do whatever, and then > pop > it off. But if an error is thrown then the abort path knows to look > at > that variable and do whatever. If I remove the TRY/CATCH entirely, it shows there's room for ~200ms improvement in my test. I attached a rough patch, which doesn't quite achieve that much, it's more like ~100ms improvement and starts to fall within the noise. So perhaps an improvement, but a bit disappointing. It's not a lot of code, but it's not trivial either because the nesting level needs to be tracked (so a subxact abort doesn't reset too much state). Also, it's not quite as clean as it could be, because I went to some effort to avoid an alloc/free by keeping the stack within the fcache. I didn't pay a lot of attention to correctness in this particular patch; I was mostly trying a few different formulations for performance measurement. I'm not inclined to commit this in its current form but if someone thinks that it's a worthwhile direction, I can clean it up a bit and reconsider. Regards, Jeff Davis
Вложения
On Tue, 2023-12-05 at 11:58 -0800, Jeff Davis wrote: > Also, I forward-declared config_generic in guc.h to eliminate the > cast. Looking more closely, I fixed an issue related to placeholder configs. We can't return a handle to a placeholder, because it's not stable, so in that case it falls back to using the name. My apologies for the churn on this (mostly) simple patch. I think this version is correct; I intend to commit soon. Regards, Jeff Davis