Обсуждение: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

Поиск
Список
Период
Сортировка

CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Jeff Davis
Дата:
The attached patch implements a new SEARCH clause for CREATE FUNCTION.
The SEARCH clause controls the search_path used when executing
functions that were created without a SET clause.

Background:

Controlling search_path is critical for the correctness and security of
functions. Right now, the author of a function without a SET clause has
little ability to control the function's behavior, because even basic
operators like "+" involve search_path. This is a big problem for, e.g.
functions used in expression indexes which are called by any user with
write privileges on the table.

Motivation:

I'd like to (eventually) get to safe-by-default behavior. In other
words, the simplest function declaration should be safe for the most
common use cases.

To get there, we need some way to explicitly specify the less common
cases. Right now there's no way for the function author to indicate
that a function intends to use the session's search path. We also need
an easier way to specify that the user wants a safe search_path ("SET
search_path = pg_catalog, pg_temp" is arcane).

And when we know more about the user's actual intent, then it will be
easier to either form a transition plan to push users into the safer
behavior, or at least warn strongly when the user is doing something
dangerous (i.e. using a function that depends on the session's search
path as part of an expression index).

Today, the only information we have about the user's intent is the
presence or absence of a "SET search_path" clause, which is not a
strong signal.

Proposal:

Add SEARCH { DEFAULT | SYSTEM | SESSION } clause to CREATE/ALTER
function.

  * SEARCH DEFAULT is the same as no SEARCH clause at all, and ends up
stored in the catalog as prosearch='d'.
  * SEARCH SYSTEM means that we switch to the safe search path of
"pg_catalog, pg_temp"  when executing the function. Stored as
prosearch='y'.
  * SEARCH SESSION means that we don't switch the search_path when
executing the function, and it's inherited from the session. Stored as
prosearch='e'.

Regardless of the SEARCH clause, a "SET search_path" clause will
override it. The SEARCH clause only matters when "SET search_path" is
not there.

Additionally provide a GUC, defaulting to false for compatibility, that
can interpret prosearch='d' as if it were prosearch='y'. It could help
provide a transition path. I know there's a strong reluctance to adding
these kinds of GUCs; I can remove it and I think the patch will still
be worthwhile. Perhaps there are alternatives that could help with
migration at pg_dump time instead?

Benefits:

1. The user can be more explicit about their actual intent. Do they
want safety and consistency? Or the flexibility of using the session's
search_path?

2. We can more accurately serve the user's intent. For instance, the
safe search_path of "pg_catalog, pg_temp" is arcane and seems to be
there just because we don't have a way to specify that pg_temp be
excluded entirely. But perhaps in the future we *do* want to exclude
pg_temp entirely. Knowing that the user just wants "SEARCH SYSTEM"
allows us some freedom to do that.

3. Users can be forward-compatible by specifying the functions that
really do need to use the session's search path as SEARCH SESSION, so
that they will never be broken in the future. That gives us a cleaner
path toward making the default behavior safe.


--
Jeff Davis
PostgreSQL Contributor Team - AWS



Вложения

Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Joe Conway
Дата:
On 8/11/23 22:35, Jeff Davis wrote:
> The attached patch implements a new SEARCH clause for CREATE FUNCTION.
> The SEARCH clause controls the search_path used when executing
> functions that were created without a SET clause.
> 
> Background:
> 
> Controlling search_path is critical for the correctness and security of
> functions. Right now, the author of a function without a SET clause has
> little ability to control the function's behavior, because even basic
> operators like "+" involve search_path. This is a big problem for, e.g.
> functions used in expression indexes which are called by any user with
> write privileges on the table.
> 
> Motivation:
> 
> I'd like to (eventually) get to safe-by-default behavior. In other
> words, the simplest function declaration should be safe for the most
> common use cases.

I agree with the general need.

> Add SEARCH { DEFAULT | SYSTEM | SESSION } clause to CREATE/ALTER
> function.
> 
>    * SEARCH DEFAULT is the same as no SEARCH clause at all, and ends up
> stored in the catalog as prosearch='d'.
>    * SEARCH SYSTEM means that we switch to the safe search path of
> "pg_catalog, pg_temp"  when executing the function. Stored as
> prosearch='y'.
>    * SEARCH SESSION means that we don't switch the search_path when
> executing the function, and it's inherited from the session. Stored as
> prosearch='e'.

It isn't clear to me what is the precise difference between  DEFAULT and 
SESSION


> 2. We can more accurately serve the user's intent. For instance, the
> safe search_path of "pg_catalog, pg_temp" is arcane and seems to be
> there just because we don't have a way to specify that pg_temp be
> excluded entirely. But perhaps in the future we *do* want to exclude
> pg_temp entirely. Knowing that the user just wants "SEARCH SYSTEM"
> allows us some freedom to do that.

Personally I think having pg_temp in the SYSTEM search path makes sense 
for temp tables, but I find it easy to forget that functions can be 
created by unprivileged users in pg_temp, and therefore having pg_temp 
in the search path for functions is dangerous.

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Joe Conway
Дата:
On 8/12/23 09:15, Joe Conway wrote:
> On 8/11/23 22:35, Jeff Davis wrote:
>> 2. We can more accurately serve the user's intent. For instance, the
>> safe search_path of "pg_catalog, pg_temp" is arcane and seems to be
>> there just because we don't have a way to specify that pg_temp be
>> excluded entirely. But perhaps in the future we *do* want to exclude
>> pg_temp entirely. Knowing that the user just wants "SEARCH SYSTEM"
>> allows us some freedom to do that.
> 
> Personally I think having pg_temp in the SYSTEM search path makes sense
> for temp tables, but I find it easy to forget that functions can be
> created by unprivileged users in pg_temp, and therefore having pg_temp
> in the search path for functions is dangerous.

Hmm, I guess I was too hasty -- seems we have some magic related to this 
already.

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Jeff Davis
Дата:
On Sat, 2023-08-12 at 09:15 -0400, Joe Conway wrote:
> It isn't clear to me what is the precise difference between  DEFAULT
> and
> SESSION

The the current patch, SESSION always gets the search path from the
session, while DEFAULT is controlled by the GUC
safe_function_search_path. If the GUC is false (the default) then
DEFAULT and SESSION are the same. If the GUC is true, then DEFAULT and
SYSTEM are the same.

There are alternatives to using a GUC to differentiate them. The main
point of this patch is to capture what the user intends in a convenient
and forward-compatible way. If the user specifies nothing at all, they
get DEFAULT, and we could treat that specially in various ways to move
toward safety while minimizing breakage.

>
> Personally I think having pg_temp in the SYSTEM search path makes
> sense
> for temp tables

The patch doesn't change this behavior -- SYSTEM (without any other
SET) gives you "pg_catalog, pg_temp" and there's no way to exclude
pg_temp entirely.

My point was that by capturing the user's intent with SEARCH SYSTEM, it
gives us a bit more freedom to have these kinds of discussions later.
And it's certainly easier for the user to specify SEARCH SYSTEM than
"SET search_path = pg_catalog, pg_temp".

Regards,
    Jeff Davis




Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Jeff Davis
Дата:
On Sat, 2023-08-12 at 09:50 -0400, Joe Conway wrote:
> Hmm, I guess I was too hasty -- seems we have some magic related to
> this
> already.

I was worried after your first email. But yes, the magic is in
FuncnameGetCandidates(), which simply ignores functions in the temp
namespace.

It would be better if we were obviously safe rather than magically
safe, though.

Regards,
    Jeff Davis






Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Andres Freund
Дата:
Hi,

On 2023-08-11 19:35:22 -0700, Jeff Davis wrote:
> Controlling search_path is critical for the correctness and security of
> functions. Right now, the author of a function without a SET clause has
> little ability to control the function's behavior, because even basic
> operators like "+" involve search_path. This is a big problem for, e.g.
> functions used in expression indexes which are called by any user with
> write privileges on the table.

> Motivation:
>
> I'd like to (eventually) get to safe-by-default behavior. In other
> words, the simplest function declaration should be safe for the most
> common use cases.

I'm not sure that anything based, directly or indirectly, on search_path
really is a realistic way to get there.


> To get there, we need some way to explicitly specify the less common
> cases. Right now there's no way for the function author to indicate
> that a function intends to use the session's search path. We also need
> an easier way to specify that the user wants a safe search_path ("SET
> search_path = pg_catalog, pg_temp" is arcane).

No disagreement with that. Even if I don't yet agree that your proposal is a
convincing path to "easy security for PLs" - just making the search path stuff
less arcane is good.


> And when we know more about the user's actual intent, then it will be
> easier to either form a transition plan to push users into the safer
> behavior, or at least warn strongly when the user is doing something
> dangerous (i.e. using a function that depends on the session's search
> path as part of an expression index).

I think that'd be pretty painful from a UX perspective. Having to write
e.g. operators as operator(schema, op) just sucks as an experience. And with
extensions plenty of operators will live outside of pg_catalog, so there is
plenty things that will need qualifying.  And because of things like type
coercion search, which prefers "bettering fitting" coercions over search path
order, you can't just put "less important" things later in search path.


I wonder if we ought to work more on "fossilizing" the result of search path
resolutions at the time functions are created, rather than requiring the user
to do so explicitly.  Most of the problem here comes down to the fact that if
a user creates a function like 'a + b' we'll not resolve the operator, the
potential type coercions etc, when the function is created - we do so when the
function is executed.

We can't just store the oids at the time, because that'd end up very fragile -
tables/functions/... might be dropped and recreated etc and thus change their
oid. But we could change the core PLs to rewrite all the queries (*) so that
they schema qualify absolutely everything, including operators and implicit
type casts.

That way objects referenced by functions can still be replaced, but search
path can't be used to "inject" objects in different schemas. Obviously it
could lead to errors on some schema changes - e.g. changing a column type
might mean that a relevant cast lives in a different place than with the old
type - but I think that'll be quite rare. Perhaps we could offer a ALTER
FUNCTION ... REFRESH REFERENCES; or such?

One obvious downside of such an approach is that it requires some work with
each PL. I'm not sure that's avoidable - and I suspect that most "security
sensitive" functions are written in just a few languages.


(*) Obviously the one thing that doesn't work for is use of EXECUTE in plpgsql
and similar constructs elsewhere. I'm not sure there's much that can be done
to make that safe, but it's worth thinking about more.

Greetings,

Andres Freund



Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Jeff Davis
Дата:
On Sat, 2023-08-12 at 11:25 -0700, Andres Freund wrote:
>
> I'm not sure that anything based, directly or indirectly, on
> search_path
> really is a realistic way to get there.

Can you explain a little more? I see what you mean generally, that
search_path is an imprecise thing, and that it leaves room for
ambiguity and mistakes.

But I also think we can do a lot better than we're doing today and
still retain the basic concept of search_path, which is good because
it's deeply integrated into postgres, and it's not clear that we're
going to get away from it any time soon.

>
>
> I think that'd be pretty painful from a UX perspective. Having to
> write
> e.g. operators as operator(schema, op) just sucks as an experience.

I'm not suggesting that the user fully-qualify everything; I'm
suggesting that the include a "SET search_path" clause if they depend
on anything other than pg_catalog.

> And with
> extensions plenty of operators will live outside of pg_catalog, so
> there is
> plenty things that will need qualifying.

In my proposal, that would still involve a "SET search_path TO
myextension, pg_catalog, pg_temp".

The main reason that's bad is that adding pg_temp at the end is painful
UX -- just something that the user needs to remember to do with little
obvious reason or observable impact; but it has important security
implications. Perhaps we should just not implicitly include pg_temp for
a function's search_path (at least for the case of CREATE FUNCTION ...
SEARCH SYSTEM)?

>   And because of things like type
> coercion search, which prefers "bettering fitting" coercions over
> search path
> order, you can't just put "less important" things later in search
> path.

I understand this introduces some ambiguity, but you just can't include
schemas in the search_path that you don't trust, for similar reasons as
$PATH. If you have a few objects you'd like to access in another user's
schema, fully-qualify them.

> We can't just store the oids at the time, because that'd end up very
> fragile -
> tables/functions/... might be dropped and recreated etc and thus
> change their
> oid.

Robert suggested something along those lines[1]. I won't rule it out,
but I agree that there are quite a few things left to figure out.

>  But we could change the core PLs to rewrite all the queries (*) so
> that
> they schema qualify absolutely everything, including operators and
> implicit
> type casts.

So not quite like "SET search_path FROM CURRENT": you resolve it to a
specific "schemaname.objectname", but stop just short of resolving to a
specific OID?

An interesting compromise, but I'm not sure what the benefit is vs. SET
search_path FROM CURRENT (or some defined search_path).

> That way objects referenced by functions can still be replaced, but
> search
> path can't be used to "inject" objects in different schemas.
> Obviously it
> could lead to errors on some schema changes - e.g. changing a column
> type
> might mean that a relevant cast lives in a different place than with
> the old
> type - but I think that'll be quite rare. Perhaps we could offer a
> ALTER
> FUNCTION ... REFRESH REFERENCES; or such?

Hmm. I feel like that's making things more complicated. I'd find it
more straightforward to use something like Robert's approach of fully
parsing something, and then have the REFRESH command reparse it when
something needs updating. Or perhaps just create all of the dependency
entries more like a view query and then auto-refresh.

> (*) Obviously the one thing that doesn't work for is use of EXECUTE
> in plpgsql
> and similar constructs elsewhere. I'm not sure there's much that can
> be done
> to make that safe, but it's worth thinking about more.

I think it would be really nice to have some better control over the
search_path regardless, because it still helps with cases like this. A
lot of C functions build queries, and I don't think it's reasonable to
constantly worry about the ambiguity of the schema for "=".

Regards,
    Jeff Davis

[1]
https://www.postgresql.org/message-id/CA%2BTgmobd%3DeFRGWHhfG4mG2cA%2BdsVuA4jpBvD8N1NS%3DVc9eHFQg%40mail.gmail.com




Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Peter Eisentraut
Дата:
On 12.08.23 04:35, Jeff Davis wrote:
> The attached patch implements a new SEARCH clause for CREATE FUNCTION.
> The SEARCH clause controls the search_path used when executing
> functions that were created without a SET clause.

I don't understand this.  This adds a new option for cases where the 
existing option wasn't specified.  Why not specify the existing option 
then?  Is it not good enough?  Can we improve it?




Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Jeff Davis
Дата:
On Wed, 2023-08-16 at 08:51 +0200, Peter Eisentraut wrote:
> On 12.08.23 04:35, Jeff Davis wrote:
> > The attached patch implements a new SEARCH clause for CREATE
> > FUNCTION.
> > The SEARCH clause controls the search_path used when executing
> > functions that were created without a SET clause.
>
> I don't understand this.  This adds a new option for cases where the
> existing option wasn't specified.  Why not specify the existing
> option
> then?  Is it not good enough?  Can we improve it?

SET search_path = '...' not good enough in my opinion.

1. Not specifying a SET clause falls back to the session's search_path,
which is a bad default because it leads to all kinds of inconsistent
behavior and security concerns.

2. There's no way to explicitly request that you'd actually like to use
the session's search_path, so it makes it very hard to ever change the
default.

3. It's user-unfriendly. A safe search_path that would be suitable for
most functions is "SET search_path = pg_catalog, pg_temp", which is
arcane, and requires some explanation.

4. search_path for the session is conceptually different than for a
function. A session should be context-sensitive and the same query
should (quite reasonably) behave differently for different sessions and
users to sort out things like object name conflicts, etc. A function
should (ordinarily) be context-insensitive, especially when used in
something like an index expression or constraint. Having different
syntax helps separate those concepts.

5. There's no way to prevent pg_temp from being included in the
search_path. This is separately fixable, but having the proposed SEARCH
syntax is likely to make for a better user experience in the common
cases.

I'm open to suggestion about other ways to improve it, but SEARCH is
what I came up with.

Regards,
    Jeff Davis




Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Peter Eisentraut
Дата:
On 16.08.23 19:44, Jeff Davis wrote:
> On Wed, 2023-08-16 at 08:51 +0200, Peter Eisentraut wrote:
>> On 12.08.23 04:35, Jeff Davis wrote:
>>> The attached patch implements a new SEARCH clause for CREATE
>>> FUNCTION.
>>> The SEARCH clause controls the search_path used when executing
>>> functions that were created without a SET clause.
>>
>> I don't understand this.  This adds a new option for cases where the
>> existing option wasn't specified.  Why not specify the existing
>> option
>> then?  Is it not good enough?  Can we improve it?
> 
> SET search_path = '...' not good enough in my opinion.
> 
> 1. Not specifying a SET clause falls back to the session's search_path,
> which is a bad default because it leads to all kinds of inconsistent
> behavior and security concerns.

Not specifying SEARCH would have the same issue?

> 2. There's no way to explicitly request that you'd actually like to use
> the session's search_path, so it makes it very hard to ever change the
> default.

That sounds like something that should be fixed independently.  I could 
see this being useful for other GUC settings, like I want to run a 
function explicitly with the session's work_mem.

> 3. It's user-unfriendly. A safe search_path that would be suitable for
> most functions is "SET search_path = pg_catalog, pg_temp", which is
> arcane, and requires some explanation.

True, but is that specific to functions?  Maybe I want a safe 
search_path just in general, for a session or something.

> 4. search_path for the session is conceptually different than for a
> function. A session should be context-sensitive and the same query
> should (quite reasonably) behave differently for different sessions and
> users to sort out things like object name conflicts, etc. A function
> should (ordinarily) be context-insensitive, especially when used in
> something like an index expression or constraint. Having different
> syntax helps separate those concepts.

I'm not sure I follow that.  When you say a function should be 
context-insensitive, you could also say, a function should be 
context-sensitive, but have a separate context.  Which is kind of how it 
works now.  Maybe not well enough.

> 5. There's no way to prevent pg_temp from being included in the
> search_path. This is separately fixable, but having the proposed SEARCH
> syntax is likely to make for a better user experience in the common
> cases.

seems related to #3

> I'm open to suggestion about other ways to improve it, but SEARCH is
> what I came up with.

Some extensions of the current mechanism, like search_path = safe, 
search_path = session, search_path = inherit, etc. might work.




Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Jeff Davis
Дата:
On Fri, 2023-08-18 at 14:25 +0200, Peter Eisentraut wrote:
>
> Not specifying SEARCH would have the same issue?

Not specifying SEARCH is equivalent to SEARCH DEFAULT, and that gives
us some control over what happens. In the proposed patch, a GUC
determines whether it behaves like SEARCH SESSION (the default for
compatibility reasons) or SEARCH SYSTEM (safer).

> > 2. There's no way to explicitly request that you'd actually like to
> > use
> > the session's search_path, so it makes it very hard to ever change
> > the
> > default.
>
> That sounds like something that should be fixed independently.  I
> could
> see this being useful for other GUC settings, like I want to run a
> function explicitly with the session's work_mem.

I'm confused about how this would work. It doesn't make sense to set a
GUC to be the session value in postgresql.conf, because there's no
session yet. And it doesn't really make sense in a top-level session,
because it would just be a no-op (right?). It maybe makes sense in a
function, but I'm still not totally clear on what that would mean.

>
> True, but is that specific to functions?  Maybe I want a safe
> search_path just in general, for a session or something.

I agree this is a somewhat orthogonal problem and we should have a way
to keep pg_temp out of the search_path entirely. We just need to agree
on a string representation of a search path that omits pg_temp. One
idea would be to have special identifiers "!pg_temp" and "!pg_catalog"
that would cause those to be excluded entirely.

>
> I'm not sure I follow that.  When you say a function should be
> context-insensitive, you could also say, a function should be
> context-sensitive, but have a separate context.  Which is kind of how
> it
> works now.  Maybe not well enough.

For functions called from index expressions or constraints, you want
the function's result to only depend on its arguments; otherwise you
can easily violate a constraint or cause an index to return wrong
results.

You're right that there is some other context, like the database
default collation, but (a) that's mostly nailed down; and (b) if it
changes unexpectedly that also causes problems.

> > I'm open to suggestion about other ways to improve it, but SEARCH
> > is
> > what I came up with.
>
> Some extensions of the current mechanism, like search_path = safe,
> search_path = session, search_path = inherit, etc. might work.

I had considered some new special names like this in search path, but I
didn't come up with a specific proposal that I liked. Do you have some
more details about how this would help get us to a safe default?

Regards,
    Jeff Davis




Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Andres Freund
Дата:
Hi,

On 2023-08-14 12:25:30 -0700, Jeff Davis wrote:
> On Sat, 2023-08-12 at 11:25 -0700, Andres Freund wrote:
> >
> > I'm not sure that anything based, directly or indirectly, on
> > search_path
> > really is a realistic way to get there.
>
> Can you explain a little more? I see what you mean generally, that
> search_path is an imprecise thing, and that it leaves room for
> ambiguity and mistakes.

It just doesn't seem to provide enough control and it's really painful for
users to manage. If you install a bunch of extensions into public - very very
common from what I have seen - you can't really remove public from the search
path. Which then basically makes all approaches of resolving any of the
security issues via search path pretty toothless.


> > I think that'd be pretty painful from a UX perspective. Having to
> > write
> > e.g. operators as operator(schema, op) just sucks as an experience.
>
> I'm not suggesting that the user fully-qualify everything; I'm
> suggesting that the include a "SET search_path" clause if they depend
> on anything other than pg_catalog.

I don't think that really works in practice, due to the very common practice
of installing extensions into the same schema as the application. Then that
schema needs to be in search path (if you don't want to schema qualify
everything), which leaves you wide open.


> > And with
> > extensions plenty of operators will live outside of pg_catalog, so
> > there is
> > plenty things that will need qualifying.
>
> In my proposal, that would still involve a "SET search_path TO
> myextension, pg_catalog, pg_temp".

myextension is typically public. Which means that there's zero protection due
to such a search path.


> >   And because of things like type
> > coercion search, which prefers "bettering fitting" coercions over
> > search path
> > order, you can't just put "less important" things later in search
> > path.
>
> I understand this introduces some ambiguity, but you just can't include
> schemas in the search_path that you don't trust, for similar reasons as
> $PATH. If you have a few objects you'd like to access in another user's
> schema, fully-qualify them.

I think the more common attack paths are things like tricking extension
scripts into evaluating arbitrary code, to gain "real superuser" privileges.


> > We can't just store the oids at the time, because that'd end up very
> > fragile -
> > tables/functions/... might be dropped and recreated etc and thus
> > change their
> > oid.
>
> Robert suggested something along those lines[1]. I won't rule it out,
> but I agree that there are quite a few things left to figure out.
>
> >  But we could change the core PLs to rewrite all the queries (*) so
> > that
> > they schema qualify absolutely everything, including operators and
> > implicit
> > type casts.
>
> So not quite like "SET search_path FROM CURRENT": you resolve it to a
> specific "schemaname.objectname", but stop just short of resolving to a
> specific OID?
>
> An interesting compromise, but I'm not sure what the benefit is vs. SET
> search_path FROM CURRENT (or some defined search_path).

Search path does not reliably protect things involving "type matching". If you
have a better fitting cast, or a function call with parameters that won't need
coercion, later in search path, they'll win, even if there's another fit
earlier on.

IOW, search path is a bandaid for this kind of thing, at best.

If we instead store something that avoids the need for such search, the
"better fitting cast" logic wouldn't add these kind of security issues
anymore.


> > That way objects referenced by functions can still be replaced, but
> > search
> > path can't be used to "inject" objects in different schemas.
> > Obviously it
> > could lead to errors on some schema changes - e.g. changing a column
> > type
> > might mean that a relevant cast lives in a different place than with
> > the old
> > type - but I think that'll be quite rare. Perhaps we could offer a
> > ALTER
> > FUNCTION ... REFRESH REFERENCES; or such?
>
> Hmm. I feel like that's making things more complicated. I'd find it
> more straightforward to use something like Robert's approach of fully
> parsing something, and then have the REFRESH command reparse it when
> something needs updating. Or perhaps just create all of the dependency
> entries more like a view query and then auto-refresh.

Hm, I'm not quite sure I follow on what exactly you see as different here.

Greetings,

Andres Freund



Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Robert Haas
Дата:
On Wed, Aug 16, 2023 at 1:45 PM Jeff Davis <pgsql@j-davis.com> wrote:
> On Wed, 2023-08-16 at 08:51 +0200, Peter Eisentraut wrote:
> > On 12.08.23 04:35, Jeff Davis wrote:
> > > The attached patch implements a new SEARCH clause for CREATE
> > > FUNCTION.
> > > The SEARCH clause controls the search_path used when executing
> > > functions that were created without a SET clause.
> >
> > I don't understand this.  This adds a new option for cases where the
> > existing option wasn't specified.  Why not specify the existing
> > option
> > then?  Is it not good enough?  Can we improve it?
>
> SET search_path = '...' not good enough in my opinion.
>
> 1. Not specifying a SET clause falls back to the session's search_path,
> which is a bad default because it leads to all kinds of inconsistent
> behavior and security concerns.
>
> 2. There's no way to explicitly request that you'd actually like to use
> the session's search_path, so it makes it very hard to ever change the
> default.
>
> 3. It's user-unfriendly. A safe search_path that would be suitable for
> most functions is "SET search_path = pg_catalog, pg_temp", which is
> arcane, and requires some explanation.
>
> 4. search_path for the session is conceptually different than for a
> function. A session should be context-sensitive and the same query
> should (quite reasonably) behave differently for different sessions and
> users to sort out things like object name conflicts, etc. A function
> should (ordinarily) be context-insensitive, especially when used in
> something like an index expression or constraint. Having different
> syntax helps separate those concepts.
>
> 5. There's no way to prevent pg_temp from being included in the
> search_path. This is separately fixable, but having the proposed SEARCH
> syntax is likely to make for a better user experience in the common
> cases.
>
> I'm open to suggestion about other ways to improve it, but SEARCH is
> what I came up with.

The one thing that I really like about your proposal is that you
explicitly included a way of specifying that the prevailing
search_path should be used. If we move to any kind of a system where
the default behavior is something other than that, then we need that
as an option. Another, related thing that I recently discovered would
be useful is a way to say "I'd like to switch the search_path to X,
but I'd also like to discover what the prevailing search_path was just
before entering this function." For example, if I have a function that
is SECURITY DEFINER which takes some executable code as an input, I
might want to arrange to eventually execute that code with the
caller's user ID and search_path, but I can't discover the caller's
search_path unless I don't set it, and that's a dangerous thing to do.

However, my overall concern here is that this feels like it's
reinventing the wheel. We already have a way of setting search_path;
this gives us a second one. If we had no existing mechanism for that,
I think this would definitely be an improvement, and quite possibly
better than the current mechanism. But given that we had a mechanism
already, if we added this, we'd then have two, which seems like the
wrong number.

I'm inclined to think that if there are semantics that we currently
lack, we should think of extending the current syntax to support them.
Right now you can SET search_path = 'specific value' or SET
search_path FROM CURRENT or leave it out. We could introduce a new way
of spelling "leave it out," like RESET search_path or whatever. We
could introduce a new setting that doesn't set the search_path at all
but reverts to the old value on function exit, like SET search_path
USING CALL or whatever. And we could think of making SET search_path
FROM CURRENT or any new semantics we introduce the default in a future
release, or even make the default behavior depend on an evil
behavior-changing GUC as you proposed. I'm not quite sure what we
should do here conceptually, but I don't see why having a completely
new syntax for doing it really helps.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Jeff Davis
Дата:
On Sat, 2023-08-19 at 11:59 -0700, Andres Freund wrote:
> If you install a bunch of extensions into public - very very
> common from what I have seen - you can't really remove public from
> the search
> path. Which then basically makes all approaches of resolving any of
> the
> security issues via search path pretty toothless.

Toothless only if (a) untrusted users have CREATE privileges in the
public schema, which is no longer the default; and (b) you're writing a
function that accesses extension objects installed in the public
schema.

While those may be normal things to do, there are a lot of times when
those things aren't true. I speculate that it's far more common to
write functions that only use pg_catalog objects (e.g. the "+"
operator, some string manipulation, etc.) and basic control flow.

There's a lot of value in making those simple cases secure-by-default.
We are already moving users towards a readable-but-not-writable public
schema as a best practice, so if we also move to something like SEARCH
SYSTEM as a best practice, then that will help a LOT of users.

> >
> I don't think that really works in practice, due to the very common
> practice
> of installing extensions into the same schema as the application.
> Then that
> schema needs to be in search path (if you don't want to schema
> qualify
> everything), which leaves you wide open.

...

> >
> myextension is typically public. Which means that there's zero
> protection due
> to such a search path.

You mentioned this three times so I must be missing something. Why is
it "wide open" and "zero protection"? If the schema is not world-
writable, then aren't attacks a lot harder to pull off?

> >
> I think the more common attack paths are things like tricking
> extension
> scripts into evaluating arbitrary code, to gain "real superuser"
> privileges.

Extension scripts are a separate beast. I do see some potential avenues
of attack, but I don't see how your approach of resolving schemas early
would help.

> Search path does not reliably protect things involving "type
> matching". If you
> have a better fitting cast, or a function call with parameters that
> won't need
> coercion, later in search path, they'll win, even if there's another
> fit
> earlier on.

You need to trust the schemas in your search_path.

> If we instead store something that avoids the need for such search,
> the
> "better fitting cast" logic wouldn't add these kind of security
> issues
> anymore.

I don't disagree, but I don't understand the approach in detail (i.e. I
couldn't write it up as a proposal). For instance, what would the
pg_dump output look like?

And even if we had that in place, I think we'd still want a better way
to control the search_path.

> >
> Hm, I'm not quite sure I follow on what exactly you see as different
> here.

From what I understand, Robert's approach is to fully parse the
commands and resolve to specific OIDs (necessitating dependencies,
etc.); while your approach resolves to fully-qualified names but not
OIDs (and needing no dependencies).

I don't understand either proposal entirely, so perhaps I'm on the
wrong track here, but I feel like Robert's approach is more "normal"
and easy to document whereas your approach is more "creative" and
perhaps hard to document.

Both approaches (resolving to names and resolving to OIDs) seem pretty
far away, so I'm still very much inclined to nudge users toward safer
best practices with search_path. I think SEARCH SYSTEM is a good start
there and doable for 17.

Regards,
    Jeff Davis




Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Jeff Davis
Дата:
On Mon, 2023-08-21 at 15:14 -0400, Robert Haas wrote:
> Another, related thing that I recently discovered would
> be useful is a way to say "I'd like to switch the search_path to X,
> but I'd also like to discover what the prevailing search_path was
> just
> before entering this function."

Interesting, that could probably be accommodated one way or another.

> However, my overall concern here is that this feels like it's
> reinventing the wheel. We already have a way of setting search_path;
> this gives us a second one.

In one sense, you are obviously right. We have a way to set search_path
for a function already, just like any other GUC.

But I don't look at the search_path as "just another GUC" when it comes
to executing a function. The source of the initial value of search_path
is more like the IMMUTABLE marker.

We can also do something with the knowledge the SEARCH marker gives us.
For instance, issue WARNINGs or ERRORs when someone uses a SEARCH
SESSION function in an index expression or constraint, or perhaps when
they try to declare a function IMMUTABLE in the first place.

In other words, the SEARCH clause tells us where search_path comes
from, not so much what it is specifically. I believe that tells us
something fundamental about the kind of function it is. If I tell you
nothing about a function except whether the search path comes from the
system or the session, you can imagine how it should be used (or not
used, as the case may be).

> I'm inclined to think that if there are semantics that we currently
> lack, we should think of extending the current syntax to support
> them.
> Right now you can SET search_path = 'specific value' or SET
> search_path FROM CURRENT or leave it out. We could introduce a new
> way
> of spelling "leave it out," like RESET search_path or whatever.

The thought occurred to me but any way I looked at it was messier and
less user-friendly. It feels like generalizing from search_path to all
GUCs, and then needing to specialize for search_path anyway.

For instance, if we want the default search_path to be the safe value
'pg_catalog, pg_temp', where would that default value come from? Or
instead, we could say that the default would be FROM CURRENT, which
would seem to generalize; but then we immediately run into the problem
that we don't want most GUCs to default to FROM CURRENT (because that
would capture the entire GUC state, which seems bad for several
reasons), and again we'd need to specialize for search_path.


In other words, search_path really *is* special. I don't think it's
great to generalize from it as though it were just like every other
GUC.

I do recognize that "SEARCH SYSTEM ... SET search_path = '...'" is
redundant, and that's not great. I just see the other options as worse,
but if I've misunderstood your approach then please clarify.

Regards,
    Jeff Davis




Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Robert Haas
Дата:
On Mon, Aug 21, 2023 at 5:32 PM Jeff Davis <pgsql@j-davis.com> wrote:
> But I don't look at the search_path as "just another GUC" when it comes
> to executing a function. The source of the initial value of search_path
> is more like the IMMUTABLE marker.

I mean I agree and I disagree.

Philosophically, I agree. Most functions are written with some
particular search_path in mind; the author imagines that the function
will be executed with, well, probably whatever search path the author
typically uses themselves. Now and then, someone may write a function
that's intended to run with various different search paths, e.g.
anything of the form customer_XXXX, pg_catalog, pg_temp. I think that
is a real thing that people actually do, intentionally varying the
search_path with the idea of rebinding some references. However, cases
where somebody sincerely intends for the caller to be able to make +
or || mean something different from normal probably do not exist in
practice. So, if we were designing a system from scratch, then I would
recommend against making search_path a GUC, because it's clearly
shouldn't behave in the same way as a session property like
debug_print_plan or enable_seqscan, where you could want to run the
same code with various values.

But practically, I disagree. As things stand today, search_path *is* a
GUC that dynamically changes the run-time properties of a session, and
your proposed patch wouldn't change that. What it would do is layer
another mechanism on top of that which, IMHO, makes something that is
already complicated and error-prone even more complicated. If we
wanted to really make seach_path behave like a property of the code
rather than the session, I think we'd need to change quite a bit more
stuff, and the price of that in terms of backward-compatibility might
be higher than we'd be willing to pay, but if, hypothetically, we
decided to pay that price, then at the end of it search_path as a GUC
would be gone, and we'd have one way of managing sarch_path that is
different from the one we have now.

But with the patch as you have proposed it that's not what happens. We
just end up with two interconnected mechanisms for managing what,
right now, is managed by a single mechanism. That mechanism is (and I
think we probably mostly all agree on this) bad. Like really really
bad. But having more than one mechanism, to me, still seems worse.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Jeff Davis
Дата:
On Mon, 2023-09-18 at 12:01 -0400, Robert Haas wrote:
> But with the patch as you have proposed it that's not what happens.
> We
> just end up with two interconnected mechanisms for managing what,
> right now, is managed by a single mechanism. That mechanism is (and I
> think we probably mostly all agree on this) bad. Like really really
> bad. But having more than one mechanism, to me, still seems worse.

I don't want to make an argument of the form "the status quo is really
bad, and therefore my proposal is good". That line of argument is
suspect for good reason.

But if my proposal isn't good enough, and we don't have a clear
alternative, we need to think seriously about how much we've
collectively over-promised and under-delivered on the concept of
privilege separation.

Absent a better idea, we need to figure out a way to un-promise what we
can't do and somehow guide users towards safe practices. For instance,
don't grant the INSERT or UPDATE privilege if the table uses functions
in index expressions or constraints. Also don't touch any table unless
the onwer has SET ROLE privileges on your role already, or the
operation is part of a special carve out (logical replication or a
maintenance command). And don't use the predefined role
pg_write_all_data, because that's unsafe for most imaginable use cases.

Regards,
    Jeff Davis




Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Robert Haas
Дата:
On Mon, Sep 18, 2023 at 4:51 PM Jeff Davis <pgsql@j-davis.com> wrote:
> I don't want to make an argument of the form "the status quo is really
> bad, and therefore my proposal is good". That line of argument is
> suspect for good reason.

+1.

> But if my proposal isn't good enough, and we don't have a clear
> alternative, we need to think seriously about how much we've
> collectively over-promised and under-delivered on the concept of
> privilege separation.
>
> Absent a better idea, we need to figure out a way to un-promise what we
> can't do and somehow guide users towards safe practices. For instance,
> don't grant the INSERT or UPDATE privilege if the table uses functions
> in index expressions or constraints. Also don't touch any table unless
> the onwer has SET ROLE privileges on your role already, or the
> operation is part of a special carve out (logical replication or a
> maintenance command). And don't use the predefined role
> pg_write_all_data, because that's unsafe for most imaginable use cases.

I agree this is a mess, and that documenting the mess better would be
good. But instead of saying not to do something, we need to say what
will happen if you do the thing. I'm regularly annoyed when somebody
reports that "I tried to do X and it didn't work," instead of saying
what happened when they tried, and this situation is another form of
the same thing. "If you do X, then Y will or can occur" is much better
than "do not do X". And I think better documentation of this area
would be useful regardless of any other improvements that we may or
may not make. Indeed, really good documentation of this area might
facilitate making further improvements by highlighting some of the
problems so that they can more easily be understood by a wider
audience. I fear it will be hard to come up with something that is
clear, that highlights the severity of the problems, and that does not
veer off into useless vitriol against the status quo, but if we can
get there, that would be good.

But, leaving that to one side, what technical options do we have on
the table, supposing that we want to do something that is useful but
not this exact thing?

I think one option is to somehow change the behavior around
search_path but in a different way than you've proposed. The most
radical option would be to make it not be a GUC any more. I think the
backward-compatibility implications of that would likely be
unpalatable to many, and the details of what we'd actually do are also
not clear, at least to me. For a function, I think there is a
reasonable argument that you just make it a function property, like
IMMUTABLE, as you said before. But for code that goes directly into
the session, where's the search_path supposed to come from? It's got
to be configured somewhere, and somehow that somewhere feels a lot
like a GUC. That leads to a second idea, which is having it continue
to be a GUC but only affect directly-entered SQL, with all
indirectly-entered SQL either being stored as a node tree or having a
search_path property attached somewhere. Or, as a third idea, suppose
we leave it a GUC but start breaking semantics around where and how
that GUC gets set, e.g. by changing CREATE FUNCTION to capture the
prevailing search_path by default unless instructed otherwise.
Personally I feel like we'd need pretty broad consensus for any of
these kinds of changes because it would break a lot of stuff for a lot
of people, but if we could get that then I think we could maybe emerge
in a better spot once the pain of the compatibility break receded.

Another option is something around sandboxing and/or function trust.
The idea here is to not care too much about the search_path behavior
itself, and instead focus on the consequences, namely what code is
getting executed as which user and perhaps what kinds of operations
it's performing. To me, this seems like a possibly easier answer way
forward at least in the short to medium term, because I think it will
break fewer things for fewer people, and if somebody doesn't like the
new behavior they can just say "well I trust everyone completely" and
it all goes back to the way it was. That said, I think there are
problems with my previous proposals on the other thread so I believe
some adjustments would be needed there, and then there's the problem
of actually implementing anything. I'll try to respond to your
comments on that thread soon.

Are there any other categories of things we can do? More specific
kinds of things we can do in each category? I don't really see an
option other than (1) "change something in the system design so that
people use search_path wrongly less often" or (2) "make it so that it
doesn't matter as much if people using the wrong search_path" but
maybe I'm missing a clever idea.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Jeff Davis
Дата:
On Tue, 2023-09-19 at 11:41 -0400, Robert Haas wrote:
> I agree this is a mess, and that documenting the mess better would be
> good. But instead of saying not to do something, we need to say what
> will happen if you do the thing. I'm regularly annoyed when somebody
> reports that "I tried to do X and it didn't work," instead of saying
> what happened when they tried, and this situation is another form of
> the same thing. "If you do X, then Y will or can occur" is much
> better
> than "do not do X".

Good documentation includes some guidance. Sure, it should describe the
system behavior, but without anchoring it to some kind of expected use
case, it can be equally frustrating.

Allow me to pick on this example which came up in a recent thread:

"[password_required] Specifies whether connections to the publisher
made as a result of this subscription must use password authentication.
This setting is ignored when the subscription is owned by a superuser.
The default is true. Only superusers can set this value to false."
 -- https://www.postgresql.org/docs/16/sql-createsubscription.html

Only superusers can set it, and it's ignored for superusers. That does
a good job of describing the actual behavior, but is a bit puzzling.

I guess what the user is supposed to do is one of:
  1. Create a subscription as a superuser with the right connection
string (without a password) and password_required=false, then reassign
it to a non-superuser; or
  2. Create a subscription as a non-superuser member of
pg_create_subscription using a bogus connection string, then a
superuser can alter it to set password_required=false, then alter the
connection string; or
  3. Create a superuser, let the new superuser create a subscription
with password_required=false, and then remove their superuser status.

so why not just document one of those things as the expected thing to
do? Not a whole section or anything, but a sentence to suggest what
they should do or where else they should look.

I don't mean to set some major new standard in the documentation that
should apply to everything; but for the privilege system, even hackers
are having trouble keeping up (myself included). A bit of guidance
toward supported use cases helps a lot.

> I fear it will be hard to come up with something that is
> clear, that highlights the severity of the problems, and that does
> not
> veer off into useless vitriol against the status quo, but if we can
> get there, that would be good.

I hope what I'm saying is not useless vitriol. I am offering the best
solutions I see in a bad situation. And I believe I've uncovered some
emergent behaviors that are not well-understood even among prominent
hackers.

> That leads to a second idea, which is having it continue
> to be a GUC but only affect directly-entered SQL, with all
> indirectly-entered SQL either being stored as a node tree or having a
> search_path property attached somewhere.

That's not too far from the proposed patch and I'd certainly be
interested to hear more and/or adapt my patch towards this idea.

>  Or, as a third idea, suppose
> we leave it a GUC but start breaking semantics around where and how
> that GUC gets set, e.g. by changing CREATE FUNCTION to capture the
> prevailing search_path by default unless instructed otherwise.

How would one instruct otherwise?

> Personally I feel like we'd need pretty broad consensus for any of
> these kinds of changes

+1

>  because it would break a lot of stuff for a lot
> of people, but if we could get that then I think we could maybe
> emerge
> in a better spot once the pain of the compatibility break receded.

Are there ways we can soften this a bit? I know compatibility GUCs are
not to be added lightly, but perhaps one is justified here?

> Another option is something around sandboxing and/or function trust.
> The idea here is to not care too much about the search_path behavior
> itself, and instead focus on the consequences, namely what code is
> getting executed as which user and perhaps what kinds of operations
> it's performing.

I'm open to discussing that further, and it certainly may solve some
problems, but it does not seem to solve the fundamental problem with
search_path: that the caller can (intentionally or unintentionally)
cause a function to do unexpected things.

Sometimes an unexpected thing is not a the kind of thing that would be
caught by a sandbox, e.g. just an unexpected function result. But if
that function is used in a constraint or expression index, that
unexpected result can lead to a violated constraint or a bad index
(that will later cause wrong results). The owner of the table might
reasonably consider that a privilege problem, if the user who causes
the trouble had only INSERT privileges.

> Are there any other categories of things we can do? More specific
> kinds of things we can do in each category? I don't really see an
> option other than (1) "change something in the system design so that
> people use search_path wrongly less often" or (2) "make it so that it
> doesn't matter as much if people using the wrong search_path" but
> maybe I'm missing a clever idea.

Perhaps there are some clever ideas about maintaining compatibility
within the approaches (1) or (2), which might make one of them more
appealing.

Regards,
    Jeff Davis




Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Maciek Sakrejda
Дата:
On Tue, Sep 19, 2023 at 5:56 PM Jeff Davis <pgsql@j-davis.com> wrote:
>...
> On Tue, 2023-09-19 at 11:41 -0400, Robert Haas wrote:
> > That leads to a second idea, which is having it continue
> > to be a GUC but only affect directly-entered SQL, with all
> > indirectly-entered SQL either being stored as a node tree or having a
> > search_path property attached somewhere.
>
> That's not too far from the proposed patch and I'd certainly be
> interested to hear more and/or adapt my patch towards this idea.

As an interested bystander, that's the same thing I was thinking when
reading this. I reread your original e-mail, Jeff, and I still think
that.

I wonder if something like CURRENT (i.e., the search path at function
creation time) might be a useful keyword addition. I can see some uses
(more forgiving than SYSTEM but not as loose as SESSION), but I don't
know if it would justify its presence.

Thanks for working on this.

Thanks,
Maciek



Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Maciek Sakrejda
Дата:
On Tue, Sep 19, 2023, 20:23 Maciek Sakrejda <m.sakrejda@gmail.com> wrote:
I wonder if something like CURRENT (i.e., the search path at function
creation time) might be a useful keyword addition. I can see some uses
(more forgiving than SYSTEM but not as loose as SESSION), but I don't
know if it would justify its presence.

I realize now this is exactly what SET search_path FROM CURRENT does. Sorry for the noise.

Regarding extensions installed in the public schema throwing a wrench in the works, is that still a problem if the public schema is not writable? I know that that's a new default, but it won't be forever.

Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Pavel Stehule
Дата:
Hi

st 20. 9. 2023 v 9:34 odesílatel Maciek Sakrejda <m.sakrejda@gmail.com> napsal:
On Tue, Sep 19, 2023 at 5:56 PM Jeff Davis <pgsql@j-davis.com> wrote:
>...
> On Tue, 2023-09-19 at 11:41 -0400, Robert Haas wrote:
> > That leads to a second idea, which is having it continue
> > to be a GUC but only affect directly-entered SQL, with all
> > indirectly-entered SQL either being stored as a node tree or having a
> > search_path property attached somewhere.
>
> That's not too far from the proposed patch and I'd certainly be
> interested to hear more and/or adapt my patch towards this idea.

As an interested bystander, that's the same thing I was thinking when
reading this. I reread your original e-mail, Jeff, and I still think
that.

I wonder if something like CURRENT (i.e., the search path at function
creation time) might be a useful keyword addition. I can see some uses
(more forgiving than SYSTEM but not as loose as SESSION), but I don't
know if it would justify its presence.

Personally, I dislike this - because the value of the search path is hidden in this case. 

I agree so it can be comfortable, but it can be confusing for review, migration, ...

Regards

Pavel


Thanks for working on this.

Thanks,
Maciek


Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Robert Haas
Дата:
On Tue, Sep 19, 2023 at 7:56 PM Jeff Davis <pgsql@j-davis.com> wrote:
> Good documentation includes some guidance. Sure, it should describe the
> system behavior, but without anchoring it to some kind of expected use
> case, it can be equally frustrating.

Fair.

> I don't mean to set some major new standard in the documentation that
> should apply to everything; but for the privilege system, even hackers
> are having trouble keeping up (myself included). A bit of guidance
> toward supported use cases helps a lot.

Yeah, this stuff is complicated, and I agree that it's hard even for
hackers to keep up with. I don't really have a strong view on the
concrete case you mentioned involving password_required. I always
worry that if there are three cases and we suggest one of them then
the others will be viewed negatively when really they're all equally
fine. On the other hand, that can often be addressed by starting the
sentence with "For example, you could...." or similar, so perhaps
there's no problem here at all. I generally agree with the idea that
examples can be useful for clarifying points that may otherwise be too
theoretical.

> I hope what I'm saying is not useless vitriol. I am offering the best
> solutions I see in a bad situation. And I believe I've uncovered some
> emergent behaviors that are not well-understood even among prominent
> hackers.

Yeah, I wasn't really intending to say that you were. I just get
nervous about statements like "don't ever do X!" because I find it
completely unconvincing. In my experience, when you tell people stuff
like that, some of them just go off and do it anyway and, especially
in a case like this, chances are very good that nothing bad will ever
happen to them, simply because most PostgreSQL installations don't
have malicious local users. When you tell them "but that's really bad"
they say "why?" and if the documentation doesn't have an answer to the
question well then that sucks.

> >  because it would break a lot of stuff for a lot
> > of people, but if we could get that then I think we could maybe
> > emerge
> > in a better spot once the pain of the compatibility break receded.
>
> Are there ways we can soften this a bit? I know compatibility GUCs are
> not to be added lightly, but perhaps one is justified here?

I don't know. I'm skeptical. This behavior is so complicated and hard
to get right. Having it GUC-dependent makes it even more confusing
than it is already. But I guess it also depends on what the GUC does.

Let's say we make a rule that every function or procedure has to have
a search_path attached to it as a function property. That is, CREATE
FUNCTION .. SEARCH something sets pg_proc.prosearch = 'something'. If
you omit the SEARCH clause, one is implicitly supplied for you. If you
say SEARCH NULL, then the function is executed with the search_path
taken from the GUC; SEARCH 'anything_else' specified a literal
search_path to be used.

In such a world, I can imagine having a GUC that determines whether
the implicitly supplied SEARCH clause is SEARCH
${WHATEVER_THE_SEARCH_PATH_IS_RIGHT_NOW} or SEARCH NULL. Such a GUC
only takes effect at CREATE FUNCTION time. However, I cannot imagine
having a GUC that causes the SEARCH clauses attached to all functions
to be globally ignored at execution time. That seems like a choice we
would likely regret bitterly. The first thing is already painful, but
the second one is exponentially worse, because in the first world, you
have to be careful to get your functions defined correctly, but if you
do, you know they'll run OK on any PostgreSQL cluster anywhere,
whereas in the second world, there's no way to define a function that
behaves the same way on every PostgreSQL instance. Imagine being an
extension author, for example.

I am a little worried that this kind of design might end up reversing
the direction of some security problems that we have now. For
instance, right now, if you call a function with a SET search_path
clause, you know that it can't make any changes to search_path that
survive function exit. You'll get your old search_path back. With this
kind of design, it seems like it would be a lot easier to get back to
the SQL toplevel and find the search_path surprisingly changed under
you. I think we have that problem in some cases already, though. I'm
unclear how much worse this makes it.

> > Another option is something around sandboxing and/or function trust.
> > The idea here is to not care too much about the search_path behavior
> > itself, and instead focus on the consequences, namely what code is
> > getting executed as which user and perhaps what kinds of operations
> > it's performing.
>
> I'm open to discussing that further, and it certainly may solve some
> problems, but it does not seem to solve the fundamental problem with
> search_path: that the caller can (intentionally or unintentionally)
> cause a function to do unexpected things.

Well, I think it's meant to solve that problem.  How effectively it
does so is a point worth debating.

> Sometimes an unexpected thing is not a the kind of thing that would be
> caught by a sandbox, e.g. just an unexpected function result. But if
> that function is used in a constraint or expression index, that
> unexpected result can lead to a violated constraint or a bad index
> (that will later cause wrong results). The owner of the table might
> reasonably consider that a privilege problem, if the user who causes
> the trouble had only INSERT privileges.

That's an interesting example. Earlier versions of the function trust
proposal proposed to block *any* execution of code belonging to an
untrusted party. That could potentially block this attack. However, it
would also block a lot of other things. For instance, if Alice tries
to insert into Bob's table and Bob's table has a CHECK constraint or
an index expression, Alice has to trust Bob or she can't insert
anything at all. By trusting Bob just enough to allow him do things
like CHECK(LENGTH(foo) < 10) or whatever, Alice can operate on Bob's
table without a problem in normal cases, but is still protected if Bob
suddenly starts doing something sneaky. I think that's a significant
improvement, because a system that is so stringent that it blocks even
completely harmless things is likely to get disabled, at which point
it protects nobody from anything.

However, that analysis presumes that what we're trying to do is
protect Alice from Bob, and I think you're raising the question of how
we protect Bob from Alice. Suppose Bob has got a trigger function but
has failed to control search_path for that function. Alice can set
search_path so that Bob's trigger calls some function or operator that
she owns instead of the intended call to, say, a system function or
operator. Some sufficiently-rigid function trust system could catch
this: Bob doesn't trust Alice, and so the fact that his code is trying
to call some a function or operator owned by Alice is a red flag. On
the basis of the fact that Bob doesn't trust Alice, we should error
out to protect Bob. Had the search_path been set in the expected way,
Bob would have been trying to call a superuser-owned function, and Bob
must trust the superuser, so the operation is permitted.

I wouldn't have a problem with a function-trust proposal that
incorporated a mode that rigid as a configuration option. I find this
a convincing example of how that could be useful. But such a mode has
pretty serious downsides, too. It makes it very difficult for one user
to interact with another user's objects in any way without triggering
security errors.

Also, in a case like this, I don't think it's unreasonable to ask
whether, perhaps, Bob just needs to be a little more careful about
setting search_path. I think that there is a big difference between
(a) defining a SQL-language function that is accessible to multiple
users and (b) inserting a row into a table you don't own. When you
define a function, you know people are potentially going to call it.
Asking you, as the function author, to take some care to secure your
function against a malicious search_path doesn't seem like an
unsupportable burden. After all, you control the definition of that
function. The problem with inserting a row into a table you don't own
is that all of the objects involved -- the table itself, its indexes,
its triggers, its defaults, its constraints -- are owned by somebody
else, and that user controls those objects and can change any of them
at any time. You can't really be expected to verify that all code
reachable as a result of an INSERT into the table is safe enough
before every INSERT into that table. You can, I think, be expected to
check that functions you define have SET search_path attached.

> > Are there any other categories of things we can do? More specific
> > kinds of things we can do in each category? I don't really see an
> > option other than (1) "change something in the system design so that
> > people use search_path wrongly less often" or (2) "make it so that it
> > doesn't matter as much if people using the wrong search_path" but
> > maybe I'm missing a clever idea.
>
> Perhaps there are some clever ideas about maintaining compatibility
> within the approaches (1) or (2), which might make one of them more
> appealing.

Indeed.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Jeff Davis
Дата:
On Tue, 2023-09-19 at 20:23 -0700, Maciek Sakrejda wrote:
> On Tue, Sep 19, 2023 at 5:56 PM Jeff Davis <pgsql@j-davis.com> wrote:
> > ...
> > On Tue, 2023-09-19 at 11:41 -0400, Robert Haas wrote:
> > > That leads to a second idea, which is having it continue
> > > to be a GUC but only affect directly-entered SQL, with all
> > > indirectly-entered SQL either being stored as a node tree or
> > > having a
> > > search_path property attached somewhere.
> >
> > That's not too far from the proposed patch and I'd certainly be
> > interested to hear more and/or adapt my patch towards this idea.
>
> As an interested bystander, that's the same thing I was thinking when
> reading this. I reread your original e-mail, Jeff, and I still think
> that.

I have attached an updated patch. Changes:

  * Syntax is now: SEARCH FROM { DEFAULT | TRUSTED | SESSION }
    - added "FROM" to suggest that it's the source, and only a starting
place, rather than a specific and final setting. I don't feel strongly
about the FROM one way or another, so I can take it out if it's not
helpful.
    - changed "SYSTEM" to "TRUSTED", which better reflects the purpose,
and doesn't suggest any connection to ALTER SYSTEM.
  * Removed GUC -- we can reconsider this kind of thing later.
  * ERROR if IMMUTABLE is combined with SEARCH FROM SESSION
  * pg_dump support. Emits "SEARCH FROM SESSION" or "SEARCH FROM
TRUSTED" only if explicitly specified; otherwise emits no SEARCH
clause. Differentiating the unspecified cases may be useful for
migration purposes later.
  * psql support.
  * Updated docs to try to better present the concept, and document
CREATE PROCEDURE as well.


The SEARCH clause declares a new property that will be useful to both
enforce safety and also to guide users to migrate in a safe direction
over time.

For instance, the current patch prohibits the combination of IMMUTABLE
and SEARCH FROM SESSION; but allows IMMUTABLE if no SEARCH clause is
specified at all (to avoid breaking anything). We could extend that
slowly over several releases ratchet up the pressure (with warnings or
changing defaults) until all IMMUTABLE functions require SEARCH FROM
TRUSTED. Perhaps IMMUTABLE would even imply SEARCH FROM TRUSTED.

The search property is consistent with other properties, like
IMMUTABLE, which is both a marker and also enforces some restrictions
(e.g. you can't CREATE TABLE). It's also a lot nicer to use than a SET
clause, and provides a nice place to document certain behaviors.

(Aside: the concept of IMMUTABLE is basically broken today, due to
search_path problems.)

SEARCH FROM DEFAULT is just a way to get an object back to the
"unspecified search clause" state. It has the same behavior as SEARCH
FROM SESSION, except that the former will cause a hard error when
combined with IMMUTABLE. I think it's worth differentiating the
unspecified search clause from the explicit SEARCH FROM SESSION clause
for the purposes of migration.

There were three main complaints:

Comaplaint A: That it creates a new mechanism[1].

The patch doesn't create a new internal mechanism, it almost entirely
reuses the existing SET clause mechanism. I think complaint A is really
about the user-facing mechanics, which is essentially the same as the
complaint B.

Complaint B: That it's overlapping in functionality with the SET
clause[2][3]. In other words:

   CREATE FUNCTION ... SEARCH FROM TRUSTED ...;
   CREATE FUNCTION ... SET search_path = pg_catalog, pg_temp ...;

do similar things. But the latter is much worse:

   * it's user-unfriendly (requiring pg_temp is highly unintuitive)
   * it doesn't allow Postgres to warn if the function is being used in
an unsafe context
   * there's no way to explicitly declare that you want the search path
to come from the session instead (either to be more clear about your
intentions, or to be forward-compatible)

In my opinion, the "SET search_path = ..." form should be used when you
actually want the search_path to contain some specific schema, not in
cases where you're just using built-in objects.

Complaint C: search_path is hopeless[4].

I think we can make meaningful improvements to the status quo, like
with the attached patch, that will reduce a lot of the surface area for
security risks. Right now our privilege model breaks down very quickly
even with trivial and typical use cases and we can do better.


--
Jeff Davis
PostgreSQL Contributor Team - AWS

[1]
https://www.postgresql.org/message-id/CA%2BTgmoaRPJJN%3DAOqC4b9t90vFQX81hKiXNPNhbxR0-Sm8F8nCA%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/CA%2BTgmoah_bTjUFng-vZnivPQs0kQWUaSwAu49SU5M%2BzTxA%2B3Qw%40mail.gmail.com
[3]
https://www.postgresql.org/message-id/15464811-18fb-c7d4-4620-873366d367d6%40eisentraut.org
[4]
https://www.postgresql.org/message-id/20230812182559.d7plqwx3p65ys4i7%40awork3.anarazel.de



Вложения

Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Jeff Davis
Дата:
On Thu, 2023-09-21 at 14:06 -0400, Robert Haas wrote:

> Also, in a case like this, I don't think it's unreasonable to ask
> whether, perhaps, Bob just needs to be a little more careful about
> setting search_path.

That's what this whole thread is about: I wish it was reasonable, but I
don't think the tools we provide today make it reasonable. You expect
Bob to do something like:

  CREATE FUNCTION ... SET search_path = pg_catalog, pg_temp ...

for all functions, not just SECURITY DEFINER functions, is that right?

Up until now, we've mostly treated search_path as a problem for
SECURITY DEFINER, and specifying something like that might be
reasonable for a small number of SECURITY DEFINER functions.

But as my example showed, search_path is actually a problem for
SECURITY INVOKER too: an index expression relies on the function
producing the correct results, and it's hard to control that without
controlling the search_path.

>  I think that there is a big difference between
> (a) defining a SQL-language function that is accessible to multiple
> users and (b) inserting a row into a table you don't own. When you
> define a function, you know people are potentially going to call it.

It's a bit problematic that (a) is the default:

   CREATE FUNCTION f(INT) RETURNS INT IMMUTABLE
     LANGUAGE plpgsql
     AS $$ BEGIN RETURN 42+$1; END; $$;
   CREATE TABLE x(i INT);
   CREATE INDEX x_idx ON x(f(i));
   GRANT INSERT ON TABLE x TO u2;

It's not obvious that f() is directly callable by u2 (though it is
documented).

I'm not disagreeing with the principle behind what you say above. My
point is that "accessible to multiple users" is the ordinary default
case, so there's no cue for the user that they need to do something
special to secure function f().

> Asking you, as the function author, to take some care to secure your
> function against a malicious search_path doesn't seem like an
> unsupportable burden.

What you are suggesting has been possible for quite some time. Do you
think users are taking care to do this today? If not, how can we
encourage them to do so?

> You can, I think, be expected to
> check that functions you define have SET search_path attached.

We've already established that even postgres hackers are having
difficulty keeping up with these nuances. Even though the SET clause
has been there for a long time, our documentation on the subject is
insufficient and misleading. And on top of that, it's extra typing and
noise for every schema file. Until we make some changes I don't think
we can expect users to do as you suggest.

Regards,
    Jeff Davis




Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Robert Haas
Дата:
On Fri, Sep 22, 2023 at 4:05 PM Jeff Davis <pgsql@j-davis.com> wrote:
> On Thu, 2023-09-21 at 14:06 -0400, Robert Haas wrote:
> > Also, in a case like this, I don't think it's unreasonable to ask
> > whether, perhaps, Bob just needs to be a little more careful about
> > setting search_path.
>
> That's what this whole thread is about: I wish it was reasonable, but I
> don't think the tools we provide today make it reasonable. You expect
> Bob to do something like:
>
>   CREATE FUNCTION ... SET search_path = pg_catalog, pg_temp ...
>
> for all functions, not just SECURITY DEFINER functions, is that right?

Yes, I do. I think it's self-evident that a SQL function's behavior is
not guaranteed to be invariant under all possible values of
search_path. If you care about your function behaving the same way all
the time, you have to set the search_path.

TBH, I don't see any reasonable way around that requirement. We can
perhaps provide some safeguards that will make it less likely that you
will get completely hosed if your forget, and we could decide to make
SET search_path or some mostly-equivalent thing the default at the
price of pretty large compatibility break, but you can't have
functions that both resolve object references using the caller's
search path and also reliably do what the author intended.

> > You can, I think, be expected to
> > check that functions you define have SET search_path attached.
>
> We've already established that even postgres hackers are having
> difficulty keeping up with these nuances. Even though the SET clause
> has been there for a long time, our documentation on the subject is
> insufficient and misleading. And on top of that, it's extra typing and
> noise for every schema file. Until we make some changes I don't think
> we can expect users to do as you suggest.

Respectfully, I find this position unreasonable, to the point of
finding it difficult to take seriously. You said in another part of
your email that I didn't quote here that it's a problem that it's a
problem that functions and procedures are created with public execute
access by default -- but you can work around this by using a schema to
which other users don't have access, or by changing the default
permissions for functions on the schema where you are creating them,
or by adjusting permissions on the individual objects. If you don't do
any of that but don't trust the other users on your system then you at
least need to set search_path. If you neither understand how function
permissions work nor understand the importance of controlling
search_path, you cannot expect to have a secure system with multiple,
mutually untrusting users. That's just never going to work, regardless
of what the server behavior is.

I also disagree with the idea that setting the search_path should be
regarded as noise. I think it's quite the opposite. I don't believe
that people want to run their functions under a sanitized search_path
that only includes system schemas. That might work for some people,
but I think most people will define functions that call other
functions that they themselves defined, or access tables that they
themselves created. They will therefore need the search_path to
include the schemas in which they created those objects. There's no
way for the system to magically figure out what the user wants here.
*Perhaps* if the function is defined interactively the then-current
value could be captured, but in a pg_dump for example that won't work,
and the configured value, wherever it came from initially, is going to
have to be recorded so that it can be recreated when the dump is
restored.

Most of the problems that we're dealing with here have analogues in
the world of shell scripts. A sql or plpgsql function is like a shell
script. If it's setuid, i.e. SECURITY DEFINER, you have to worry about
the caller hijacking it by setting PATH or IFS or LD_something. Even
if it isn't, you have to either trust that the caller has set a
reasonable PATH, or set one yourself, else your script isn't always
going to work reliably. Nobody really expects to be able to make a
setuid shell script secure at all -- that typically requires a wrapper
executable -- but it definitely can't be done by someone who doesn't
understand the importance of setting their PATH and has no idea how to
use chmod.

One thing that is quite different between the shell script situation
and what we do inside PostgreSQL is that there's a lot more security
by default. Every user gets a home directory which by default is
accessible only to them, or at the very least writable only by them,
and system directories have tightly-controlled permissions. I think
UNIX had analogues of a lot of the problems we have today 40 years
ago, but they've tightened things up. We've started to move in that
direction by, for example, removing public execute access by default.
If we want to move further in the direction that UNIX has taken, we
should probably get rid of the public schema altogether, and
auto-create per-user schemas with permissions that allow only that
user to access them. But that's only making it easier to not
accidentally have users accessing each other's stuff. The core problem
that, if people do want to access each other's stuff, they either need
to trust each other or really be on point with all the
security-related stuff. That's equally true in the shell script case,
and I think that problem is fundamental. It's just really not possible
for people to call other people's code frequently without everyone
involved either being super-careful about security or just not caring.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Joe Conway
Дата:
On 9/25/23 11:30, Robert Haas wrote:
> I don't believe that people want to run their functions under a
> sanitized search_path that only includes system schemas. That might
> work for some people, but I think most people will define functions
> that call other functions that they themselves defined, or access
> tables that they themselves created. They will therefore need the
> search_path to include the schemas in which they created those
> objects.
Without diving into all the detailed nuances of this discussion, this 
particular paragraph made me wonder if at least part of the problem here 
is that the same search_path is used to find "things that I want to 
execute" (functions and operators) and "things I want to access" 
(tables, etc).

I think many folks would be well served by only having system schemas in 
the search_path for the former (augmented by explicit schema qualifying 
of one's own functions), but agree that almost no one wants that for the 
latter (needing to schema qualify every table reference).

Should there be a way to have a separate "execution" search_path?

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Robert Haas
Дата:
On Mon, Sep 25, 2023 at 12:00 PM Joe Conway <mail@joeconway.com> wrote:
> Should there be a way to have a separate "execution" search_path?

I have heard that idea proposed before, and I don't think it's a
terrible idea, but I also don't think it fixes anything terribly
fundamental. I think it's pretty normal for people to define functions
and procedures and then call them from other functions and procedures,
and if you do that, then you need that schema in your execution search
path. Of course, if somebody doesn't do that, or schema-qualifies all
such references, then this becomes useful for defense in depth. But I
find it hard to see it as anything more than defense in depth because
I think a lot of people will need to have use cases where they need to
put non-system schemas into the execution search path, and such people
wouldn't really benefit from the existence of this feature.

Slightly off-topic, but I wonder whether, if we do this, we ought to
do it by adding some kind of a marker to the existing search_path,
rather than by creating a new GUC. For example, maybe putting & before
a schema name means that it can be searched, but only for
non-executable things. Then you could set search_path = &jconway,
pg_catalog or something of that kind. It could potentially be more
powerful to have it be a completely separate setting, but if we do
that, everyone who currently needs to secure search_path properly
starts needing to also secure execution_search_path properly. This is
one of those cases where two is not better than one.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Jeff Davis
Дата:
On Mon, 2023-09-25 at 11:30 -0400, Robert Haas wrote:
> On Fri, Sep 22, 2023 at 4:05 PM Jeff Davis <pgsql@j-davis.com> wrote:
> > You expect
> > Bob to do something like:
> >
> >   CREATE FUNCTION ... SET search_path = pg_catalog, pg_temp ...
> >
> > for all functions, not just SECURITY DEFINER functions, is that
> > right?
>
> Yes, I do.

Do users like Bob do that today? If not, what causes you to expect them
to do so in the future?

> I think it's self-evident that a SQL function's behavior is
> not guaranteed to be invariant under all possible values of
> search_path.

It's certainly not self-evident in a literal sense. I think you mean
that it's "obvious" or something, and perhaps that narrow question is,
but it's also not terribly helpful.

If the important behaviors here were so obvious, how did we end up in
this mess in the first place?

> > We've already established that even postgres hackers are having
> > difficulty keeping up with these nuances. Even though the SET
> > clause
> > has been there for a long time, our documentation on the subject is
> > insufficient and misleading. And on top of that, it's extra typing
> > and
> > noise for every schema file. Until we make some changes I don't
> > think
> > we can expect users to do as you suggest.
>
> Respectfully, I find this position unreasonable, to the point of
> finding it difficult to take seriously.

Which part exactly is unreasonable?

 * Hackers are having trouble keeping up with the nuances.
 * Our documentation on the subject *is* insufficient and misleading.
 * "pg_temp" is noise.

It seems like you think that users are already doing "SET search_path =
pg_catalog, pg_temp" in all the necessary places, and therefore no
change is required?


> Most of the problems that we're dealing with here have analogues in
> the world of shell scripts.

I think analogies to unix are what caused a lot of the problems we have
today, because postgres is a very different world. In unix-like
environments, a file is just a file; in postgres, we have all kinds of
code attached in interesting ways.


Regards,
    Jeff Davis




Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Jeff Davis
Дата:
On Mon, 2023-09-25 at 12:00 -0400, Joe Conway wrote:
> Should there be a way to have a separate "execution" search_path?

I hadn't considered that and I like that idea for a few reasons:

  * a lot of the problem cases are for functions that don't need to
access tables at all, e.g., in an index expression.
  * it avoids annoyances with pg_temp, because that's not searched for
functions/operators anyway
  * perhaps we could force the object search_path to be empty for
IMMUTABLE functions?

I haven't thought it through in detail, but it seems like a promising
approach.

Regards,
    Jeff Davis




Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Joe Conway
Дата:
On 9/25/23 14:03, Jeff Davis wrote:
> On Mon, 2023-09-25 at 12:00 -0400, Joe Conway wrote:
>> Should there be a way to have a separate "execution" search_path?
> 
> I hadn't considered that and I like that idea for a few reasons:
> 
>    * a lot of the problem cases are for functions that don't need to
> access tables at all, e.g., in an index expression.
>    * it avoids annoyances with pg_temp, because that's not searched for
> functions/operators anyway
>    * perhaps we could force the object search_path to be empty for
> IMMUTABLE functions?
> 
> I haven't thought it through in detail, but it seems like a promising
> approach.


Related to this, it would be useful if you could grant create on schema 
for only non-executable objects. You may want to allow a user to create 
their own tables but not allow them to create their own functions, for 
example. Right now "GRANT CREATE ON SCHEMA foo" gives the grantee the 
ability to create "all the things".

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Robert Haas
Дата:
On Mon, Sep 25, 2023 at 1:56 PM Jeff Davis <pgsql@j-davis.com> wrote:
> Do users like Bob do that today? If not, what causes you to expect them
> to do so in the future?

What I would say is that if there's a reasonable way of securing your
stuff and you don't make use of it, that's your problem. If securing
your stuff is unreasonably difficult, that's a product problem. I
think that setting the search_path on your own functions is a basic
precaution that you should take if you are worried about multi-user
security. I do not believe it is realistic to eliminate that
requirement, and if people like Bob don't do that today and can't be
made to do that in the future, then I think it's just hopeless. In
contrast, I think that the precautions that you need to take when
doing anything to a table owned by another user are unreasonably
complex and not very realistic for anyone to take on a routine basis.
Even if you validate that there's nothing malicious before you access
the table, the table owner can change that at any time, so it's very
hard to reliably protect yourself.

In terms of whether people like Bob actually DO do that today, I'd say
probably some do and others don't. I think that the overwhelming
majority of PostgreSQL users simply aren't concerned about multi-user
security. They either have a single user account that is used for
everything, or say one account for the application and another for
interactive access, or they have a bunch of users but basically all of
those people are freely accessing each other's stuff and they're not
really concerned with firewalling them from each other. Among the
small percentage of users who are really concerned with making sure
that users can't get into each others accounts, I would expect that
knowing that you need to control search_path is fairly common, but
it's hard to say. I haven't actually met many such users.

> > I think it's self-evident that a SQL function's behavior is
> > not guaranteed to be invariant under all possible values of
> > search_path.
>
> It's certainly not self-evident in a literal sense. I think you mean
> that it's "obvious" or something, and perhaps that narrow question is,
> but it's also not terribly helpful.
>
> If the important behaviors here were so obvious, how did we end up in
> this mess in the first place?

I feel like this isn't really responsive to the argument that I was
and am making, and I'm worried that we're going down a rat-hole here.

I wondered after reading this whether I had misused the term
self-evident, but when I did a Google search for "self-evident" the
definition that comes up is "not needing to be demonstrated or
explained; obvious."

I am not saying that everyone is going to realize that you probably
ought to be setting search_path on all of your functions in any kind
of multi-user environment, and maybe even in a single-user environment
just to avoid weird failures if you ever change your default
search_path. What I am saying is that if you stop to think about what
search_path does while looking at any SQL function you've ever
written, you should probably realize pretty quickly that the behavior
of your function in search_path-dependent, and indeed that the
behavior of every other SQL function you've ever written is probably
search_path-dependent, too. I think the problem here isn't really that
this is hard to understand, but that many people have not stopped to
think about it.

Maybe it is obvious to you what we ought to do about that, but it is
not obvious to me. As I have said, I think that changing the behavior
of CREATE FUNCTION or CREATE PROCEDURE so that some search_path
control is the default is worth considering. However, I think that
such a change inevitably breaks backward compatibility, and I don't
think we have enough people weighing in on this thread to think that
we can just go do that even if everyone agreed on precisely what was
to be done, and I think it is pretty clear that we do not have
unanimous agreement.

> > Respectfully, I find this position unreasonable, to the point of
> > finding it difficult to take seriously.
>
> Which part exactly is unreasonable?

I interpreted you to be saying that we can't expect people to set
search_path on their functions. And I just don't buy that. We have
made mistakes in that area in PostgreSQL itself and had to fix them
later, and we may make more mistakes again in the future, so if you
think we need better documentation or better defaults, I think you
might be right. But if you think it's a crazy idea for people running
PostgreSQL in multi-user environments to understand that setting
search_path on all of their functions and procedures is essential, I
disagree. They've got to understand that, because it's not that
complicated, and there's no real alternative.

> I think analogies to unix are what caused a lot of the problems we have
> today, because postgres is a very different world. In unix-like
> environments, a file is just a file; in postgres, we have all kinds of
> code attached in interesting ways.

Yeah. That's another area where I find it very unclear how to do
better. From a security point of view, I think that the fact that
there are so many interesting places to attach code is completely
insane. It makes running a secure multi-user environment very
difficult, bordering on impossible. But is there any way we can really
fix that without just removing a whole bunch of functionality? I think
that some of the ideas that have been proposed here could help, but
I'm extremely doubtful that they or anything else are a complete
solution, and I'm pretty sure that there is no "easy button" here --
given the number of "interesting" ways to execute code, I think
security will always be tough to get right, regardless of what we
change.

My emails on this thread seem to have made you frustrated. For that, I am sorry.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Jeff Davis
Дата:
On Thu, 2023-09-21 at 14:33 -0700, Jeff Davis wrote:
> I have attached an updated patch. Changes:

Withdrawing this from CF due to lack of consensus.

I'm happy to resume this discussion if someone sees a path forward to
make it easier to secure the search_path; or at least help warn users
when a function without a secured search_path is being used unsafely.

Regards,
    Jeff Davis




Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Jeff Davis
Дата:
On Mon, 2023-09-25 at 11:30 -0400, Robert Haas wrote:
> > That's what this whole thread is about: I wish it was reasonable,
> > but I
> > don't think the tools we provide today make it reasonable. You
> > expect
> > Bob to do something like:
> >
> >   CREATE FUNCTION ... SET search_path = pg_catalog, pg_temp ...
> >
> > for all functions, not just SECURITY DEFINER functions, is that
> > right?
>
> Yes, I do. I think it's self-evident that a SQL function's behavior
> is
> not guaranteed to be invariant under all possible values of
> search_path. If you care about your function behaving the same way
> all
> the time, you have to set the search_path.

After adding the search path cache (recent commit f26c2368dc) hopefully
that helps to make the above suggestion more reasonable performance-
wise. I think we can call that progress.

Regards,
    Jeff Davis




Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Robert Haas
Дата:
On Tue, Nov 14, 2023 at 11:21 PM Jeff Davis <pgsql@j-davis.com> wrote:
> After adding the search path cache (recent commit f26c2368dc) hopefully
> that helps to make the above suggestion more reasonable performance-
> wise. I think we can call that progress.

I agree. Not to burden you, but do you know what the overhead is now,
and do you have any plans to further reduce it? I don't believe that's
the only thing we ought to be doing here, necessarily, but it is one
thing that we definitely should be doing and probably the least
controversial.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Jeff Davis
Дата:
On Mon, 2023-11-20 at 15:52 -0500, Robert Haas wrote:
> I agree. Not to burden you, but do you know what the overhead is now,
> and do you have any plans to further reduce it? I don't believe
> that's
> the only thing we ought to be doing here, necessarily, but it is one
> thing that we definitely should be doing and probably the least
> controversial.

Running the simple test described here:

https://www.postgresql.org/message-id/04c8592dbd694e4114a3ed87139a7a04e4363030.camel%40j-davis.com

The baseline (no "SET search_path" clause on the function) is around
3800ms, and with the clause it shoots up to 8000ms. That's not good,
but it is down from about 12000ms before.

There are a few patches in the queue to bring it down further. Andres
and I were discussing some GUC hashtable optimizations here:

https://www.postgresql.org/message-id/27a7a289d5b8f42e1b1e79b1bcaeef3a40583bd2.camel@j-davis.com

which will (if committed) bring it down into the mid 7s.

There are also a couple other patches I have here (and intend to commit
soon):

https://www.postgresql.org/message-id/e6fded24cb8a2c53d4ef069d9f69cc7baaafe9ef.camel%40j-davis.com

and those I think will get it into the mid 6s. I think a bit lower
combined with the GUC hash table optimizations above.

So we are still looking at around 50% overhead for a simple function if
all this stuff gets committed. Not great, but a lot better than before.

Of course I welcome others to profile and see what they can do. There's
a setjmp() call, and a couple allocations, and maybe some other stuff
to look at. There are also higher-level ideas, like avoiding calling
into guc.c in some cases, but that starts to get tricky as you pointed
out:

https://www.postgresql.org/message-id/CA%2BTgmoa8uKQgak5wH0%3D7sL-ukqbwnCPMXA2ZW7Ccdt7tdNGkzg%40mail.gmail.com

It seems others are also interested in this problem, so I can put some
more effort in after this round of patches goes in. I don't have a
specific target other than "low enough overhead that we can reasonably
recommend it as a best practice in multi-user environments".

Regards,
    Jeff Davis




Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Robert Haas
Дата:
On Mon, Nov 20, 2023 at 5:27 PM Jeff Davis <pgsql@j-davis.com> wrote:
> Of course I welcome others to profile and see what they can do. There's
> a setjmp() call, and a couple allocations, and maybe some other stuff
> to look at. There are also higher-level ideas, like avoiding calling
> into guc.c in some cases, but that starts to get tricky as you pointed
> out:
>
> https://www.postgresql.org/message-id/CA%2BTgmoa8uKQgak5wH0%3D7sL-ukqbwnCPMXA2ZW7Ccdt7tdNGkzg%40mail.gmail.com
>
> It seems others are also interested in this problem, so I can put some
> more effort in after this round of patches goes in. I don't have a
> specific target other than "low enough overhead that we can reasonably
> recommend it as a best practice in multi-user environments".

The two things that jump out at me are the setjmp() and the
hash_search() call inside find_option(). As to the first, could we
remove the setjmp() and instead have abort-time processing know
something about this? For example, imagine we just push something onto
a stack like we do for ErrorContextCallback, do whatever, and then pop
it off. But if an error is thrown then the abort path knows to look at
that variable and do whatever. As to the second, could we somehow come
up with an API for guc.c where you can ask for an opaque handle that
will later allow you to do a fast-SET of a GUC? The opaque handle
would basically be the hashtable entry, perhaps with some kind of
wrapping or decoration. Then fmgr_security_definer() could obtain the
opaque handles and cache them in fn_extra.

(I'm just spitballing here.)

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Jeff Davis
Дата:
On Tue, 2023-11-21 at 09:24 -0500, Robert Haas wrote:
> As to the second, could we somehow come
> up with an API for guc.c where you can ask for an opaque handle that
> will later allow you to do a fast-SET of a GUC?

Yes, attached. That provides a significant speedup: my test goes from
around ~7300ms to ~6800ms.

This doesn't seem very controversial or complex, so I'll probably
commit this soon unless someone else has a comment.

--
Jeff Davis
PostgreSQL Contributor Team - AWS



Вложения

Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
John Naylor
Дата:
On Tue, Dec 5, 2023 at 7:55 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Tue, 2023-11-21 at 09:24 -0500, Robert Haas wrote:
> > As to the second, could we somehow come
> > up with an API for guc.c where you can ask for an opaque handle that
> > will later allow you to do a fast-SET of a GUC?
>
> Yes, attached. That provides a significant speedup: my test goes from
> around ~7300ms to ~6800ms.
>
> This doesn't seem very controversial or complex, so I'll probably
> commit this soon unless someone else has a comment.

+ * set_config_option_ext: sets option with the given handle to the given
+ * value.

Copy-paste-o of the other function name.

+config_handle *
+get_config_handle(const char *name)
+{
+ struct config_generic *record;
+
+ record = find_option(name, false, false, 0);
+ if (record == NULL)
+ return 0;

Part of this code this was copied from a function that returned int,
but this one returns a pointer.



Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Jeff Davis
Дата:
On Tue, 2023-12-05 at 23:22 +0700, John Naylor wrote:
> Copy-paste-o of the other function name.

...

> Part of this code this was copied from a function that returned int,
> but this one returns a pointer.

Thank you, fixed.

Also, I forward-declared config_generic in guc.h to eliminate the cast.

Regards,
    Jeff Davis


Вложения

Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Jeff Davis
Дата:
On Tue, 2023-11-21 at 09:24 -0500, Robert Haas wrote:
> As to the first, could we
> remove the setjmp() and instead have abort-time processing know
> something about this? For example, imagine we just push something
> onto
> a stack like we do for ErrorContextCallback, do whatever, and then
> pop
> it off. But if an error is thrown then the abort path knows to look
> at
> that variable and do whatever.

If I remove the TRY/CATCH entirely, it shows there's room for ~200ms
improvement in my test.

I attached a rough patch, which doesn't quite achieve that much, it's
more like ~100ms improvement and starts to fall within the noise. So
perhaps an improvement, but a bit disappointing. It's not a lot of
code, but it's not trivial either because the nesting level needs to be
tracked (so a subxact abort doesn't reset too much state).

Also, it's not quite as clean as it could be, because I went to some
effort to avoid an alloc/free by keeping the stack within the fcache. I
didn't pay a lot of attention to correctness in this particular patch;
I was mostly trying a few different formulations for performance
measurement.

I'm not inclined to commit this in its current form but if someone
thinks that it's a worthwhile direction, I can clean it up a bit and
reconsider.

Regards,
    Jeff Davis


Вложения

Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

От
Jeff Davis
Дата:
On Tue, 2023-12-05 at 11:58 -0800, Jeff Davis wrote:
> Also, I forward-declared config_generic in guc.h to eliminate the
> cast.

Looking more closely, I fixed an issue related to placeholder configs.
We can't return a handle to a placeholder, because it's not stable, so
in that case it falls back to using the name.

My apologies for the churn on this (mostly) simple patch. I think this
version is correct; I intend to commit soon.

Regards,
    Jeff Davis



Вложения