Обсуждение: Limiting the operations that client-side code can perform upon its database backend's artifacts

Поиск
Список
Период
Сортировка

Limiting the operations that client-side code can perform upon its database backend's artifacts

От
Bryn Llewellyn
Дата:
On 14-Sep-2022, tgl@sss.pgh.pa.us wrote:

…. Therefore, if you don't trust another session that is running as your userID, you have already lost. That session can drop your tables, or corrupt the data in those tables to an arbitrary extent, and the SQL permissions system will not squawk even feebly… So if you're not happy with this hazard, you should not be accepting the idea that actors you don't trust are allowed to submit queries under the same userID as you. And if you're using a client-side software stack that forces that situation on you, it's time to look for another one.

Or in other words, I flatly reject the claim that this:

bryn@yugabyte.com wrote:

It's common to design a three tier app so that the middle tier always authorizes as just a single role—say, "client"—and where the operations that "client" can perform are limited as the overall design specifies.

is in any way sane or secure. There is not very much that the database server can do to clean up after insecure client-side stacks.

*BACKGROUND*

I'm starting a new thread here. What I wrote, and Tom's response, are taken from a longish thread that I started with the subject "Is it possible to stop sessions killing each other when they all authorize as the same role?", here:


<aside>That thread is "case closed" now. (My question arose from my basic misunderstanding of what's hard-wired and what is simply a default privilege regime that can be changed. And then I compounded my embarrassment by revoking "execute from public" on a "pg_catalog" function when "current_database()" had one value—and then not seeing the effect of this when "current_database()" had a different value.)

I wandered off topic with a claim about three tier app design. And that prompted Tom's response here:


</aside>

*ANYWAY...*

Tom's "I flatly reject" has been troubling me for the past couple of weeks. I wonder if what I wrote was unclear. I'll try a different way. First, w.r.t. Tom's

the main point of a database is to store your data

I think that more needs to be said, thus:

« The main point of a database is to store your data, to keep it in compliance with all the specified data rules, and to allow authorized client-side code to modify the data by using only a set of specified business functions. »

This implies a carefully designed within-database regime that takes advantage of established notions: for encapsulating the implementation of business functions; and for access control. This, in turn, implies a minimum of two distinct roles: one to own the entire implementation. And another to allow exactly and only the specified business functions to be performed by client-side code. In a real use case, user-defined functions or procedures define the business function API. And there'd be typically several roles that share the implementation and that take advantage of access control notions among themselves. My code example, below, reduces this paradigm as far as I could manage to allow a convincing demo of the principles. It relies on this:

— People who implement client-side code to access the database are given *only* the credentials to connect as one particular user, "client", that exposes the business function API.

— All other connect credentials, including but not at all limited to superuser credentials, are kept secret within a manageably small community of server-side engineers.

— Self-evidently, this relies on carefully designed and strictly implemented human practices. But so, too, does any human endeavor where security matters. In our domain, this implies that the overall design has a carefully written prose specification and that the development shop delivers a set of install scripts. Then a trusted person whose job is to administer the deployed app scrutinizes the scripts and runs them. In the limit, just a single person knows the deployment site passwords and can set "rolcanlogin" to "false" for every role that owns the implementation artifacts once the installation is done.

My demo seems to show that when a program connects as "client", it can perform exactly and only the database operations that the database design specified.

Am I missing something? In other words, can anybody show me a vulnerability?

*THE DEMO*

The code example models the simplest form of "hard shell encapsulation" that I could manage.

(I now realize that, with some very manageable effort, I can revoke all privileges on every object in the "pg_catalog" schema from public and then re-grant as needed to whatever roles need them—following the famous principle of least privilege. So none would be granted to "client" with the result that it can't see metadata about anything. A prose document would suffice for communicating what client-side engineers need to know.)

The idea is that "client" should see an "insert and select, only" view and be unable to do any DDLs. This relies on the fact that a view is non-negotiably "security definer". There are just two "vanilla" roles, "client" and "u1", thus:

do $body$
declare
  expected_roles constant name[] := array['client', 'u1'];
  roles          constant name[] := (
      select array_agg(rolname order by rolname)
      from pg_roles
      where rolname in ('u1', 'client')
      and rolcanlogin
      and not rolsuper
      and not rolinherit
      and not rolcreaterole
      and not rolcreatedb
      and not rolreplication
      and not rolbypassrls
      and has_database_privilege(rolname, current_database(), 'connect')
    );
begin
  assert roles = expected_roles, 'Unexpected';
end;
$body$;

I'm leaving out of what I show here the code that creates "client" and "u1" and that allows them to connect to (and in the case of "u1" only, create and change objects) in a suitable purpose-created database. The database starts off empty with no schemas 'cos "public" has been dropped. This bootstrap can easily be done by a superuser. With a bit more effort, it can be done by a non-superuser with  "createrole" and some privileges (with "grant option") on the database.

*SETUP*

\c - u1
create schema s authorization u1;
revoke all on schema s from public;
create table s.t(
  k bigint generated always as identity  primary key,
  c1 text not null constraint t_chk check(c1 = lower(c1)));
revoke all on table s.t from public;

create view s.v as select k, c1 from s.t;
revoke all on table s.v from public;
grant usage on schema s to client;
grant insert, select on table s.v to client;


*POSITIVE TESTS* (these succeed)

Can a session authorized as "client" do everything that's intended?

\c - client
insert into s.v(c1) values ('dog'), ('cat'), ('frog');
select k, c1 from s.v order by k;

*NEGATIVE TESTS* (the block finishes silently without error)

Can a session authorized as "client" « drop your tables, or corrupt the data in those tables to an arbitrary extent »?

do $body$
declare
  n bigint;
begin
  -- Try to do any operation on "s.t".
  begin
    insert into s.t(c1) values ('mouse');
    assert false, 'Unexpected';
  exception when insufficient_privilege then null; end;
  begin
    update s.t set c1 = 'bird' where c1 = 'frog';
    assert false, 'Unexpected';
  exception when insufficient_privilege then null; end;
  begin
    delete from s.t;
    assert false, 'Unexpected';
  exception when insufficient_privilege then null; end;
  begin
    drop table s.t;
    assert false, 'Unexpected';
  exception when insufficient_privilege then null; end;
  begin
    select nextval('s.t_k_seq') into n;
    assert false, 'Unexpected';
  exception when insufficient_privilege then null; end;

  -- Try to do outlawed operations on "s.v".
  begin
    update s.v set c1 = 'bird' where c1 = 'frog';
    assert false, 'Unexpected';
  exception when insufficient_privilege then null; end;
  begin
    delete from s.v;
    assert false, 'Unexpected';
  exception when insufficient_privilege then null; end;
  begin
    drop view s.v;
    assert false, 'Unexpected';
  exception when insufficient_privilege then null; end;
  begin
    create table s.x(n int);
    assert false, 'Unexpected';
  exception when insufficient_privilege then null; end;

  -- Try to "corrupt" the data -- i.e. to do DMLs that
  -- would break the rules.
  begin
    insert into s.v(c1) values('Rat');
    assert false, 'Unexpected';
  exception when check_violation then null; end;
  begin
    insert into s.v(k, c1) values(42, 'Rat');
    assert false, 'Unexpected';
  exception when generated_always then null; end;
end;
$body$;

Re: Limiting the operations that client-side code can perform upon its database backend's artifacts

От
Julien Rouhaud
Дата:
On Mon, Sep 26, 2022 at 11:18:34AM -0700, Bryn Llewellyn wrote:
>
> My demo seems to show that when a program connects as "client", it can
> perform exactly and only the database operations that the database design
> specified.
>
> Am I missing something? In other words, can anybody show me a vulnerability?

What exactly prevents the client role from inserting e.g.

- 'robert''); drop table students; --'
- millions of 'cat' rows
- millions of 1GB-large rows

or just keep sending massive invalid query texts to fill the logs, or just
trying to connect until there's no available connection slots anymore, and then
keep spamming the server thousands of time per second to try to open new
connections, or ...?



Re: Limiting the operations that client-side code can perform upon its database backend's artifacts

От
"Peter J. Holzer"
Дата:
On 2022-09-27 14:58:58 +0800, Julien Rouhaud wrote:
> On Mon, Sep 26, 2022 at 11:18:34AM -0700, Bryn Llewellyn wrote:
> > My demo seems to show that when a program connects as "client", it can
> > perform exactly and only the database operations that the database design
> > specified.
> >
> > Am I missing something? In other words, can anybody show me a vulnerability?
>
> What exactly prevents the client role from inserting e.g.
>
> - 'robert''); drop table students; --'

It can do this but it won't do any harm since the client role doesn't
have permission to drop the table-

> - millions of 'cat' rows
> - millions of 1GB-large rows

That depends on "the database operations that the database design
specified", but if the client role is supposed to be able to insert
data, you can't really prevent it from inserting non-sensical or
enormous data. You can encapsulate the insert functionality in a
function or procedure and do some sanity checks there. But automatically
distinguishing between legitimate use and abuse is generally not simple.

> or just keep sending massive invalid query texts to fill the logs, or just
> trying to connect until there's no available connection slots anymore, and then
> keep spamming the server thousands of time per second to try to open new
> connections, or ...?

There are often several layers of defense. The database frequently won't
be accessible from the open internet (or even the company network)
directly. Only a middle tier of application servers running vetted
client code will connect directly. Even those servers may not be
accessible directly to end users. There may be a layer of proxy servers
above them. Each of these layers may implement additional checks, rate
limits and monitoring.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Вложения

Re: Limiting the operations that client-side code can perform upon its database backend's artifacts

От
Bryn Llewellyn
Дата:
> rjuju123@gmail.com wrote:
>
>> bryn@yugabyte.com wrote:
>>
>> My demo seems to show that when a program connects as "client", it can perform exactly and only the database
operationsthat the database design specified. Am I missing something? In other words, can anybody show me a
vulnerability?
>
> What exactly prevents the client role from inserting e.g.
>
> - 'robert''); drop table students; --'
> - millions of 'cat' rows
> - millions of 1GB-large rows
>
> or just keep sending massive invalid query texts to fill the logs, or just trying to connect until there's no
availableconnection slots anymore, and then keep spamming the server thousands of time per second to try to open new
connections,or ...? 

My little code example was meant to show the basic principle: that the within-database artifacts that implement an
application'sbackend can all be hidden from client code. The technique is simple and has been in use forever. That's
whyRDBMSs like PG have a privilege scheme. Object ownership and "current_role" are a central notions in any such
scheme.

My demo depended upon an ordinary human regime of password secrecy. The key point that my demo made was that "client"
ownsnothing, lacks the "create" privilege on the database in question, and (for good measure) lacks "create" on all
schemasin the database. Therefore, a session that authorizes as "client" is limited in what it can do. 

I'm not sure what you mean to say with this fragment:

 'robert''); drop table students; --'

It rather looks like something that you see in an essay about SQL injection. But the entire SQL injection discussion is
outof scope in my toy demo because the requirements statement simply allows a session that's authorized as "client" to
issueany SQL statement. I don't know why you picked the "students" table when there isn't one. I just ran my demo code
tocompletion, re-connected as "client", and did this: 

drop table students;

It produces the "42P01: error: table "students" does not exist. Of course, the message isn't lying. So this is a better
test:

drop view s.v;

This produces the "42501" error: must be owner of view v. This isn't a lie either. The hacker has now learned that, at
least,such a view does exist. Arguably, the different between the two errors is a bad thing. And famously, in Oracle
Database,you get a more generic "computer says no" in both cases. But PG is the way it is here and won't change in my
lifetime.So, playing the hacker role, I tried this: 

select definition from pg_views where schemaname = 's';

It caused the "42501" error: permission denied for view pg_views. And why shouldn't it? I didn't mention that I'd
revoked"select" on every "pg_catalog" relation (and every "information_schema" relation) from public and then granted
"select"explicitly on each to "u1" but not to "client". 

This is the text-book principle of least privilege: you start with nothing and add what you need. For historical
reasons,very few systems honor this principle by default. But it's an excellent feature of PG that you can overrule the
defaultin the way that I described. The present toy demo works fine (all the tests behave the same) after my hardening
intervention.

About inserting millions of rows, well... that's a word-game. The spec for my toy demo never mentioned that inserting
millionsof rows should be prevented. 

There's only so far that you can go if you decide to articulate the hard-shell API as "use any SQL statement that you
careto in order to access the intended app functionality". This is why the usual paradigm is to grant only "execute" on
adesigned set of subprograms that each implements a specified *business* function. People have been banging on about
thisapproach since the late eighties (and probably since before then). Of course, the approach depends on a designed
useof a privilege scheme. PG supports all this nicely. It's easy to implement an upper limit (in if-then-else code) on
thenumber of rows that a procedure that implements "insert" allows. I s'pose that you'll say that the bad guy could
callthe procedure time and again. But techniques are available there too. (They're roughly analogous to what stops you
makingwithdrawals from a bank account when the credit limit is reached.) Blocking a single "huge" row is trivial.
Probably,a constraint that uses a SQL expression would suffice. But you can always implement the user-defined function
forthe hugeness test if you need to. 

This leaves us with some kind of denial of service attack that uses a flavor of busy work or similar, like you mention.
Idon't think that there's any way that PG can prevent a connected role doing this: 

do $body$
begin
  loop
    <an operation that can't be prevented>
  end loop;
end;
$body$;

or, say, a "select" with a recursive CTE with no stopping condition. There's always "set statement_timeout"—but that's
inthe hands of the session that authorizes as "client". I know of another RDBMS that has a robust, server-side,
resourcemanagement scheme that can be set up so that, in my example, "client" could not change the rules. I've not
lookedto see if PG has anything native for this. But I dare say that somebody could implement an extension with a C
implementationto do something pretty useful in this space. 

Anyway... this kind of denial of service discussion is way outside the scope of what I addressed. I started with this

« The main point of a database is to store your data, to keep it in compliance with all the specified data rules, and
toallow authorized client-side code to modify the data by using only a set of specified business functions. » 

I should have added "preventing busy work that has no effect on the persisted data is out of scope".




Re: Limiting the operations that client-side code can perform upon its database backend's artifacts

От
Bryn Llewellyn
Дата:
> hjp-pgsql@hjp.at wrote:
>
>> rjuju123@gmail.com wrote:
>>
>>> bryn@yugabyte.com wrote:
>>>
>>> My demo seems to show that when a program connects as "client", it can perform exactly and only the database
operationsthat the database design specified. Am I missing something? In other words, can anybody show me a
vulnerability?
>>
>> What exactly prevents the client role from inserting e.g.
>>
>> - 'robert''); drop table students; --'
>
> It can do this but it won't do any harm since the client role doesn't have permission to drop the table.
>
>> - millions of 'cat' rows
>> - millions of 1GB-large rows
>
> That depends on "the database operations that the database design specified", but if the client role is supposed to
beable to insert data, you can't really prevent it from inserting non-sensical or enormous data. You can encapsulate
theinsert functionality in a function or procedure and do some sanity checks there. But automatically distinguishing
betweenlegitimate use and abuse is generally not simple. 
>
>> or just keep sending massive invalid query texts to fill the logs, or just trying to connect until there's no
availableconnection slots anymore, and then keep spamming the server thousands of time per second to try to open new
connections,or ...? 
>
> There are often several layers of defense. The database frequently won't be accessible from the open internet (or
eventhe company network) directly. Only a middle tier of application servers running vetted client code will connect
directly.Even those servers may not be accessible directly to end users. There may be a layer of proxy servers above
them.Each of these layers may implement additional checks, rate limits and monitoring. 

I'm afraid that I didn't see this from you until I'd already replied to Julien's turn in this thread. Sorry that I
causedthread divergence. Thanks, Peter, for addressing the contribution(s) that other tiers in the stack make (and
uniquelyare able to make) in order to deliver the intended application functionality to the end user. 


Re: Limiting the operations that client-side code can perform upon its database backend's artifacts

От
Julien Rouhaud
Дата:
On Tue, Sep 27, 2022 at 05:27:22PM -0700, Bryn Llewellyn wrote:
> > hjp-pgsql@hjp.at wrote:
> >
> >> rjuju123@gmail.com wrote:
> >>
> >>> bryn@yugabyte.com wrote:
> >>>
> >>> My demo seems to show that when a program connects as "client", it can
> >>> perform exactly and only the database operations that the database design
> >>> specified. Am I missing something? In other words, can anybody show me a
> >>> vulnerability?
> >>
> >> What exactly prevents the client role from inserting e.g.
> >>
> >> - 'robert''); drop table students; --'
> >
> > It can do this but it won't do any harm since the client role doesn't have
> > permission to drop the table.

FTR it's a reference to https://xkcd.com/327/

Both of you are saying it's harmless because you're assuming that only the
client role may read the data and act on it, but the whole point of SQL
injection is to try to do actions that the role you have access to can't
already do.  And that's just a few out of dozens of examples of how having a
role connected to the database can do harm.

> >
> >> - millions of 'cat' rows
> >> - millions of 1GB-large rows
> >
> > That depends on "the database operations that the database design
> > specified", but if the client role is supposed to be able to insert data,
> > you can't really prevent it from inserting non-sensical or enormous data.
> > You can encapsulate the insert functionality in a function or procedure and
> > do some sanity checks there. But automatically distinguishing between
> > legitimate use and abuse is generally not simple.

Which is exactly what was the idea behind Tom's "if you don't trust another
session that is running as your userID, you have already lost".

> >
> >> or just keep sending massive invalid query texts to fill the logs, or just
> >> trying to connect until there's no available connection slots anymore, and
> >> then keep spamming the server thousands of time per second to try to open
> >> new connections, or ...?
> >
> > There are often several layers of defense. The database frequently won't be
> > accessible from the open internet (or even the company network) directly.
> > Only a middle tier of application servers running vetted client code will
> > connect directly. Even those servers may not be accessible directly to end
> > users. There may be a layer of proxy servers above them. Each of these
> > layers may implement additional checks, rate limits and monitoring.

If no one has direct SQL access to the database, then there's no problem with a
role being able to pg_terminate_backend() session for the same role, and this
thread shouldn't exist to begin with.

> I'm afraid that I didn't see this from you until I'd already replied to
> Julien's turn in this thread. Sorry that I caused thread divergence. Thanks,
> Peter, for addressing the contribution(s) that other tiers in the stack make
> (and uniquely are able to make) in order to deliver the intended application
> functionality to the end user.

Your whole argument in your other email was:

> Anyway... this kind of denial of service discussion is way outside the scope
> of what I addressed.

which looks like in total contradiction with your original email:

> Am I missing something? In other words, can anybody show me a vulnerability?

Again, don't give SQL access to untrusted users and you will avoid a lot of
problems, including someone abusing pg_terminate_backend().



Re: Limiting the operations that client-side code can perform upon its database backend's artifacts

От
Bryn Llewellyn
Дата:
> rjuju123@gmail.com wrote:
>
>> bryn@yugabyte.com wrote:
>>
>>>> hjp-pgsql@hjp.at wrote:
>>>>
>>>>> rjuju123@gmail.com wrote:
>>>>>
>>>>>> bryn@yugabyte.com wrote:
>>>>>>
>>>>>> [Bryn] My demo seems to show that when a program connects as "client", it can perform exactly and only the
databaseoperations that the database design specified. Am I missing something? In other words, can anybody show me a
vulnerability?
>>>>>
>>>>> [Julien] What exactly prevents the client role from inserting e.g.
>>>>>
>>>>> - 'robert''); drop table students; --'
>>>>
>>>> [Peter] It can do this but it won't do any harm since the client role doesn't have permission to drop the table.
>
> [Julien] For the record. it's a reference to [XKCD's "Little Bobby Tables" cartoon]. Both of you are saying it's
harmlessbecause you're assuming that only the client role may read the data and act on it, but the whole point of SQL
injectionis to try to do actions that the role you have access to can't already do. And that's just a few out of dozens
ofexamples of how having a role connected to the database can do harm. 
>
>>>>> [Julien]
>>>>> - millions of 'cat' rows
>>>>> - millions of 1GB-large rows
>>>>
>>>> [Peter] That depends on "the database operations that the database design specified", but if the client role is
supposedto be able to insert data, you can't really prevent it from inserting non-sensical or enormous data. You can
encapsulatethe insert functionality in a function or procedure and do some sanity checks there. But automatically
distinguishingbetween legitimate use and abuse is generally not simple. 
>
> [Julien] Which is exactly what was the idea behind Tom's "if you don't trust another session that is running as your
userID,you have already lost". 
>
>>>>> [Julien] or just keep sending massive invalid query texts to fill the logs, or just trying to connect until
there'sno available connection slots anymore, and then keep spamming the server thousands of time per second to try to
opennew connections, or ...? 
>>>>
>>>> [Peter] There are often several layers of defense. The database frequently won't be accessible from the open
internet(or even the company network) directly. Only a middle tier of application servers running vetted client code
willconnect directly. Even those servers may not be accessible directly to end users. There may be a layer of proxy
serversabove them. Each of these layers may implement additional checks, rate limits and monitoring. 
>
> [Julien] If no one has direct SQL access to the database, then there's no problem with a role being able to
pg_terminate_backend()session for the same role, and this thread shouldn't exist to begin with. 
>
>> [Bryn] I'm afraid that I didn't see this from you until I'd already replied to Julien's turn in this thread. Sorry
thatI caused thread divergence. Thanks, Peter, for addressing the contribution(s) that other tiers in the stack make
(anduniquely are able to make) in order to deliver the intended application functionality to the end user. 
>
> [Julien] Your whole argument in your other email was:
>
>> [Bryn] Anyway... this kind of denial of service discussion is way outside the scope of what I addressed.
>
> [Julien] which looks like in total contradiction with your original email:
>
>> Am I missing something? In other words, can anybody show me a vulnerability?
>
> [Julien] Again, don't give SQL access to untrusted users and you will avoid a lot of problems, including someone
abusingpg_terminate_backend(). 

I fear that you and I, Julien, are talking past each other. That's probably my fault. Any maybe there's no hope of
rescuenow. 

My brand new thread, started here:

https://www.postgresql.org/message-id/3D119733-6784-4E84-98E4-5124E69D43F9@yugabyte.com

has nothing whatsoever to do with pg_terminate_backend(). The thread that I started here:

https://www.postgresql.org/message-id/10F360BB-3149-45E6-BFFE-10B9AE31F1A6@yugabyte.com

to ask about pg_terminate_backend() reached a nice "case closed" for me because I filled an embarrassing hole in my
understanding.I see now that, in a database of interest, I can revoke execute on the "killer" proc and grant it to any
rolethat needs it. Doing this is nothing other than following the principle of least privilege. It's liberating to know
that"Thing X" that you don't need can be made impossible. And so much nicer than applying the "what-about-ism"
approach:"Thing X" brings some risks. But so what? "Thing Y", and its legion cousins, bring risks too—so don't worry
aboutpreventing "Thing X". 

Now back to my new thread. I interpreted what Tom wrote to mean that he flatly rejected the idea that a database design
waspossible that prevented a client session that authorized as a role, that's designed for that purpose, from dropping
tablesand otherwise arbitrarily corrupting stuff. I expect that I completely misunderstood his point. But, anyway,
that'swhat I responded to. 

Now it seems that you, Julien, are not convinced that the code that I showed prevents a session that authorizes as
"client"from dropping the table, owned by "u1", where the data is. Nor are you convinced that a "client" session is
preventedfrom inserting mixed or upper case data, updating existing data, or deleting existing data. Rather (as your
BobbyTables reference indicates) you think that a cunning SQL injection attack can manage to do these bad things. 

Well... the challenge is yours now: prove your point with some working code.


Re: Limiting the operations that client-side code can perform upon its database backend's artifacts

От
Julien Rouhaud
Дата:
On Tue, Sep 27, 2022 at 07:29:39PM -0700, Bryn Llewellyn wrote:
>
> Now back to my new thread. I interpreted what Tom wrote to mean that he
> flatly rejected the idea that a database design was possible that prevented a
> client session that authorized as a role, that's designed for that purpose,
> from dropping tables and otherwise arbitrarily corrupting stuff. I expect
> that I completely misunderstood his point. But, anyway, that's what I
> responded to.
>
> Now it seems that you, Julien, are not convinced that the code that I showed
> prevents a session that authorizes as "client" from dropping the table, owned
> by "u1", where the data is. Nor are you convinced that a "client" session is
> prevented from inserting mixed or upper case data, updating existing data, or
> deleting existing data. Rather (as your Bobby Tables reference indicates) you
> think that a cunning SQL injection attack can manage to do these bad things.
>
> Well... the challenge is yours now: prove your point with some working code.

I'm convinced that that authorization system works as expected, what I'm not
convinced of is that the authorization system can prevent an untrusted user
with a direct SQL access from actually hurting you.

So yes in your case maybe the "client" role cannot drop the showed table, but
it can still insert nonsensical data, from a client point of view, or lead to
outage or other problems without any difficulty, and there's nothing in the
authorization system that can prevent that.

I'm also not convinced that your demo is proving anything, as "inserting any
only value made of non-uppercase characters in a single table " isn't really
representative of any basic application, especially without knowing what that
data will be used for.

The only case this example could make sense would be a log application, and
then a direct SQL access you can insert nonsensical or malicious data,
depending on what the application will do with those data (which could lead to
crash in the client application, or make it do thing it shouldn't do).



Re: Limiting the operations that client-side code can perform upon its database backend's artifacts

От
Bryn Llewellyn
Дата:
> rjuju123@gmail.com wrote:
>
> bryn@yugabyte.com wrote:
>>
>> Now back to my new thread. I interpreted what Tom wrote to mean that he flatly rejected the idea that a database
designwas possible that prevented a client session that authorized as a role, that's designed for that purpose, from
droppingtables and otherwise arbitrarily corrupting stuff. I expect that I completely misunderstood his point. But,
anyway,that's what I responded to. 
>>
>> Now it seems that you, Julien, are not convinced that the code that I showed prevents a session that authorizes as
"client"from dropping the table, owned by "u1", where the data is. Nor are you convinced that a "client" session is
preventedfrom inserting mixed or upper case data, updating existing data, or deleting existing data. Rather (as your
BobbyTables reference indicates) you think that a cunning SQL injection attack can manage to do these bad things. 
>>
>> Well... the challenge is yours now: prove your point with some working code.
>
> I'm convinced that that authorization system works as expected, what I'm not convinced of is that the authorization
systemcan prevent an untrusted user with a direct SQL access from actually hurting you. So yes in your case maybe the
"client"role cannot drop the showed table, but it can still insert nonsensical data, from a client point of view, or
leadto outage or other problems without any difficulty, and there's nothing in the authorization system that can
preventthat. 
>
> I'm also not convinced that your demo is proving anything, as "inserting any only value made of non-uppercase
charactersin a single table" isn't really representative of any basic application, especially without knowing what that
datawill be used for. 
>
> The only case this example could make sense would be a log application, and then a direct SQL access you can insert
nonsensicalor malicious data, depending on what the application will do with those data (which could lead to crash in
theclient application, or make it do thing it shouldn't do). 

My example wasn't meant in any way to be realistic. I'm sorry if I didn't make that clear from the outset. It was meant
onlyto illustrate the principles. For example, the "lower case only" rule was meant to be an example of *any* data
rule.Just like the write-once-read-many auto-generated surrogate primary key rule. Can you show me how those data
rules,unrealistic as you might think them to be, can be violated? 

> I'm not convinced... that the authorization system can prevent an untrusted user with a direct SQL access from
actuallyhurting you. 

What do you mean by "untrusted"? Any person who is given the credentials to start a database session is trusted—even a
personwho can connect as a superuser and do untold harm. So focus on a person who has the credentials to connect as
"client"in my example. But imagine a design that exposes functionality to "client" sessions exclusively through a
carefullydesigned and implemented API that's expressed exclusively with user-defined functions and procedures. And
choosesomething to model that meets your criteria for realism. Then show me, using a self-contained code example, how a
sessionthat authorized as "client" can cause the hurt that concerns you. Notice that "hurt" must be taken to mean
havingthe persistently stored data no longer satisfying  as specified business rule. And not anything to do with denial
ofservice based on unconstrained resource consumption. 

If, when I review it, I can see how to change the code to remove the vulnerability, then you'll have learned something.
Onthe other hand, if you can show me a vulnerability that cannot be fixed, then I'll have learned something! I'm
selfishlymore interested in that second outcome because my overall mental model will have been improved. 




Re: Limiting the operations that client-side code can perform upon its database backend's artifacts

От
Tom Lane
Дата:
Bryn Llewellyn <bryn@yugabyte.com> writes:
> rjuju123@gmail.com wrote:
>> I'm not convinced... that the authorization system can prevent an untrusted user with a direct SQL access from
actuallyhurting you. 

> What do you mean by "untrusted"? Any person who is given the credentials
> to start a database session is trusted—even a person who can connect as
> a superuser and do untold harm. So focus on a person who has the
> credentials to connect as "client" in my example. But imagine a design
> that exposes functionality to "client" sessions exclusively through a
> carefully designed and implemented API that's expressed exclusively with
> user-defined functions and procedures.

Sure.  That is called an application server.  What we are is a SQL server,
and that means that the API is SQL commands, and the authorization model
is what the SQL spec says it is.  (Well, okay, there's some discrepancies
between the letter of the spec and what we actually do.  But my main
point here is that any session authenticated as user X has the same
privileges as any other session authenticated as user X, so there is
not a lot of point in user X mounting defenses against user X.)

So I think the concerns you're expressing here would be better addressed
at the next level up the stack.

            regards, tom lane



Re: Limiting the operations that client-side code can perform upon its database backend's artifacts

От
Julien Rouhaud
Дата:
On Tue, Sep 27, 2022 at 08:47:52PM -0700, Bryn Llewellyn wrote:
>
> For example, the "lower case only" rule was meant to be an
> example of *any* data rule. Just like the write-once-read-many auto-generated
> surrogate primary key rule. Can you show me how those data rules, unrealistic
> as you might think them to be, can be violated?

An unprivileged role by definition can't ignore or drop table constraints,
that's not the problem here.  The discussion should be "is having a direct SQL
access to the same role as my carefully written application is using
problematic", and the answer is yes.

> > I'm not convinced... that the authorization system can prevent an untrusted
> > user with a direct SQL access from actually hurting you.
> 
> What do you mean by "untrusted"? Any person who is given the credentials to
> start a database session is trusted—even a person who can connect as a
> superuser and do untold harm. So focus on a person who has the credentials to
> connect as "client" in my example. But imagine a design that exposes
> functionality to "client" sessions exclusively through a carefully designed
> and implemented API that's expressed exclusively with user-defined functions
> and procedures. And choose something to model that meets your criteria for
> realism. Then show me, using a self-contained code example, how a session
> that authorized as "client" can cause the hurt that concerns you. Notice that
> "hurt" must be taken to mean having the persistently stored data no longer
> satisfying  as specified business rule. And not anything to do with denial of
> service based on unconstrained resource consumption.

You mean like if the application takes care of checking that the logged-in user
is allowed to insert data based on whatever application defined rules / user
profile, while the SQL role can simply insert data and/or call the carefully
written functions?  Yes the data will be consistent, but if your role just
transferred money from an account to another that's not really the problem.



Re: Limiting the operations that client-side code can perform upon its database backend's artifacts

От
"Peter J. Holzer"
Дата:
On 2022-09-28 09:13:53 +0800, Julien Rouhaud wrote:
> On Tue, Sep 27, 2022 at 05:27:22PM -0700, Bryn Llewellyn wrote:
> > > hjp-pgsql@hjp.at wrote:
> > >> rjuju123@gmail.com wrote:
> > >>> bryn@yugabyte.com wrote:
> > >>> My demo seems to show that when a program connects as "client", it can
> > >>> perform exactly and only the database operations that the database design
> > >>> specified. Am I missing something? In other words, can anybody show me a
> > >>> vulnerability?
> > >>
> > >> What exactly prevents the client role from inserting e.g.
> > >>
> > >> - 'robert''); drop table students; --'
> > >
> > > It can do this but it won't do any harm since the client role doesn't have
> > > permission to drop the table.
>
> FTR it's a reference to https://xkcd.com/327/
>
> Both of you are saying it's harmless because you're assuming that only the
> client role may read the data and act on it, but the whole point of SQL
> injection is to try to do actions that the role you have access to can't
> already do.

No. The point of SQL injection is to get client code (typically some web
application, but could be anything not directly under the attacker's
control) to execute SQL which it wasn't designed to execute. So for
example the student management system in the cartoon may not contain a
"drop" statement anywhere in its code. But because of the SQL injection
it will execute one anyway.

This doesn't elevate the privileges of the client role. The attacker can
merely leverage privileges the client role already has (which may be
more than it strictly needs).

If the attacker has direct access to the database then SQL injection is
moot since they can just execute the SQL anyway.


> > > There are often several layers of defense. The database frequently won't be
> > > accessible from the open internet (or even the company network) directly.
> > > Only a middle tier of application servers running vetted client code will
> > > connect directly. Even those servers may not be accessible directly to end
> > > users. There may be a layer of proxy servers above them. Each of these
> > > layers may implement additional checks, rate limits and monitoring.
>
> If no one has direct SQL access to the database, then there's no problem with a
> role being able to pg_terminate_backend() session for the same role, and this
> thread shouldn't exist to begin with.

The attacker might use SQL injection to execute pg_terminate_backend().
But yes, as Tom already pointed out, if they can do that, that's
probably the least of your worries.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Вложения

Re: Limiting the operations that client-side code can perform upon its database backend's artifacts

От
"Peter J. Holzer"
Дата:
On 2022-09-27 19:29:39 -0700, Bryn Llewellyn wrote:
> I interpreted what Tom wrote to mean that he flatly rejected the idea
> that a database design was possible that prevented a client session
> that authorized as a role, that's designed for that purpose, from
> dropping tables and otherwise arbitrarily corrupting stuff. I expect
> that I completely misunderstood his point.

I think you did. What he was saying was that a database design which
uses only one single role which both owns all the objects and executes
all code on behalf of the user can not be secure. At the very least you
need two roles: One which owns the objects and one which can only use
the objects in a way allowed by the business logic and is not allowed to
change that logic (This is the case you demonstrated). In many cases
this should be even more fine-grained, and at the extreme end every user
could actually have several roles, each with only the minimal privileges
required.

(I have often observed that we tend to throw away and build permission
systems at every layer of the stack: The OS has a system of users and
permissions. But the database server runs as a single user (postgres)
which has access to all the data files. So it has to implement its own
system of roles and permissions. Then an application developer comes
along and writes an app which uses a single database role which has
access to all the data. So it again has to implement its own system of
users and permissions ...)

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Вложения

Re: Limiting the operations that client-side code can perform upon its database backend's artifacts

От
Bryn Llewellyn
Дата:
tgl@sss.pgh.pa.us wrote:

bryn@yugabyte.com writes:

rjuju123@gmail.com wrote:

I'm not convinced... that the authorization system can prevent an untrusted user with a direct SQL access from actually hurting you.

What do you mean by "untrusted"? Any person who is given the credentials to start a database session is trusted—even a person who can connect as a superuser and do untold harm. So focus on a person who has the credentials to connect as "client" in my example. But imagine a design that exposes functionality to "client" sessions exclusively through a carefully designed and implemented API that's expressed exclusively with user-defined functions and procedures.

Sure. That is called an application server. What we are is a SQL server, and that means that the API is SQL commands, and the authorization model is what the SQL spec says it is. [So]... any session authenticated as user X has the same privileges as any other session authenticated as user X, so there is not a lot of point in user X mounting defenses against user X. So I think the concerns you're expressing here would be better addressed at the next level up the stack.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
Julien replied to the same message from me that Tom did thus:

rjuju123@gmail.com wrote 

You mean like if the application takes care of checking that the logged-in user is allowed to insert data based on whatever application defined rules / user profile, while the SQL role can simply insert data and/or call the carefully written functions? Yes the data will be consistent, but if your role just transferred money from an account to another that's not really the problem.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
Peter Holzer replied to an earlier message fro be in this branching thread thus:

hjp-pgsql@hjp.at wrote:

I think you did [misunderstand Tom]. What he was saying was that a database design which uses only one single role which both owns all the objects and executes all code on behalf of the user can not be secure. At the very least you need two roles: One which owns the objects and one which can only use the objects in a way allowed by the business logic and is not allowed to change that logic (This is the case you demonstrated.)

Yes, indeed it is. That was my aim.

In many cases this should be even more fine-grained, and at the extreme end every user could actually have several roles, each with only the minimal privileges required.

(I have often observed that we tend to throw away and build permission systems at every layer of the stack: The OS has a system of users and permissions. But the database server runs as a single user (postgres) which has access to all the data files. So it has to implement its own system of roles and permissions. Then an application developer comes along and writes an app which uses a single database role which has access to all the data.

It often is done like this. But it doesn't have to be done this way. This is where a user-defined within-RDBMS functions and procedures, and so-called "run authority" in the wider discourse, come to the rescue.

So it again has to implement its own system of users and permissions...)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

*SUMMARY*

I'd better simply attempt to stop further discussion in this thread by saying that no mutual understanding has been reached. I'm sad about this outcome. But it would seem to be unfair to invite anybody to spend any more time on these exchanges.

*MORE... ONLY IF YOU CAN BEAR TO READ IT*

First, lest a really awful misunderstanding lingers on, I must stress that, about this from Tom:

any session authenticated as user X has the same privileges as any other session authenticated as user X

well, yes. That statement of outcome is a tautology. I don't think that I wrote anything to say that I thought otherwise. About the notion that Ms. Jones, authorized as X can kill Ms. Smith's session authorized as X, and vice versa... this seems now to be just a red herring. Either the designer wants this or they  don't. And if they don't, they can prevent it by a simple revocation of one object privilege. It's been argued that some other evil doing can't be prevented. Well, fair enough. But this doesn't seem to justify not preventing any bad thing that *can* be prevented. 

There was an era, now long gone, when an outfit had its own production RDBMS (or a small few of these) in a locked and guarded room in the very same building within which all human end users of such a system worked. In those days, each human user was mapped one-to-one to its own within-RDBMS role. And the application under the end-users fingertips connected directly to the RDBMS by authenticating as that person-specific RDBMS role. (There'd also be some RDBMS roles for human admins and developers.) Then, no two end-user sessions would authorize as the same role, and native RDBMS security notions were relied upon to protect in the way that's come up for discussion here.

We're no longer in that era. Client-server is dead. Long-live multi-tier. I have a few bank accounts. And I also do Internet shopping. I went through the usual steps to set myself up to use each of these. And I have no choice but to put my faith in their authorization and security schemes—notwithstanding periodic troubling revelations of security loopholes. I've also discussed such systems with end-user organizations as part of the duties of my former job. Of course, these end-user organizations were self-selected for me because they all used an RDBMS as the database of record at the bottom of it all.

None of these end-user organizations mapped human end users each to their own RDBMS role. We all know this. And we all know the reasons. The concept of connection pooling, where all connections in the pool authorize as the same RDBMS role is just one reason among many. I've called this connection-pool-role "client"—and I'll continue to do so here.

This pre-amble sets the stage for the discussion about the overall security proposition when two sessions authorize as the same database role "client" (by all means, not even concurrently) but on behalf of different human end-users. I've not had the chance yet to have even a single face-to-face discussion with someone who knows what approaches are used to bring the required isolation between human end-users when the database of record is PostgreSQL. And I'd very much like to read some nicely-written essays about this topic. Can anybody recommend any such pieces for me to study?

Meanwhile, I do understand the general principles. Software tiers that stand above the RDBMS tier in the overall stack take care of authorizing a human being so that the SQL call that the application code tier passes through the connection pooling tier on behalf of the human user also passes a unique ID for the human. The SQL execution therefore knows whom it's acting for—and can look up salient facts about this principal in table(s) that have been designed and populated for that purpose. The details are unimportant here. We have no end of existence proofs that such schemes are in daily use and are judged to be adequately robust. (For example, Salesforce.com have talked and written publicly about how they do application-level multitenancy within a single RDBMS by striping the critical tables with an ID for the tenant organization. Then they model human users within tenant organizations on top of that.)

(Yes, Peter, I appreciate that this is exactly what you said! But it's hardly a bad thing. It's just an inevitable fact of life.)

Paraphrasing Peter, the design of the application's RDBMS backend has to implement its own notions of roles and privileges as a new layer on top of whatever the native RDBMS mechanisms provide. Some RDBMSs have native primitives that help the implementation of this next, application-specific, roles-and-privileges regime. I've yet to discover what PG has in this space. But one thing is clear. The SQLs that rely on these home-grown designs for the application-specific privilege regime must be designed, implemented, and tested very carefully by specialists. In my book, these specialists have enough to grapple with without embracing the skills of client-side programming languages, the declarative mechanisms that various frameworks rely on, and UI design. It's nice that PG has the mechanisms to support this separation of duties.

All this background lets me focus on what I started with: how to make use of PG's native mechanisms for security and (within this framework) "security definer" subprograms to limit the scope for what SQL calls from next-tier code that's authorized as "client" can do. The design spectrum here spans from this:

—Make no use of the RDBMS's mechanisms: in other words, implement the entire RDBMS application backend using only the single role "client" where client owns, crudely stated, only tables. Of course, there are closely associated artifacts like indexes and constraints—even though (as the myth has it) these cause performance problems and, as is claimed, such rules are better implemented in application-tier code. I kid you not. I've heard people say this. And I expect that you all have too. In this case, the engineers who write the application-tier code (or use schemes that generate it) take all the responsibility for correctness, data integrity, and data security. I call this approach the "bag of tables paradigm". And, yes, the phrase is meant to connote nastiness.

—Make maximal use of the RDBMS's mechanisms: in other words, exploit PG's role-based security mechanisms, and especially "security definer" subprograms, so that application-tier code (acting as it must, by having authorized as "client" to service requests for all human end-users) can most reliably implement whatever scheme is invented for the within-RDBMS component of the application-level notions for roles and privileges. I call this approach the "hard shell paradigm". (I didn't invent that term of art.) The phrase is meant to connote goodness.

You've noticed that I stand at the "hard shell" end of the spectrum. For example, I have a demo application that deals with data held in a master-detail table pair. (Please don't tell me, Julien, that this is such an unrealistic toy that it makes not a single pedagogic point.) The API is exposed via the "client" role as just a set of JSON-in, JSON-out procedures. And there's many roles where often each owns more than one schema, hidden behind the scenes. Sessions authorized as "client" can't detect any facts about these implementation details because they can't query any of the catalog relations. After all, why should they be able to do this? I see no need for it. And it's easy to prevent it. Moreover, and very significantly, it's hugely more difficult to allow catalog access and then to reason, case by case, about what "client" can see (like the source code of a procedure) can't be exploited to do harm. 

Here's some examples of API calls with their return values. Each of the "masters" and the "details" tables has just a single "payload" column, "v text" with some constraints. The other columns are the master-detail plumbing and don't surface into the business functionality world.

Firstly, inserting data:

call insert_master_and_details()

{"m": "mary", "ds": ["shampoo", "soap", "toothbrush", "towel"]}

→ {"status": "success"}


{"m": "arthur", "ds": ["scissors", "saucer", "spatula", "spatula", "scissors"]}

→ {"reason": "New master 'arthur' bad duplicate details: 'scissors','spatula',", "status": "user error"}


{"m": "Christopher", "ds": []}

→ {"status": "unexpected error", "ticket": 1}

The "Christopher" error reflects the fact that a rule (like names must be all lower case) is violated. The API contract says that the database will reject bad values—and will not coerce a bad value into a good value. Rather, it's the application-tier's responsibility to present only good values. That's why the status is "unexpected error". The ticket number is the auto-generated surrogate key for a row in the behind-the-scenes "incidents" table. An authorized support staff member can access that table for that ticket and see this:

unit:                 procedure code.insert_master_and_details(text, text)
returned_sqlstate:    23514
message_text:         new row for relation "masters" violates check constraint "masters_v_chk"
pg_exception_detail:  Failing row contains (cc93bd34-b68a-4d47-b9e9-0033031cefb7, Christopher).
constraint_name:      masters_v_chk
table_name:           masters
schema_name:          data

pg_exception_context
--------------------
SQL statement "insert into data.masters(v) values(m_and_ds.m) returning mk"
PL/pgSQL function code.insert_master_and_details(text,text) line 17 at SQL statement
SQL statement "call code.insert_master_and_details(j, outcome)"
PL/pgSQL function insert_master_and_details(text,text) line 3 at CALL

This is the full story of what "get stacked diagnostics" delivers in an "others" handler—together with whatever other facts about the context that the designer as decided might be useful (exemplified here by the redundant fact "unit").

Secondly, reporting on data:

call do_master_and_details_report()

{"key": "mary"}

→ {"status": "m-and-ds report success",
   "m_and_ds": {"m": "mary", "ds": ["shampoo", "soap", "toothbrush", "towel"]}}

{"key": "bill"}

→ {"reason": "The master business key 'bill' doesn't exist", "status": "user error"}

{"ket": "fred"}

→ {"reason": "Bad JSON in: {\"ket\": \"fred\"}", "status": "client code error"}

Obviously, it's the responsibility of the application-tier code to manage the UI flow, paint panes that allow data entry for new facts and for parameterizing queries, and for presenting the return values for the good outcomes and the error outcomes sensibly.

For sport, I have a variant of this demo that enforces the famous mandatory 1:M rule. I used an approach that you can find described on the Internet. The concept is a mutual FK relationship between the "masters" and the "details" tables so that one detail is special in that it's the parent of its own parent. This works perfectly well w.r.t. the rule itself—and I don't need to worry about race conditions because the native support for deferred FK constraints looks after this. There is a twist, though. I have to do a "twizzle" when there remain many details for a given master and when somebody wants to delete the currently "special" detail. But the solution is doable by using a trigger to perform the twizzle to appoint a surviving detail to "special" status.

Significantly, the details of the enforcement scheme don't matter to sessions that connect as "client". To prove that point, I implemented an alternative scheme that uses a trigger naïvely to cause an error when it sees that you're about to leave a master row with no details. This scheme does, of course, require "serializable" isolation—and so I prefer the mutual FK scheme.

My point here is that the design and implementation of the mandatory 1:M rule enforcement, whatever it is, is entirely hidden from "client" sessions and is immune from their tampering attempts. Moreover, you can change the implementation, in an application upgrade exercise, without "client" sessions even knowing that this was done. (I like to dream that, one day, it could be replaced with a single declaration of intent, expressed as a "create assertion" statement.)

I regard this feature of the scheme (changing the implementation without needing to tell clients) as a good thing. It's something that the "bag of tables" paradigm cannot, in general, support. When you describe the paradigm as "hiding all implementations behind an API that articulates the business purpose of each of its members", you see the "hard shell" paradigm for what it is. One of the central pillars of software engineering. And it's as old as the discipline itself.

*FINALLY*

I've discussed these competing paradigms, over many years, with all sorts of people—both face-to-face and in written exchanges. I'm a "hard shell" devotee. And there are lots of "hard shell" devotees out there. But there are also many "bag of tables" devotees out there too. The second camp is probably more populous than the first one. I've come to accept that one's affiliation here is religion. In almost all cases, a devotee of one paradigm simply cannot convince devotees of the other paradigm to change their minds—or even to concede ground on just one tiny detail. I can't fix this. But I know where I stand in the debate.

Re: Limiting the operations that client-side code can perform upon its database backend's artifacts

От
"Peter J. Holzer"
Дата:
On 2022-09-29 20:24:59 -0700, Bryn Llewellyn wrote:
> Paraphrasing Peter, the design of the application's RDBMS backend has to
> implement its own notions of roles and privileges as a new layer on top of
> whatever the native RDBMS mechanisms provide. Some RDBMSs have native
> primitives that help the implementation of this next, application-specific,
> roles-and-privileges regime.

Can you provide a link to documentation (hopefully high-level
and concise) of such a system? I'm having a hard time imagining one
which I wouldn't either classify as "database roles by another name" or
"just data".

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Вложения

Re: Limiting the operations that client-side code can perform upon its database backend's artifacts

От
Bryn Llewellyn
Дата:
hjp-pgsql@hjp.at wrote:


Paraphrasing Peter, the design of the application's RDBMS backend has to implement its own notions of roles and privileges as a new layer on top of whatever the native RDBMS mechanisms provide. Some RDBMSs have native primitives that help the implementation of this next, application-specific, roles-and-privileges regime.

Can you provide a link to documentation (hopefully high-level and concise) of such a system? I'm having a hard time imagining one which I wouldn't either classify as "database roles by another name" or "just data".

I fear that we might be about to start another round of mutual misunderstanding. I’m not confident that I understand the question.

I believe that you want x-refs to accounts of native RDBMS features that let you implement application-level specific notions of security on top of the basic roles and privileges features and that are oriented to the case where a single RDBMS role, "client", must expose differentiated functionality to different human end-users—where these principals are identified by, and authorized by, system(s) outside of the RDBMS in question. In such a world, the RDBMS gets a suitable ID for the human (by all means suitably encrypted) and can use this to advantage by consulting its own representation of the current human's identity.

If my guess about your question is wrong, then I'm wasting my time. But I'll carry on anyway.

This is the canonical example:

—A manager must be able to see the salaries of all employees in the reporting hierarchy under her/him—but not be able to see salaries outside of that subtree.

Imagine the usual "employees" table with the pig's ear "manager_id" FK. But add an extra column for each employees external-to-the-database unique ID (unless this is already used as the table's PK).

PG has a native feature for this: row level security (hereinafter RLS). You can Google for blog posts about this use case. Here's one by a colleague of mine, Franck:


It uses the multi-tenant use case rather than my "managers-and-salaries" one. But the principles are what matter. Never mind the detail. The critical table(s) are striped with "tenant_id". And this is passed from the application tier to the PG tier with:

set rls.tenant_id=42;

Using the "hard shell" paradigm, "rls.tenant_id" could be a formal argument to an API subprogram. And its within-database representation could be a one column, one row temporary table with "on commit delete rows".

Franck's PG RLS  policies are created like this:

create policy... on ... for... using(tenant_id=current_setting('rls.tenant_id')::int);

The argument of "using()" is just an arbitrary SQL expression that evaluates to a boolean.


So in the "managers-and-salaries" use case, it could be "employee_id in (<a suitable query that uses a recursive CTE>)".

I'm afraid that I don't know what you mean by « classify as "database roles by another name" or "just data" ». For me, the RLS scheme is simply the native PG feature that you use to get the result that you want. But the context of these examples is very definitely a connection pooling regime that uses the single role "client" on behalf of (very) many different human users.

Finally, in case your question sought an account of some real-world scheme in this general use case space, a bit of Googling for  « Salesforce multi-tenancy » got me this:

Platform Multitenant Architecture

It's public knowledge that they use Oracle Database (or at least did—I haven't thought about them recently). But the general principles have their equivalents in PG too.

Salesforce engineers have also talked at Oracle User group conferences about how they use PL/SQL. Their main multi-tenant implementation (at least at the time of these talks) followed the "hard shell" paradigm.

Re: Limiting the operations that client-side code can perform upon its database backend's artifacts

От
"Peter J. Holzer"
Дата:
On 2022-09-30 17:59:01 -0700, Bryn Llewellyn wrote:
>     hjp-pgsql@hjp.at wrote:
>         bryn@yugabyte.com wrote:
>         Paraphrasing Peter, the design of the application's RDBMS backend has
>         to implement its own notions of roles and privileges as a new layer on
>         top of whatever the native RDBMS mechanisms provide. Some RDBMSs have
>         native primitives that help the implementation of this next,
>         application-specific, roles-and-privileges regime.
>
>     Can you provide a link to documentation (hopefully high-level and concise)
>     of such a system? I'm having a hard time imagining one which I wouldn't
>     either classify as "database roles by another name" or "just data".
>
>
> I fear that we might be about to start another round of mutual
> misunderstanding. I’m not confident that I understand the question.
>
> I believe that you want x-refs to accounts of native RDBMS features that let
> you implement application-level specific notions of security on top of the
> basic roles and privileges features and that are oriented to the case where a
> single RDBMS role, "client", must expose differentiated functionality to
> different human end-users—where these principals are identified by, and
> authorized by, system(s) outside of the RDBMS in question.

I the context of this thread, I don't "want" anything (what I actually
want may change from project to project). I am not familiar with the
"native primitives" you mentioned, so I would like to read up on them.
So if you can just drop a few names I can feed them to my favourite
search engine. An overview article which explains the concept and how
the primitives are used would be better of course, but if you don't have
any at hand, no problem.

Background: I have used Oracle, MySQL/MariaDB and PostgreSQL enough that
I consider myself to be quite familiar with their capabilities (my
Oracle skills are getting rusty, though). I have occasionally used
and/or read about other RDBMSs, but my knowledge of those is very
spotty.


> Franck's PG RLS  policies are created like this:
>
> create policy... on ... for... using(tenant_id=current_setting
> ('rls.tenant_id')::int);

So, I think the intermediate concept here which is neither role nor data
is the use of a run-time parameter.

He's not using a database role and he's not using a parameter which has
to be supplied to every query by the application programmer, but a
run-time parameter which would presumably be set once at the beginning
of a session or transaction (depending on whether you use connection
pooling). That's clever. Not sure if I would actually use it but it's
certainly something I'll add to my bag of tools.

> I'm afraid that I don't know what you mean by « classify as "database roles by
> another name" or "just data" ». For me, the RLS scheme is simply the native PG
> feature that you use to get the result that you want.

I meant what do you use to identify the user. The mechanism in
PostgreSQL intended for this are roles. You can use roles in RLS (in
fact I don't think I've ever seen an example which didn't use roles).

In contrast to this many systems (e.g. Django, Typo3, several I've
designed myself) use a table to keep their own list of users (and
possibly groups), and then check for every access whether the user
logged into the application has access by joining with that table. That
join may be explicit in the application code or it may be hidden behind
a view or a function. But the database doesn't know about that user.
It's just another parameter passed in by the application - "just data".

Using a run-time parameter is somewhere in between. The database still
doesn't know what it means (so in some sense it's still "just data"),
but it will keep the value for the duration of the transaction or
session, so the application gets sort of an "authorize once, then forget
about it" abstraction which puts it closer to the "roles by another
name" camp.


> Platform Multitenant Architecture
> https://architect.salesforce.com/fundamentals/platform-multitenant-architecture

That's *too* high-level for me. There's any number of techniques which
could be used to implement something like that and I don't see how they
actually did it (maybe I missed it - I admit I only skimmed the
article).

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Вложения
On 2022-09-30 17:59:01 -0700, Bryn Llewellyn wrote:
> set rls.tenant_id=42;

This works because there is a "." in the name. Without the "."
PostgreSQL complains:

hjp=> set rls_tenant_id=42;
ERROR:  unrecognized configuration parameter "rls_tenant_id"


I think I sort of knew this but had forgotten about it, so checked the
manual for the exact rules. Unfortunately I couldn't find them (I
checked https://www.postgresql.org/docs/14/sql-set.html,
https://www.postgresql.org/docs/14/config-setting.html,
https://www.postgresql.org/docs/14/functions-admin.html#FUNCTIONS-ADMIN-SET,
and browser the table of content and the index).

From the documentation I get the impression that you can only set
existing parameters, not your own.

I therefore suggest adding something like this to the section
"configuration_parameter" in
https://www.postgresql.org/docs/14/sql-set.html:

| In addition, any name including a single dot (".") can be set. This
| allows an application to define its own run-time parameters. Using the
| application name as a prefix reduces the probability of name conflicts
| with other applications or extensions.

(This of course assumes that the behaviour is intentional and not a
bug.)

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Вложения
Hi,

On Sat, Oct 01, 2022 at 02:05:53PM +0200, Peter J. Holzer wrote:
> On 2022-09-30 17:59:01 -0700, Bryn Llewellyn wrote:
> > set rls.tenant_id=42;
>
> This works because there is a "." in the name. Without the "."
> PostgreSQL complains:
>
> hjp=> set rls_tenant_id=42;
> ERROR:  unrecognized configuration parameter "rls_tenant_id"
>
> I think I sort of knew this but had forgotten about it, so checked the
> manual for the exact rules. Unfortunately I couldn't find them (I
> checked https://www.postgresql.org/docs/14/sql-set.html,
> https://www.postgresql.org/docs/14/config-setting.html,
> https://www.postgresql.org/docs/14/functions-admin.html#FUNCTIONS-ADMIN-SET,
> and browser the table of content and the index).
>
> From the documentation I get the impression that you can only set
> existing parameters, not your own.
>
> I therefore suggest adding something like this to the section
> "configuration_parameter" in
> https://www.postgresql.org/docs/14/sql-set.html:
>
> | In addition, any name including a single dot (".") can be set. This
> | allows an application to define its own run-time parameters. Using the
> | application name as a prefix reduces the probability of name conflicts
> | with other applications or extensions.
>
> (This of course assumes that the behaviour is intentional and not a
> bug.)

This is intentional, and documented at
https://www.postgresql.org/docs/14/runtime-config-custom.html.



On 2022-10-01 20:24:21 +0800, Julien Rouhaud wrote:
> On Sat, Oct 01, 2022 at 02:05:53PM +0200, Peter J. Holzer wrote:
> > On 2022-09-30 17:59:01 -0700, Bryn Llewellyn wrote:
> > > set rls.tenant_id=42;
> >
> > This works because there is a "." in the name. Without the "."
> > PostgreSQL complains:
> >
> > hjp=> set rls_tenant_id=42;
> > ERROR:  unrecognized configuration parameter "rls_tenant_id"
> >
> > I think I sort of knew this but had forgotten about it, so checked the
> > manual for the exact rules. Unfortunately I couldn't find them
[...]
> > From the documentation I get the impression that you can only set
> > existing parameters, not your own.
> >
> > I therefore suggest adding something like this to the section
> > "configuration_parameter" in
> > https://www.postgresql.org/docs/14/sql-set.html:
> >
[...]
> > (This of course assumes that the behaviour is intentional and not a
> > bug.)
>
> This is intentional, and documented at
> https://www.postgresql.org/docs/14/runtime-config-custom.html.

So it's there but hard to find (I did read the table of contents at
https://www.postgresql.org/docs/14/runtime-config.html but I guess after
all the subchapters about specific configuration parameters I didn't
notice the one generic subchapter near the end).

I think referring to that chapter from
https://www.postgresql.org/docs/14/sql-set.html and possibly other
places would help.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Вложения
"Peter J. Holzer" <hjp-pgsql@hjp.at> writes:
> On 2022-10-01 20:24:21 +0800, Julien Rouhaud wrote:
>> This is intentional, and documented at
>> https://www.postgresql.org/docs/14/runtime-config-custom.html.

> So it's there but hard to find (I did read the table of contents at
> https://www.postgresql.org/docs/14/runtime-config.html but I guess after
> all the subchapters about specific configuration parameters I didn't
> notice the one generic subchapter near the end).
> I think referring to that chapter from
> https://www.postgresql.org/docs/14/sql-set.html and possibly other
> places would help.

Up to now, there's been an intentional policy of not documenting
this very prominently, because doing so would encourage people
to abuse such variables as application state variables.  I say
"abuse" because the code supporting such variables isn't really
designed to support lots of them.

There has been a patch in the works for a long time to provide a
better mechanism for application state variables [1].  Separately,
I did some work recently towards alleviating the performance
problems with lots of custom parameters [2].  It appears from
the discussion in [2] that we're probably going to stick to the
policy of not encouraging use of custom parameters as application
state, although if [1] keeps failing to land maybe that will change?

            regards, tom lane


[1] https://www.postgresql.org/message-id/flat/CAFj8pRD053CY_N4=6SvPe7ke6xPbh=K50LUAOwjC3jm8Me9Obg@mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/2982579.1662416866@sss.pgh.pa.us



tgl@sss.pgh.pa.us wrote:

Up to now, there's been an intentional policy of not documenting

«
20.16. Customized Options
»

very prominently[*], because doing so would encourage people to abuse such variables as application state variables. I say "abuse" because the code supporting such variables isn't really designed to support lots of them.

I hinted at a different approach in an earlier turn in this thread:


I sketched only how you might handle the case where the session state is just a single value—by using a one-row, one-column temporary table with "on commit delete rows". But the general approach is to use a two column temporary table for key-value pairs. This approach is what the PG doc sketches here:

«
43.13. Porting from Oracle PL/SQL
Since there are no packages, there are no package-level variables either. This is somewhat annoying. You can keep per-session state in temporary tables instead.
»

(That article of faith, "there are no packages and there never, ever will be", saddens me.)

Because PG has no event trigger that fires on session creation (why is this?), I've settled on this optimistic pattern:

begin
  insert into pg_temp.flag(val) values(true);
exception when undefined_table then
  get stacked diagnostics msg = message_text;
  if msg != 'relation "pg_temp.flag" does not exist' then
    raise;
  else
    create temp table pg_temp.flag(val boolean not null) on commit delete rows;
    insert into pg_temp.flag(val) values(true);
  end if;
end;

The code would need to be more elaborate (and use "upsert") for key-value pairs. But that's easy to do.

Do the experts on this list disapprove of this pattern and prefer (for a future regime) something like the Pavel Stehule scheme that Tom mentioned?

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

[*] I didn't know that there was a PG doc policy sometimes not to call out a bad practice but, rather, to hide away (in an obscure backwater in the docs) the account of a feature that’s considered to be better avoided except in special cases. This effectively hides it from Google search (and similar) too because of the circular notion that few people find it, and fewer still publish pages that include the link,… and so on.

I suppose that calling the thing an "option" while the doc for the "set" SQL statement uses the term of art "run-time parameter" is another “bad practice admonition by obscurity” notion. (I've referred to the thing as a "user-defined run-time parameter" in informal emails to colleagues. But that is a lot of syllables.)
Bryn Llewellyn <bryn@yugabyte.com> writes:
>   get stacked diagnostics msg = message_text;
>   if msg != 'relation "pg_temp.flag" does not exist' then

This is pretty fragile --- eg, it will fall over with translated
messages.  I think you could presume that if the error condition
name is undefined_table then you know what to do.

            regards, tom lane



tgl@sss.pgh.pa.us wrote:

bryn@yugabyte.com writes:

 get stacked diagnostics msg = message_text;
 if msg != 'relation "pg_temp.flag" does not exist' then

This is pretty fragile --- eg, it will fall over with translated messages. I think you could presume that if the error condition name is undefined_table then you know what to do.

Mea culpa. I should have stressed that my code was meant to be a sketch rather than the real thing. So my use of "on commit delete rows" suits the "hard shell paradigm" that I described here:


where the client code does:

  check out connection
  call a user-defined API subprogram
  release connection

and where I therefore want automatic check-out-duration session state.

In a different use case, I'd want session-duration session state. There. I'd use "on commit preserve rows".

About testing what "message_text" from "get stacked diagnostics msg" returns, yes… of course its sensitivity to the current choice of national language is a non-starter. I don't like to assume more than I have to. So I might say this:

  if msg !~ '"pg_temp.flag"' then

But, then again, I might decide that it's just too fussy.

I've seen this pattern in use:

  create temp table if not exists pg_temp.flag(val boolean not null) on commit delete rows;
  insert into pg_temp.flag(val) values(true);

But doing a DDL before every use of the session-state representation felt heavier than assuming that it's there and creating the table only if it isn't. But I haven't done any timing tests. Is the "create… if not exists" so lightweight when the to-be-created object does exist that I'm fussing over nothing?

Bryn Llewellyn <bryn@yugabyte.com> writes:
> I've seen this pattern in use:

>   create temp table if not exists pg_temp.flag(val boolean not null) on commit delete rows;
>   insert into pg_temp.flag(val) values(true);

> But doing a DDL before every use of the session-state representation felt heavier than assuming that it's there and
creatingthe table only if it isn't. But I haven't done any timing tests. Is the "create… if not exists" so lightweight
whenthe to-be-created object does exist that I'm fussing over nothing? 

Fair question.  My gut feeling is that the subtransaction created by the
BEGIN ... EXCEPTION construct is more expensive than a no-op CREATE
IF NOT EXISTS.  I've not measured it though; and I'm pretty sure that
the answer would vary depending on how often you expect the code to fall
through versus needing to create the table.

            regards, tom lane