Re: Row level security

Поиск
Список
Период
Сортировка
От Andrew Okhmat
Тема Re: Row level security
Дата
Msg-id d094a87d-9d63-46c9-8c27-631f881b80fb@supportex.net
обсуждение исходный текст
Ответ на Row level security  (Thomas Simpson <ts@talentstack.to>)
Список pgsql-performance
Hello Jim,

Your approach of using table inheritance in PostgreSQL for implementing row-level security (RLS) has some interesting aspects, but there are potential pitfalls and alternatives that you should consider. Below, I'll outline some key points to

Table Inheritance and Performance Concerns

* RLS and Inheritance -  In PostgreSQL, RLS policies are applied per table. If you use inheritance, RLS policies defined on the parent table won’t automatically apply to the child tables. You’ll have to set up RLS policies on each child table separately.

* Growing Base Table  - The base table, getting a new row for every row inserted in the child tables, will grow really fast. Managing a table with hundreds of millions of rows per year could become a serious performance problem.

* Partitioning - Partitioning can help manage big tables by breaking them into smaller parts. But if your base table becomes a bottleneck, partitioning the child tables alone might not solve the problem.

Alternative Approach: Use Partitioned Tables Directly with RLS

Given your needs, here's a different approach that leverages PostgreSQL's partitioning and indexing features along with RLS:

* Directly Partitioned Tables - Instead of inheritance, create partitioned tables directly for each type of data. Partition these tables based on a logical key (like time, site ID, or customer ID) so each partition stays manageable:
Example:
     CREATE TABLE data (
         id SERIAL PRIMARY KEY,
         site_id INT,
         customer_id INT,
         division_id INT,
         department_id INT,
         data_payload JSONB,
         created_at TIMESTAMPTZ
     ) PARTITION BY RANGE (created_at);



* RLS Policies on Partitions - Set up RLS policies on each partition. Since partitions are smaller, RLS policy checks should be more efficient.
Example:
     CREATE POLICY rls_policy ON data
     USING (site_id = current_setting('app.current_site_id')::INT);
     ENABLE ROW LEVEL SECURITY;


* Session Variables - Using PostgreSQL session variables to store user-specific info (like app.current_site_id) is convenient, but has potential security risks. If a client can set these variables, they could manipulate them to gain unauthorized access. To mitigate this, ensure that only trusted parts of your application can set these variables. Consider using server-side functions or application logic to securely set these variables based on the authenticated user's information.
Example:
     SET app.current_site_id = '123';

* Indexing  - Make sure you index columns used in RLS policies and queries, like site_id and customer_id.
     CREATE INDEX idx_site_id ON data (site_id);
     CREATE INDEX idx_customer_id ON data (customer_id);
     CREATE INDEX idx_site_customer ON data (site_id, customer_id);
     CREATE INDEX idx_created_at ON data (created_at);

* Using Stored Procedures - using stored procedures can centralize security logic, but it can also add complexity. Here's a brief look:

Advantages:
   - Centralized security logic.
   - Additional layer of security as logic is hidden from end-users.
   - Can include data validation and business logic.

2Disadvantages:
   - Increased complexity in development and maintenance.
   - Potential performance overhead for complex procedures.
   - Less flexibility for ad-hoc queries.

Example:
   CREATE OR REPLACE FUNCTION insert_data(
       p_site_id INT,
       p_customer_id INT,
       p_division_id INT,
       p_department_id INT,
       p_data_payload JSONB,
       p_created_at TIMESTAMPTZ
   ) RETURNS VOID AS $$
   BEGIN
       IF current_setting('app.current_site_id')::INT = p_site_id THEN
           INSERT INTO data (site_id, customer_id, division_id, department_id, data_payload, created_at)
           VALUES (p_site_id, p_customer_id, p_division_id, p_department_id, p_data_payload, p_created_at);
       ELSE
           RAISE EXCEPTION 'Access Denied';
       END IF;
   END;
   $$ LANGUAGE plpgsql;



* Final Thoughts

Using direct partitioning and applying RLS policies to each partition should help with performance issues linked to a growing base table. This approach also keeps things flexible for future expansions and avoids the hassle of managing inheritance hierarchies. Proper indexing with RLS policies in mind can greatly improve query performance in large tables. Just make sure to handle session variables securely to avoid potential security issues.

If you have more questions or need further advice on implementation, just let me know!

Cheers,  
Andy

On 25-Jun-24 00:28, Thomas Simpson wrote:

Hi,

I'm trying to implement a system which requires row level security on some key data tables (most do not require RLS).  The data tables will grow substantially (rows likely > +100M/year - the system is > 80% data insert plus < 20% updates and by design, no deletes).

Some queries are likely to brush past many rows before being eliminated by the RLS policy, so I'm trying to find the most efficient way that does not compromise query times.  I also want to have a unified approach across all the RLS data to make the policy implementation as straightforward as possible too because I know there will be future expansion of the RLS rules.

My thought currently is that table inheritance could possibly be one way forward.  Specifically the base table holding just the RLS attributes, such as site group, site ID, customer group, customer ID as some initial examples (I expect company division, department may be future needs too).

With the RLS attributes on the base table, I can add future needs to that table and they automatically propagate to the child tables holding the RLS data.  Policies on the child tables can enforce row visibility based on session tokens assigned at login (a future problem avoided just now for simplicity).

I have a small prototype working, with the policy function comparing the columns (from the base table) to the user tokens to permit/deny row access.  This allows this to be as in-memory and hopefully as fast as possible as it avoids needing to do any lookups to other tables or anything more expensive than some 'permissionColumn IN listOfTokensHeldByTheSession' checks.

My concern is the base table will grow substantially faster than the child data tables as that receives a new row for every row inserted in any of the child tables, so could easily be +300M rows/year and this could become some performance fence.  Some of the child tables have a clear partition key available so inherited & partitioned is also appealing but could possibly amplify any performance issue further.

Does this approach sound viable or are there pitfalls or a different more recommended approach?

Thanks

Jim






В списке pgsql-performance по дате отправления:

Предыдущее
От: Andrei Lepikhov
Дата:
Сообщение: Re: Inconsistent query performance based on relation hit frequency
Следующее
От: Achilleas Mantzios - cloud
Дата:
Сообщение: Re: Inconsistent query performance based on relation hit frequency