Обсуждение: GIST combo index condition chosen for users queries is different from table owner's query

Поиск
Список
Период
Сортировка

GIST combo index condition chosen for users queries is different from table owner's query

От
Dennis White
Дата:
My project is using PostgreSQL 12.9 and has a table that gets millions of positions of ships every day. It is partitioned by tier (the importance of position) and sub-partitioned by time.
We also use RLS to restrict access to rows in the table.

Our problem is the query planner will not let user queries use the spatial condition like it does for the table owner and as a result the queries don't ever finish.
I created a minimal test case of the problem and have attached all of the statements needed to recreate the problem

The key statements are as follows:

CREATE TABLE IF NOT EXISTS test.qtest_position (
    position_id         bigint NOT NULL,
    tier                smallint,
    toi                 timestamptz(0) NOT NULL,
    track_id            bigint,
    security_tag        varchar(33),
    posit               public.geometry(Point)
) PARTITION BY LIST (tier);

CREATE TABLE IF NOT EXISTS test.qtest_posit_t1 PARTITION OF test.qtest_position
    FOR VALUES IN (1) PARTITION BY RANGE (toi);

CREATE TABLE test.qtest_posit_t1_template ( LIKE test.qtest_posit_t1 INCLUDING ALL );
ALTER TABLE test.qtest_posit_t1_template ADD PRIMARY KEY (position_id);
CREATE INDEX ON test.qtest_posit_t1_template (track_id, toi);
CREATE INDEX ON test.qtest_posit_t1_template USING GIST (posit, toi) include (security_tag);

SELECT partman.create_parent('test.qtest_posit_t1', 'toi', 'native', '1 days',
  p_premake := 20, p_start_partition := '2022-09-01', p_template_table := 'test.qtest_posit_t1_template');

Here's a simple example of a user query and the resulting plan:

set role test_user;
SET
explain analyze SELECT  
        position_961.SECURITY_TAG,
        position_961.TRACK_ID,
        position_961.POSIT,
        position_961.POSITION_ID,
        position_961.TOI  
    FROM
        test.qtest_position position_961  
    WHERE
        (ST_Intersects( position_961.POSIT, ST_SETSRID(ST_GeomFromGeoJSON('{"type":"Polygon","coordinates":[[[ 67.729806, 15.734972],[49.077444,-9.989],[39.995417,-3.982083],[51.616528,17.977861],[67.729806,15.734972]]]}')  , 4326)  )  
            )  AND
 position_961.TOI BETWEEN '2022-09-02T20:28:42.753Z'::TIMESTAMPTZ AND '2022-09-03T20:28:42.753Z'::TIMESTAMPTZ
  and test.user_has_access(security_tag) = '1';
                                                                                                                                                                                                       
                                        QUERY PLAN                                                                                                                                                      
                                                                                         
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------
 Append  (cost=0.42..495598323.78 rows=2 width=60) (actual time=7246.524..31614.944 rows=699 loops=1)
   ->  Index Scan using qtest_posit_t1_p2022_09_02_posit_toi_security_tag_idx on qtest_posit_t1_p2022_09_02 position_961  (cost=0.42..72721949.89 rows=1 width=60) (actual time=7246.523..9560.695 rows=
121 loops=1)
         Index Cond: ((toi >= '2022-09-02 20:28:42.753+00'::timestamp with time zone) AND (toi <= '2022-09-03 20:28:42.753+00'::timestamp with time zone))
         Filter: ((test.user_has_access(security_tag) = '1'::text) AND (test.user_has_access(security_tag) = '1'::text) AND st_intersects(posit, '0103000020E610000001000000050000002C9B3924B5EE504091F3
FE3F4E782F40BEA25BAFE9894840EE7C3F355EFA23C0F47002D369FF434019A9F7544EDB0FC0BE88B663EACE4940AB08371955FA31402C9B3924B5EE504091F3FE3F4E782F40'::geometry))
         Rows Removed by Filter: 2851335
   ->  Seq Scan on qtest_posit_t1_p2022_09_03 position_961_1  (cost=0.00..422876373.88 rows=1 width=60) (actual time=84.624..22054.050 rows=578 loops=1)
         Filter: ((toi >= '2022-09-02 20:28:42.753+00'::timestamp with time zone) AND (toi <= '2022-09-03 20:28:42.753+00'::timestamp with time zone) AND (test.user_has_access(security_tag) = '1'::tex
t) AND (test.user_has_access(security_tag) = '1'::text) AND st_intersects(posit, '0103000020E610000001000000050000002C9B3924B5EE504091F3FE3F4E782F40BEA25BAFE9894840EE7C3F355EFA23C0F47002D369FF434019A9
F7544EDB0FC0BE88B663EACE4940AB08371955FA31402C9B3924B5EE504091F3FE3F4E782F40'::geometry))
         Rows Removed by Filter: 16561241
 Planning Time: 0.369 ms
 Execution Time: 31615.459 ms
(10 rows)

-- run the same query as table owner
set role test_owner;
SET
explain analyze SELECT  
        position_961.SECURITY_TAG,
        position_961.TRACK_ID,
        position_961.POSIT,
        position_961.POSITION_ID,
        position_961.TOI  
    FROM
        test.qtest_position position_961  
    WHERE
        (ST_Intersects( position_961.POSIT, ST_SETSRID(ST_GeomFromGeoJSON('{"type":"Polygon","coordinates":[[[ 67.729806, 15.734972],[49.077444,-9.989],[39.995417,-3.982083],[51.616528,17.977861],[67.729806,15.734972]]]}')  , 4326)  )  
            )  AND
 position_961.TOI BETWEEN '2022-09-02T20:28:42.753Z'::TIMESTAMPTZ AND '2022-09-03T20:28:42.753Z'::TIMESTAMPTZ
  and test.user_has_access(security_tag) = '1';
                                                                                                                                                                                       QUERY PLAN      
                                                                                                                                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.42..88190.82 rows=16 width=60) (actual time=0.167..6.259 rows=699 loops=1)
   ->  Index Scan using qtest_posit_t1_p2022_09_02_posit_toi_security_tag_idx on qtest_posit_t1_p2022_09_02 position_961  (cost=0.42..10309.24 rows=2 width=60) (actual time=0.167..1.320 rows=121 loops
=1)
         Index Cond: ((posit && '0103000020E610000001000000050000002C9B3924B5EE504091F3FE3F4E782F40BEA25BAFE9894840EE7C3F355EFA23C0F47002D369FF434019A9F7544EDB0FC0BE88B663EACE4940AB08371955FA31402C9B3
924B5EE504091F3FE3F4E782F40'::geometry) AND (toi >= '2022-09-02 20:28:42.753+00'::timestamp with time zone) AND (toi <= '2022-09-03 20:28:42.753+00'::timestamp with time zone))
         Filter: ((test.user_has_access(security_tag) = '1'::text) AND st_intersects(posit, '0103000020E610000001000000050000002C9B3924B5EE504091F3FE3F4E782F40BEA25BAFE9894840EE7C3F355EFA23C0F47002D36
9FF434019A9F7544EDB0FC0BE88B663EACE4940AB08371955FA31402C9B3924B5EE504091F3FE3F4E782F40'::geometry))
         Rows Removed by Filter: 201
   ->  Index Scan using qtest_posit_t1_p2022_09_03_posit_toi_security_tag_idx on qtest_posit_t1_p2022_09_03 position_961_1  (cost=0.42..77881.49 rows=14 width=60) (actual time=0.484..4.876 rows=578 lo
ops=1)
         Index Cond: ((posit && '0103000020E610000001000000050000002C9B3924B5EE504091F3FE3F4E782F40BEA25BAFE9894840EE7C3F355EFA23C0F47002D369FF434019A9F7544EDB0FC0BE88B663EACE4940AB08371955FA31402C9B3
924B5EE504091F3FE3F4E782F40'::geometry) AND (toi >= '2022-09-02 20:28:42.753+00'::timestamp with time zone) AND (toi <= '2022-09-03 20:28:42.753+00'::timestamp with time zone))
         Filter: ((test.user_has_access(security_tag) = '1'::text) AND st_intersects(posit, '0103000020E610000001000000050000002C9B3924B5EE504091F3FE3F4E782F40BEA25BAFE9894840EE7C3F355EFA23C0F47002D36
9FF434019A9F7544EDB0FC0BE88B663EACE4940AB08371955FA31402C9B3924B5EE504091F3FE3F4E782F40'::geometry))
         Rows Removed by Filter: 1151
 Planning Time: 0.525 ms
 Execution Time: 6.324 ms
(11 rows)
 
Is there something I can do to allow users queries to use the index with a condition like that used for the table owner's query?
 
Вложения

Re: GIST combo index condition chosen for users queries is different from table owner's query

От
Tom Lane
Дата:
Dennis White <dwhite@seawardmoon.com> writes:
> Is there something I can do to allow users queries to use the index with a
> condition like that used for the table owner's query?

It looks like the problem in your badly-optimized query is that
there is not an indexable condition being extracted from the
ST_INTERSECTS() call.  In the well-optimized one, we've got

   ->  Index Scan using qtest_posit_t1_p2022_09_02_posit_toi_security_tag_idx...
         Index Cond: ((posit &&

'0103000020E610000001000000050000002C9B3924B5EE504091F3FE3F4E782F40BEA25BAFE9894840EE7C3F355EFA23C0F47002D369FF434019A9F7544EDB0FC0BE88B663EACE4940AB08371955FA31402C9B3924B5EE504091F3FE3F4E782F40'::geometry)
AND... 
         Filter: ((test.user_has_access(security_tag) = '1'::text) AND
st_intersects(posit,

'0103000020E610000001000000050000002C9B3924B5EE504091F3FE3F4E782F40BEA25BAFE9894840EE7C3F355EFA23C0F47002D369FF434019A9F7544EDB0FC0BE88B663EACE4940AB08371955FA31402C9B3924B5EE504091F3FE3F4E782F40'::geometry))

I presume what's happening there is that st_intersects() has got a support
function that knows that "st_intersects(foo, bar)" implies "foo && bar"
and the latter can be used with an index on foo.

However, to do that in the presence of RLS we have to know that the
extracted condition would be leakproof.  I'm not sure that the geometry &&
operator is leakproof in the first place; and even if it is, we might not
consider this option unless st_intersects() is also marked leakproof,
which most likely it isn't.  You'd have to ask the PostGIS crew whether
either of those things would be safe to consider leakproof ... but I'm
betting they'll say that doing so would create an unreasonably large
bug surface.

By and large, the combination of RLS with complicated WHERE conditions
is just deadly for performance, because most of the time we won't be
able to use the WHERE conditions until after applying the RLS filter.
Do you really need to use RLS in this application?  If you're stuck
doing so, you could maybe ameliorate things by implementing the RLS
check functions in the fastest way you can, like writing C code
for them.

            regards, tom lane



Re: GIST combo index condition chosen for users queries is different from table owner's query

От
Dennis White
Дата:
Thanks.
As soon as I read your reply I recalled the leakproof issue from a discussion with a former colleague years ago.
At the time, I was new to Postgresql and I realize now I should have remembered that.

Disabling the RLS indeed resulted in the superior plan for the test_user.
The harder part will be baking the function call used for RLS into all query predicates rather than relying on RLS to do it for us.
I also recall that we got around the leakproof problem in postgres 10.2 by somehow just declaring st_intersects() to be leakproof but that would probably not work in an AWS RDS deployment. I will research the leakproof issue more and see what options we may have in dealing with this problem. Perhaps sometime in the future RLS won't break such queries but I understand that is probably not an easy task.
Thanks for replying and helping me on my way.

Dennis

On Mon, Sep 19, 2022 at 7:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dennis White <dwhite@seawardmoon.com> writes:
> Is there something I can do to allow users queries to use the index with a
> condition like that used for the table owner's query?

It looks like the problem in your badly-optimized query is that
there is not an indexable condition being extracted from the
ST_INTERSECTS() call.  In the well-optimized one, we've got

   ->  Index Scan using qtest_posit_t1_p2022_09_02_posit_toi_security_tag_idx...
         Index Cond: ((posit &&
'0103000020E610000001000000050000002C9B3924B5EE504091F3FE3F4E782F40BEA25BAFE9894840EE7C3F355EFA23C0F47002D369FF434019A9F7544EDB0FC0BE88B663EACE4940AB08371955FA31402C9B3924B5EE504091F3FE3F4E782F40'::geometry) AND ...
         Filter: ((test.user_has_access(security_tag) = '1'::text) AND
st_intersects(posit,
'0103000020E610000001000000050000002C9B3924B5EE504091F3FE3F4E782F40BEA25BAFE9894840EE7C3F355EFA23C0F47002D369FF434019A9F7544EDB0FC0BE88B663EACE4940AB08371955FA31402C9B3924B5EE504091F3FE3F4E782F40'::geometry))

I presume what's happening there is that st_intersects() has got a support
function that knows that "st_intersects(foo, bar)" implies "foo && bar"
and the latter can be used with an index on foo.

However, to do that in the presence of RLS we have to know that the
extracted condition would be leakproof.  I'm not sure that the geometry &&
operator is leakproof in the first place; and even if it is, we might not
consider this option unless st_intersects() is also marked leakproof,
which most likely it isn't.  You'd have to ask the PostGIS crew whether
either of those things would be safe to consider leakproof ... but I'm
betting they'll say that doing so would create an unreasonably large
bug surface.

By and large, the combination of RLS with complicated WHERE conditions
is just deadly for performance, because most of the time we won't be
able to use the WHERE conditions until after applying the RLS filter.
Do you really need to use RLS in this application?  If you're stuck
doing so, you could maybe ameliorate things by implementing the RLS
check functions in the fastest way you can, like writing C code
for them.

                        regards, tom lane