Обсуждение: Unnecessary buffer usage with multicolumn index, row comparison, and equility constraint

Поиск
Список
Период
Сортировка
Hi everyone, first time here. Please kindly let me know if this is not the right place to ask.

I notice a simple query can read a lot of buffer blocks in a meaningless way, when
1. there is an index scan on a multicolumn index
2. there is row constructor comparison in the Index Cond
3. there is also an equality constraint on the leftmost column of the multicolumn index


## How to reproduce

I initially noticed it on AWS Aurora RDS, but it can be reproduced in docker container as well.
```bash
docker run --name test-postgres -e POSTGRES_PASSWORD=mysecretpassword -d -p 5432:5432 postgres:16.3
```

Create a table with a multicolumn index. Populate 12 million rows with random integers.
```sql
CREATE TABLE t(a int, b int);
CREATE INDEX my_idx ON t USING BTREE (a, b);

INSERT INTO t(a, b)
SELECT
    (random() * 123456)::int AS a,
    (random() * 123456)::int AS b
FROM
    generate_series(1, 12345678);

ANALYZE t;
```

Simple query that uses the multicolumn index.
```
postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a = 0 order by a, b;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Index Only Scan using my_idx on t  (cost=0.43..8.46 rows=1 width=8) (actual time=284.312..284.314 rows=0 loops=1)
   Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a = 0))
   Heap Fetches: 0
   Buffers: shared hit=3777 read=37304 written=11713
 Planning:
   Buffers: shared hit=22 read=4
 Planning Time: 0.270 ms
 Execution Time: 284.341 ms
(8 rows)
```

## Expected output

The number of buffer blocks used is high. I expect it to be no more than when there’s only one constraint.

```
postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) order by a, b;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Index Only Scan using my_idx on t  (cost=0.43..23.67 rows=642 width=8) (actual time=0.030..0.158 rows=542 loops=1)
   Index Cond: (ROW(a, b) > ROW(123450, 123450))
   Heap Fetches: 0
   Buffers: shared hit=254 read=3
 Planning:
   Buffers: shared read=4
 Planning Time: 0.232 ms
 Execution Time: 0.206 ms
(8 rows)

postgres=# explain (analyze, buffers) select * from t where a = 0 order by a, b;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Index Only Scan using my_idx on t  (cost=0.43..6.20 rows=101 width=8) (actual time=0.099..0.113 rows=57 loops=1)
   Index Cond: (a = 0)
   Heap Fetches: 0
   Buffers: shared hit=27 read=2
 Planning Time: 0.081 ms
 Execution Time: 0.131 ms
(6 rows)
```

## Postgres version

16.3

## Platform information

I can reproduce it on the latest postgres docker image, which is based on Debian Linux. Originally found the issue on AWS Aurora.



The following are my own observation and thoughts. Please disregard if it’s distraction.

For a general form of
```sql
select * from t where (a, b) > (x, y) and a = z order by a, b;
```

1. The number of buffer blocks is proportional to the gap between x and z. Strictly, it’s max(0, min(x, max(a)) – max(z, min(a))).

```
postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a = -30000 order by a, b;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Index Only Scan using my_idx on t  (cost=0.43..4.45 rows=1 width=8) (actual time=243.173..243.175 rows=0 loops=1)
   Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a = '-30000'::integer))
   Heap Fetches: 0
   Buffers: shared hit=1 read=41080
 Planning:
   Buffers: shared hit=2 read=2
 Planning Time: 0.174 ms
 Execution Time: 243.199 ms
(8 rows)

postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a = 0 order by a, b;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Index Only Scan using my_idx on t  (cost=0.43..4.45 rows=1 width=8) (actual time=230.425..230.426 rows=0 loops=1)
   Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a = 0))
   Heap Fetches: 0
   Buffers: shared hit=1 read=41080
 Planning:
   Buffers: shared read=4
 Planning Time: 0.296 ms
 Execution Time: 230.460 ms
(8 rows)

postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a = 30000 order by a, b;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Index Only Scan using my_idx on t  (cost=0.43..4.45 rows=1 width=8) (actual time=171.787..171.788 rows=0 loops=1)
   Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a = 30000))
   Heap Fetches: 0
   Buffers: shared hit=1 read=31126
 Planning:
   Buffers: shared read=4
 Planning Time: 0.191 ms
 Execution Time: 171.812 ms
(8 rows)

postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a = 60000 order by a, b;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Index Only Scan using my_idx on t  (cost=0.43..4.45 rows=1 width=8) (actual time=137.516..137.518 rows=0 loops=1)
   Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a = 60000))
   Heap Fetches: 0
   Buffers: shared hit=1 read=21139
 Planning:
   Buffers: shared read=4
 Planning Time: 0.212 ms
 Execution Time: 137.543 ms
(8 rows)

postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a = 90000 order by a, b;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Index Only Scan using my_idx on t  (cost=0.43..4.45 rows=1 width=8) (actual time=57.868..57.870 rows=0 loops=1)
   Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a = 90000))
   Heap Fetches: 0
   Buffers: shared hit=11187 read=1
 Planning:
   Buffers: shared hit=1 read=3
 Planning Time: 0.240 ms
 Execution Time: 57.896 ms
(8 rows)

postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a = 120000 order by a, b;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Index Only Scan using my_idx on t  (cost=0.43..4.45 rows=1 width=8) (actual time=6.018..6.019 rows=0 loops=1)
   Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a = 120000))
   Heap Fetches: 0
   Buffers: shared hit=1173 read=1
 Planning:
   Buffers: shared hit=4
 Planning Time: 0.122 ms
 Execution Time: 6.052 ms
(8 rows)
```

2. It’s not an issue when `a=x` becomes `a<x` or `a>x`.

```
postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a < 100 order by a, b;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Index Only Scan using my_idx on t  (cost=0.43..4.45 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=1)
   Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a < 100))
   Heap Fetches: 0
   Buffers: shared hit=3
 Planning:
   Buffers: shared hit=8
 Planning Time: 0.119 ms
 Execution Time: 0.020 ms
(8 rows)

postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a > 100 order by a, b;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Index Only Scan using my_idx on t  (cost=0.43..25.25 rows=641 width=8) (actual time=0.040..0.339 rows=542 loops=1)
   Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a > 100))
   Heap Fetches: 0
   Buffers: shared hit=257
 Planning:
   Buffers: shared hit=8
 Planning Time: 0.233 ms
 Execution Time: 0.443 ms
(8 rows)
```

3. It’s not an issue when `a=x` becomes `b=x`.

4. The example query is trivial and for demo purpose only. Obviously there’s no need to supply `a = 0` when there’s `(a, b) > (123450, 123450)`. However, in practice it can be a problem when the table is joined to other tables, resulting in a nested loop for a list of `a` values that we have no control of, while `(a, b) > (x, y)` is used for paging.

5. My current workaround is add `AND a >= x` to `(a, b) > (x, y)`. However, this makes the planner underestimate the number of rows due to the multiplied selectivities.

Best regards,
Yan
On Fri, May 10, 2024 at 11:28 PM WU Yan <4wuyan@gmail.com> wrote:
Hi everyone, first time here. Please kindly let me know if this is not the right place to ask.

I notice a simple query can read a lot of buffer blocks in a meaningless way, when
1. there is an index scan on a multicolumn index
2. there is row constructor comparison in the Index Cond
3. there is also an equality constraint on the leftmost column of the multicolumn index


## How to reproduce

I initially noticed it on AWS Aurora RDS, but it can be reproduced in docker container as well.
```bash
docker run --name test-postgres -e POSTGRES_PASSWORD=mysecretpassword -d -p 5432:5432 postgres:16.3
```

Create a table with a multicolumn index. Populate 12 million rows with random integers.
```sql
CREATE TABLE t(a int, b int);
CREATE INDEX my_idx ON t USING BTREE (a, b);

INSERT INTO t(a, b)
SELECT
    (random() * 123456)::int AS a,
    (random() * 123456)::int AS b
FROM
    generate_series(1, 12345678);

ANALYZE t;
```

Simple query that uses the multicolumn index.
```
postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a = 0 order by a, b;

Out of curiosity, why "where row(a, b) > row(123450, 123450)" instead of "where a > 123450 and b > 123450"?

Ron Johnson <ronljohnsonjr@gmail.com> writes:
> On Fri, May 10, 2024 at 11:28 PM WU Yan <4wuyan@gmail.com> wrote:
>> Simple query that uses the multicolumn index.
>> postgres=# explain (analyze, buffers) select * from t where row(a, b) >
>> row(123450, 123450) and a = 0 order by a, b;

> Out of curiosity, why "where row(a, b) > row(123450, 123450)" instead of "where
> a > 123450 and b > 123450"?

That row() condition actually means "a > 123450 OR
(a = 123450 AND b > 123450)", which is not the same.

(It'd be a little clearer with two different values in
the row constant, perhaps.)

It does seem like there's an optimization failure here.
I don't expect btree to analyze row comparisons exactly,
but it's sad that it seems to be stupider than for the
simplified case

explain (analyze, buffers) select * from t
where a >= 123450 and a = 0
order by a, b;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Index Only Scan using my_idx on t  (cost=0.43..4.45 rows=1 width=8) (actual time=0.001..0.002 rows=0 loops=1)
   Index Cond: ((a >= 123450) AND (a = 0))
   Heap Fetches: 0
 Planning:
   Buffers: shared hit=4
 Planning Time: 0.081 ms
 Execution Time: 0.013 ms
(7 rows)

For that, it's able to see that the index conditions are
contradictory, so it fetches no index pages whatever.

            regards, tom lane