Using regexp from table has unpredictable poor performance

Поиск

Список

Период

Сортировка

От	Jack Christensen
Тема	Using regexp from table has unpredictable poor performance
Дата	25 августа 2021 г. 19:47:43
Msg-id	CAMovtNqWPT38rLihCo5JdgFasH0hVOmOBr4=FGo2ZiTMQYrbgA@mail.gmail.com обсуждение исходный текст
Ответы	Re: Using regexp from table has unpredictable poor performance (Justin Pryzby <pryzby@telsasoft.com>)
Список	pgsql-performance

Дерево обсуждения

I have items that need to be categorized by user defined matching rules. Trusted users can create rules that include regular expressions. I've reduced the problem to this example.

Table "public.items"
Column │ Type │ Collation │ Nullable │ Default
────────┼─────────┼───────────┼──────────┼─────────
id │ integer │ │ not null │
name │ text │ │ not null │
Indexes:
"items_pkey" PRIMARY KEY, btree (id)

Table "public.matching_rules"
Column │ Type │ Collation │ Nullable │ Default
──────────────┼─────────┼───────────┼──────────┼─────────
id │ integer │ │ not null │
name_matches │ text │ │ not null │
Indexes:
"matching_rules_pkey" PRIMARY KEY, btree (id)

I use the following query to find matches:

select r.id, i.id
from items i
join matching_rules r on i.name ~ r.name_matches;

When there are few rules the query runs quickly. But as the number of rules increases the runtime often increases at a greater than linear rate.

For example if I run two queries, one the tests rule IDs 0 - 30 and another that tests 30 - 60 the total runtime is less than 100ms. But if I instead test rule IDs 0 - 60 in a single query the runtime balloons to over 1300ms.

explain analyze
select r.id, i.id
from items i
join matching_rules r on i.name ~ r.name_matches
where r.id >= 0 and r.id < 30
;
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Nested Loop (cost=0.00..260.82 rows=80 width=8) (actual time=0.820..28.334 rows=172 loops=1)
Join Filter: (i.name ~ r.name_matches)
Rows Removed by Join Filter: 16828
-> Seq Scan on items i (cost=0.00..18.00 rows=1000 width=27) (actual time=0.006..0.176 rows=1000 loops=1)
-> Materialize (cost=0.00..2.86 rows=16 width=26) (actual time=0.000..0.001 rows=17 loops=1000)
-> Seq Scan on matching_rules r (cost=0.00..2.78 rows=16 width=26) (actual time=0.004..0.012 rows=17 loops=1)
Filter: ((id >= 0) AND (id < 30))
Rows Removed by Filter: 35
Planning Time: 0.086 ms
Execution Time: 28.364 ms

explain analyze
select r.id, i.id
from items i
join matching_rules r on i.name ~ r.name_matches
where r.id >= 30 and r.id < 60
;
QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Nested Loop (cost=0.00..470.86 rows=150 width=8) (actual time=1.418..65.508 rows=530 loops=1)
Join Filter: (i.name ~ r.name_matches)
Rows Removed by Join Filter: 28470
-> Seq Scan on items i (cost=0.00..18.00 rows=1000 width=27) (actual time=0.007..0.193 rows=1000 loops=1)
-> Materialize (cost=0.00..2.93 rows=30 width=26) (actual time=0.000..0.002 rows=29 loops=1000)
-> Seq Scan on matching_rules r (cost=0.00..2.78 rows=30 width=26) (actual time=0.005..0.020 rows=29 loops=1)
Filter: ((id >= 30) AND (id < 60))
Rows Removed by Filter: 23
Planning Time: 0.076 ms
Execution Time: 65.573 ms

explain analyze
select r.id, i.id
from items i
join matching_rules r on i.name ~ r.name_matches
where r.id >= 0 and r.id < 60
;
QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Nested Loop (cost=0.00..710.89 rows=230 width=8) (actual time=3.731..1344.834 rows=702 loops=1)
Join Filter: (i.name ~ r.name_matches)
Rows Removed by Join Filter: 45298
-> Seq Scan on items i (cost=0.00..18.00 rows=1000 width=27) (actual time=0.006..0.442 rows=1000 loops=1)
-> Materialize (cost=0.00..3.01 rows=46 width=26) (actual time=0.000..0.004 rows=46 loops=1000)
-> Seq Scan on matching_rules r (cost=0.00..2.78 rows=46 width=26) (actual time=0.004..0.019 rows=46 loops=1)
Filter: ((id >= 0) AND (id < 60))
Rows Removed by Filter: 6
Planning Time: 0.084 ms
Execution Time: 1344.967 ms

It's also not predictable when additional regexp rows will trigger the poor performance. There's not a specific number of rows or kind of regexp that I can discern that triggers the issue. The regexps themselves are pretty trivial too. Only normal text, start and end of string anchors, and alternation.

I've vacuumed, analyzed, and I am on PostgreSQL 13.4 on x86_64-apple-darwin20.4.0, compiled by Apple clang version 12.0.5 (clang-1205.0.22.9), 64-bit.

Any ideas what's causing this?

Thanks.

Jack

В списке pgsql-performance по дате отправления:

Предыдущее

От: "David G. Johnston"
Дата: 24 августа 2021 г., 17:55:46
Сообщение: Re: pg_restore schema dump to schema with different name

Следующее

От: Justin Pryzby
Дата: 26 августа 2021 г., 00:05:00
Сообщение: Re: Using regexp from table has unpredictable poor performance

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Using regexp from table has unpredictable poor performance

Предыдущее

Следующее