Обсуждение: matching column of regexps

Поиск
Список
Период
Сортировка

matching column of regexps

От
James Cloos
Дата:
I have a table with a column of regexps, and need to query whether a
provided string matches any of them.

Eg,
 create table retest ( id serial primary key, active bool not null default true,                       re text unique
notnull, description text );
 

with queries of the form:
 select count(re) > 0 from retest where active is true and ? ~ re;

There also will be occasional not-as-speed-sensitive queries which need
to return the matching descriptions:
 select re, description from retest where active is true and ? ~ re;

(The serial column is there only to make it easier to change or delete
some rows when managing the table in psql.)

I was happy to find that the ~ operator works in both directions, but
querying whether count(re) > 0 was the best I could come up with to
get a bool result.

Is there a better way to answer the question, "Do ANY rows match?"
without having to return the list of matching rows?  I didn't find
anything googling.

Is there a way to index such a table/query?

One of my use cases, on contsrained systems, is likely to have fewer
than fifty rows, few of which will have active=f.  I presume that an
index is unlikely to help any given the small table size.

But another use case may end up with thousands to millions of rows.

I've considerred a single-row view defined via a function which
collapeses a list of regexps into a single regexp.  But I'm concerned
that a single massive regexp may may be too much for pg's re engine?

My tests suggest that the planner is not able to stop iterating though
the rows once one matches the where.  Do I need to write an aggregate to
accomplish that shortcut?

-JimC
--
James Cloos <cloos@jhcloos.com>         OpenPGP: 0x997A9F17ED7DAEA6



Re: matching column of regexps

От
Vik Fearing
Дата:
On 05/18/2014 07:44 PM, James Cloos wrote:
> Is there a better way to answer the question, "Do ANY rows match?"
> without having to return the list of matching rows?  I didn't find
> anything googling.

select exists (select 1 from retest where active is true and ? ~ re);


> Is there a way to index such a table/query?

There are several ways to index such a query.  If there are very many
rows but with only a few being active, then a partial index will do wonders.

http://www.postgresql.org/docs/current/static/indexes-partial.html

Otherwise, it is possible to use an index for regular expressions using
the pg_trgm extension.

http://www.postgresql.org/docs/current/static/pgtrgm.html

-- 
Vik




Re: matching column of regexps

От
James Cloos
Дата:
>>>>> "VF" == Vik Fearing <vik.fearing@dalibo.com> writes:

JC>> Is there a better way to answer the question, "Do ANY rows match?"

VF> select exists (select 1 from retest where active is true and ? ~ re);

Ah.  Yes.  I'd forgotten about select exists.  I cannot recall whether I
ever used it in anger, or just played around after reading about it.

It should stick this time.

>> Is there a way to index such a table/query?

VF> There are several ways to index such a query.  If there are very many
VF> rows but with only a few being active, then a partial index will do wonders.

Its more likely only a few will be inactive.

VF> Otherwise, it is possible to use an index for regular expressions using
VF> the pg_trgm extension.

Perfect.  I see trgm index support for ~, et alia is new in 9.3.

Exactly the kicks in the skull I needed.

Thanks,

-JimC
--
James Cloos <cloos@jhcloos.com>         OpenPGP: 0x997A9F17ED7DAEA6