Re: Another regexp performance improvement: skip useless paren-captures

Поиск
Список
Период
Сортировка
От Andrew Dunstan
Тема Re: Another regexp performance improvement: skip useless paren-captures
Дата
Msg-id 31908a04-f73b-8c5b-ed99-919fd4cd4b54@dunslane.net
обсуждение исходный текст
Ответ на Another regexp performance improvement: skip useless paren-captures  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Another regexp performance improvement: skip useless paren-captures  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Another regexp performance improvement: skip useless paren-captures  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On 8/4/21 6:15 PM, Tom Lane wrote:
> Here's a little finger exercise that improves a case that's bothered me
> for awhile.  In a POSIX regexp, parentheses cause capturing by default;
> you have to write the very non-obvious "(?:...)" if you don't want the
> matching substring to be reported by the regexp engine.  


It's not obscure to perl programmers :-)


> That'd be fine
> if capturing were cheap, but with our engine it is not particularly
> cheap.  In many situations, the initial DFA check is sufficient to
> tell whether there is an overall match, but it does not tell where any
> subexpression match boundaries are.  To identify exactly which substring
> is deemed to match a parenthesized subexpression, we have to recursively
> break down the match, which takes at the very least a few more DFA
> invocations; and with an uncooperative regex, it can easily result in
> O(N^2) behavior where there was none at the DFA stage.
>
> Therefore, we really ought to expend some effort to not capture
> subexpressions if the sub-match data is not actually needed, which in
> many invocations we know that it isn't.  Spencer's original code has
> a REG_NOSUB option that looks like it ought to be good for this ... but
> on closer inspection it's basically useless, because it turns *all*
> parens into non-capturing ones.  That breaks back-references, so unless
> you know that the regexp contains no back-refs, you can't use it.


In perl you can use the 'n' modifier for this effect (since 5.22)


I would expect to know if a back-ref were present.


I'm a bit worried about how you'll keep track of back-ref numbering
since back-refs only count capturing groups, and you're silently turning
a capturing group into a non-capturing group.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Platon Pronko
Дата:
Сообщение: Re: very long record lines in expanded psql output
Следующее
От: Robert Haas
Дата:
Сообщение: Re: .ready and .done files considered harmful