Обсуждение: Trouble with regexp_matches

Поиск
Список
Период
Сортировка

Trouble with regexp_matches

От
Edson Richter
Дата:

Dear list,

 

Version string    PostgreSQL 9.4.10 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit

 

I’m running the query below, and it is limiting results as if “regexp_matches” being in where clause.

IMHO, it is wrong: in case there is no match, shall return null or empty array – not remove the result from the set!!!

 

Is this a collateral effect of using regexp_matches in columns?

If yes, shall not this information be BOLD RED FLASHING in documentation (or it is already, and some kind sould would point me where)?

 

-- First query (that is limiting results) ---------------------------------------------------------------------

select codigoocorrencia, datahoraocorrencia, datahoraimportacao, observacao, regexp_matches(observacao, '\d\d/\d\d/\d\d\d\d')

from batchloteocorrencia

where codigoocorrencia = '091'

and observacao is not null

order by datahoraimportacao DESC

 

Total results = 59

 

--Second query (that is not limiting results, as I did expect)-------------------------------------------------------------------

select codigoocorrencia, datahoraocorrencia, datahoraimportacao, observacao

from batchloteocorrencia

where codigoocorrencia = '091'

and observacao is not null

order by datahoraimportacao DESC

 

Total results = 3826

 

 

Why is that?

 

Regards,

 

Edson Richter

Re: Trouble with regexp_matches

От
Adrian Klaver
Дата:
On 11/05/2016 10:01 AM, Edson Richter wrote:
> Dear list,
>
>
>
> Version string    PostgreSQL 9.4.10 on x86_64-unknown-linux-gnu,
> compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit
>
>
>
> I’m running the query below, and it is limiting results as if
> “regexp_matches” being in where clause.
>
> IMHO, it is wrong: in case there is no match, shall return null or empty
> array – not remove the result from the set!!!
>
>
>
> Is this a collateral effect of using regexp_matches in columns?
>
> If yes, shall not this information be BOLD RED FLASHING in documentation
> (or it is already, and some kind sould would point me where)?

https://www.postgresql.org/docs/9.4/static/functions-matching.html

"It is possible to force regexp_matches() to always return one row by
using a sub-select; this is particularly useful in a SELECT target list
when you want all rows returned, even non-matching ones:

SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;
"



>
>
>
> -- First query (that is limiting results)
> ---------------------------------------------------------------------
>
> select codigoocorrencia, datahoraocorrencia, datahoraimportacao,
> observacao, regexp_matches(observacao, '\d\d/\d\d/\d\d\d\d')
>
> from batchloteocorrencia
>
> where codigoocorrencia = '091'
>
> and observacao is not null
>
> order by datahoraimportacao DESC
>
>
>
> Total results = 59
>
>
>
> --Second query (that is not limiting results, as I did
> expect)-------------------------------------------------------------------
>
> select codigoocorrencia, datahoraocorrencia, datahoraimportacao, observacao
>
> from batchloteocorrencia
>
> where codigoocorrencia = '091'
>
> and observacao is not null
>
> order by datahoraimportacao DESC
>
>
>
> Total results = 3826
>
>
>
>
>
> Why is that?
>
>
>
> Regards,
>
>
>
> Edson Richter
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Trouble with regexp_matches

От
Tom Lane
Дата:
Edson Richter <edsonrichter@hotmail.com> writes:
> I’m running the query below, and it is limiting results as if “regexp_matches” being in where clause.
> IMHO, it is wrong: in case there is no match, shall return null or empty array – not remove the result from the
set!!!

Well, no, because regexp_matches() returns a set.  If there's no match,
there's zero rows in the set.

The standard workaround is to use a scalar sub-select, which has the
effect of converting a zero-row result into a NULL:

select codigoocorrencia, datahoraocorrencia, datahoraimportacao,
       observacao, (select regexp_matches(observacao, '\d\d/\d\d/\d\d\d\d'))
from ...

As of v10 there will be a less confusing solution: use regexp_match()
instead.

            regards, tom lane


RES: Trouble with regexp_matches

От
Edson Richter
Дата:
> -----Mensagem original-----
> De: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Enviada em: sábado, 5 de novembro de 2016 15:21
> Para: Edson Richter <edsonrichter@hotmail.com>
> Cc: pgsql-general@postgresql.org
> Assunto: Re: [GENERAL] Trouble with regexp_matches
> 
> Edson Richter <edsonrichter@hotmail.com> writes:
> > I’m running the query below, and it is limiting results as if
> “regexp_matches” being in where clause.
> > IMHO, it is wrong: in case there is no match, shall return null or empty array
> – not remove the result from the set!!!
> 
> Well, no, because regexp_matches() returns a set.  If there's no match,
> there's zero rows in the set.

For me, it is a strange behavior - putting something in select clause will restrict results as if it lies in join or
whereclauses.
 


> 
> The standard workaround is to use a scalar sub-select, which has the effect
> of converting a zero-row result into a NULL:
> 
> select codigoocorrencia, datahoraocorrencia, datahoraimportacao,
>        observacao, (select regexp_matches(observacao, '\d\d/\d\d/\d\d\d\d'))
> from ...
> 
> As of v10 there will be a less confusing solution: use regexp_match() instead.
> 
>             regards, tom lane

Ok, for now, I've changed regexp_matches() to "... substring(observacao from '\d\d/\d\d/\d\d\d\d')" which does the job
gracefully.

I still believe that an alert shall be made in the docs page (String functions), because seems confusing and error
prone.


Thanks,

Edson Richter



RES: Trouble with regexp_matches

От
Edson Richter
Дата:
> -----Mensagem original-----
> De: Adrian Klaver [mailto:adrian.klaver@aklaver.com]
> Enviada em: sábado, 5 de novembro de 2016 15:13
> Para: Edson Richter <edsonrichter@hotmail.com>; pgsql-
> general@postgresql.org
> Assunto: Re: [GENERAL] Trouble with regexp_matches
> 
> On 11/05/2016 10:01 AM, Edson Richter wrote:
> > Dear list,
> >
> >
> >
> > Version string    PostgreSQL 9.4.10 on x86_64-unknown-linux-gnu,
> > compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit
> >
> >
> >
> > I’m running the query below, and it is limiting results as if
> > “regexp_matches” being in where clause.
> >
> > IMHO, it is wrong: in case there is no match, shall return null or
> > empty array – not remove the result from the set!!!
> >
> >
> >
> > Is this a collateral effect of using regexp_matches in columns?
> >
> > If yes, shall not this information be BOLD RED FLASHING in
> > documentation (or it is already, and some kind sould would point me
> where)?
> 
> https://www.postgresql.org/docs/9.4/static/functions-matching.html


I see - I always believed that this page was related to WHERE clause or using functions in the PostgreSQL way (which
is,in your turn, a alternative to "from" syntax) - not for the select clause.
 
But now that you mention it, and re-reading all the information, I can understand the implications.

Nevertheless, would be nice to put a huge warning at the "String functions" page about this behavior...

> 
> "It is possible to force regexp_matches() to always return one row by using a
> sub-select; this is particularly useful in a SELECT target list when you want all
> rows returned, even non-matching ones:
> 
> SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab; "
> 

Thanks, this worked - as well removing the "regexp_matches" and using "SUBSTRING( text FROM pattern)".


I really appreciate your help.


Kind regards,

Edson Richter


> 
> 
> >
> >
> >
> > -- First query (that is limiting results)
> > ---------------------------------------------------------------------
> >
> > select codigoocorrencia, datahoraocorrencia, datahoraimportacao,
> > observacao, regexp_matches(observacao, '\d\d/\d\d/\d\d\d\d')
> >
> > from batchloteocorrencia
> >
> > where codigoocorrencia = '091'
> >
> > and observacao is not null
> >
> > order by datahoraimportacao DESC
> >
> >
> >
> > Total results = 59
> >
> >
> >
> > --Second query (that is not limiting results, as I did
> > expect)---------------------------------------------------------------
> > ----
> >
> > select codigoocorrencia, datahoraocorrencia, datahoraimportacao,
> > observacao
> >
> > from batchloteocorrencia
> >
> > where codigoocorrencia = '091'
> >
> > and observacao is not null
> >
> > order by datahoraimportacao DESC
> >
> >
> >
> > Total results = 3826
> >
> >
> >
> >
> >
> > Why is that?
> >
> >
> >
> > Regards,
> >
> >
> >
> > Edson Richter
> >
> 
> 
> --
> Adrian Klaver
> adrian.klaver@aklaver.com


Re: RES: Trouble with regexp_matches

От
Adrian Klaver
Дата:
On 11/05/2016 11:56 AM, Edson Richter wrote:
>> -----Mensagem original-----
>> De: Adrian Klaver [mailto:adrian.klaver@aklaver.com]
>> Enviada em: sábado, 5 de novembro de 2016 15:13
>> Para: Edson Richter <edsonrichter@hotmail.com>; pgsql-
>> general@postgresql.org
>> Assunto: Re: [GENERAL] Trouble with regexp_matches
>>
>> On 11/05/2016 10:01 AM, Edson Richter wrote:
>>> Dear list,
>>>
>>>
>>>
>>> Version string    PostgreSQL 9.4.10 on x86_64-unknown-linux-gnu,
>>> compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit
>>>
>>>
>>>
>>> I’m running the query below, and it is limiting results as if
>>> “regexp_matches” being in where clause.
>>>
>>> IMHO, it is wrong: in case there is no match, shall return null or
>>> empty array – not remove the result from the set!!!
>>>
>>>
>>>
>>> Is this a collateral effect of using regexp_matches in columns?
>>>
>>> If yes, shall not this information be BOLD RED FLASHING in
>>> documentation (or it is already, and some kind sould would point me
>> where)?
>>
>> https://www.postgresql.org/docs/9.4/static/functions-matching.html
>
>
> I see - I always believed that this page was related to WHERE clause or using functions in the PostgreSQL way (which
is,in your turn, a alternative to "from" syntax) - not for the select clause. 
> But now that you mention it, and re-reading all the information, I can understand the implications.
>
> Nevertheless, would be nice to put a huge warning at the "String functions" page about this behavior...

Well the above page is pointed to in the string functions section:

https://www.postgresql.org/docs/9.4/static/functions-string.html

"regexp_matches(string text, pattern text [, flags text])
setof text[]     Return all captured substrings resulting from matching a
POSIX regular expression against the string. See Section 9.7.3 for more
information."

Where 'Section 9.7.3' is a link to it. Not sure if that counts as a
warning though:)

>
>>
>> "It is possible to force regexp_matches() to always return one row by using a
>> sub-select; this is particularly useful in a SELECT target list when you want all
>> rows returned, even non-matching ones:
>>
>> SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab; "
>>
>
> Thanks, this worked - as well removing the "regexp_matches" and using "SUBSTRING( text FROM pattern)".
>
>
> I really appreciate your help.
>
>
> Kind regards,
>
> Edson Richter
>
>
>>
>>
>>>
>>>
>>>
>>> -- First query (that is limiting results)
>>> ---------------------------------------------------------------------
>>>
>>> select codigoocorrencia, datahoraocorrencia, datahoraimportacao,
>>> observacao, regexp_matches(observacao, '\d\d/\d\d/\d\d\d\d')
>>>
>>> from batchloteocorrencia
>>>
>>> where codigoocorrencia = '091'
>>>
>>> and observacao is not null
>>>
>>> order by datahoraimportacao DESC
>>>
>>>
>>>
>>> Total results = 59
>>>
>>>
>>>
>>> --Second query (that is not limiting results, as I did
>>> expect)---------------------------------------------------------------
>>> ----
>>>
>>> select codigoocorrencia, datahoraocorrencia, datahoraimportacao,
>>> observacao
>>>
>>> from batchloteocorrencia
>>>
>>> where codigoocorrencia = '091'
>>>
>>> and observacao is not null
>>>
>>> order by datahoraimportacao DESC
>>>
>>>
>>>
>>> Total results = 3826
>>>
>>>
>>>
>>>
>>>
>>> Why is that?
>>>
>>>
>>>
>>> Regards,
>>>
>>>
>>>
>>> Edson Richter
>>>
>>
>>
>> --
>> Adrian Klaver
>> adrian.klaver@aklaver.com
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com