Обсуждение: regexp_replace

Поиск
Список
Период
Сортировка

regexp_replace

От
Andy Colson
Дата:
Hi all.

This is not doing as I'd expected:

select regexp_replace('71.09.6.01.3', '(\d)[.-](\d)', '\1\2', 'g');

  regexp_replace
----------------
  71096.013
(1 row)

It acts the same with dashes:
select regexp_replace('71-09-6-01-3', '(\d)[.-](\d)', '\1\2', 'g');

  regexp_replace
----------------
  71096-013
(1 row)

I cannot use translate because there is other text in the field.  I'm
trying to strip masking characters from a parcel number in a larger text
field (for example:  "the parcel 12-34-56 has caught on fire")

I seem to be missing something, any hints?

I'm on PG 9.3.9 on Slackware64.

Thanks for your time,

-Andy


Re: regexp_replace

От
Tom Lane
Дата:
Andy Colson <andy@squeakycode.net> writes:
> This is not doing as I'd expected:

> select regexp_replace('71.09.6.01.3', '(\d)[.-](\d)', '\1\2', 'g');

>   regexp_replace
> ----------------
>   71096.013
> (1 row)

I think regexp_replace considers only non-overlapping substrings,
eg, once it's replaced 1.0 with 10, it then picks up searching at
the 9 rather than starting over.  The dot after 6 doesn't get
removed because the 6 can't belong to two replaceable substrings, and
it already got consumed in the process of removing the dot before 6.

I might be wrong, but I think two passes of regexp_replace would
do what you want in this example.

            regards, tom lane


Re: regexp_replace

От
John McKown
Дата:
How about:

select regexp_replace('71.09.6.01.3', '(\d)[.-]', '\1', 'g');

?

In your example, the (\d)[.-](\d) says find a digit followed by a period or dash followed by another digit. The first time through 1.0 is matched and replaced with 10 (710) with the "current location" pointing before the 9. Go again and 9.6 is replaced by 96 for (71096) with the "current location" pointing to the period! So ".0" doesn't match. (71096.0) next match is 1.3 and result is 13 ( 71096.013). If you don't want to eliminate the period or dash unless it is _between_ two digits, try:

select regexp_replace('71.09.6.01.3', '(\d)[.-](?=\d)', '\1', 'g');

(?=\d) is a "look ahead") match which says that the period or dash must be followed by a digit, but the expression _does not_ "consume" the digit matched.


On Thu, Jan 14, 2016 at 1:43 PM, Andy Colson <andy@squeakycode.net> wrote:
Hi all.

This is not doing as I'd expected:

select regexp_replace('71.09.6.01.3', '(\d)[.-](\d)', '\1\2', 'g');

 regexp_replace
----------------
 71096.013
(1 row)

It acts the same with dashes:
select regexp_replace('71-09-6-01-3', '(\d)[.-](\d)', '\1\2', 'g');

 regexp_replace
----------------
 71096-013
(1 row)

I cannot use translate because there is other text in the field.  I'm trying to strip masking characters from a parcel number in a larger text field (for example:  "the parcel 12-34-56 has caught on fire")

I seem to be missing something, any hints?

I'm on PG 9.3.9 on Slackware64.

Thanks for your time,

-Andy


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
Werner Heisenberg is driving down the autobahn. A police officer pulls
him over. The officer says, "Excuse me, sir, do you know how fast you
were going?"
"No," replies Dr. Heisenberg, "but I know where I am."

Computer Science is the only discipline in which we view adding a new wing to a building as being maintenance -- Jim Horning

Schrodinger's backup: The condition of any backup is unknown until a restore is attempted.

He's about as useful as a wax frying pan.

Maranatha! <><
John McKown

Re: regexp_replace

От
Andy Colson
Дата:
On 1/14/2016 1:59 PM, Tom Lane wrote:
> Andy Colson <andy@squeakycode.net> writes:
>> This is not doing as I'd expected:
>
>> select regexp_replace('71.09.6.01.3', '(\d)[.-](\d)', '\1\2', 'g');
>
>>    regexp_replace
>> ----------------
>>    71096.013
>> (1 row)
>
> I think regexp_replace considers only non-overlapping substrings,
> eg, once it's replaced 1.0 with 10, it then picks up searching at
> the 9 rather than starting over.  The dot after 6 doesn't get
> removed because the 6 can't belong to two replaceable substrings, and
> it already got consumed in the process of removing the dot before 6.
>
> I might be wrong, but I think two passes of regexp_replace would
> do what you want in this example.
>
>             regards, tom lane
>

Ah, that would make sense, and seems to explain:

select regexp_replace('7-9-6-1-3', '(\d)[.-](\d)', '\1\2', 'g');

  regexp_replace
----------------
  79-61-3
(1 row)


select regexp_replace('71-09-56-01-53', '(\d)[.-](\d)', '\1\2', 'g');

  regexp_replace
----------------
  7109560153
(1 row)


I can work two passes in.  Thanks Tom!

-Andy


Re: regexp_replace

От
"David G. Johnston"
Дата:
On Thu, Jan 14, 2016 at 12:43 PM, Andy Colson <andy@squeakycode.net> wrote:
Hi all.

This is not doing as I'd expected:

select regexp_replace('71.09.6.01.3', '(\d)[.-](\d)', '\1\2', 'g');

 regexp_replace
----------------
 71096.013
(1 row)


​Solution: select regexp_replace('71.09.6.01.3', '(\d)[.-](?=\d)', '\1\2', 'g');

Reason: in the original the trailing "(\d)" eats ​the digit following the symbol and then that digit is no longer available for matching the preceding digit in the expression.  IOW the same character cannot be used to match both the first \d and the second \d so once the first \d captures the 6 there is no \d to match before trailing period.

By using the construct (?:\d) you are zero-width (non-capturing) asserting the the next character is a digit but you are not consuming it and so the continuation of the global matching still has that character to match the first \d.

David J.


Re: regexp_replace

От
Andy Colson
Дата:
On 1/14/2016 2:02 PM, John McKown wrote:
> How about:
>
> select regexp_replace('71.09.6.01.3', '(\d)[.-]', '\1', 'g');
>
> match is 1.3 and result is 13 ( 71096.013). If you don't want to
> eliminate the period or dash unless it is _between_ two digits, try:
>
> select regexp_replace('71.09.6.01.3', '(\d)[.-](?=\d)', '\1', 'g');
>
> (?=\d) is a "look ahead") match which says that the period or dash must
> be followed by a digit, but the expression _does not_ "consume" the
> digit matched.
>
>
>
> Maranatha! <><
> John McKown

Yes, excellent, both seem to work.  I'll run a bunch of data through
them both and see what happens.

Thanks much for the help!

-Andy



Re: regexp_replace

От
Andy Colson
Дата:
On 1/14/2016 2:06 PM, David G. Johnston wrote:
> select regexp_replace('71.09.6.01.3', '(\d)[.-](?=\d)', '\1\2', 'g');

Thanks David!

-Andy


Re: regexp_replace

От
"David G. Johnston"
Дата:
On Thu, Jan 14, 2016 at 1:27 PM, Andy Colson <andy@squeakycode.net> wrote:
On 1/14/2016 2:06 PM, David G. Johnston wrote:
select regexp_replace('71.09.6.01.3', '(\d)[.-](?=\d)', '\1\2', 'g');


​John already picked up on the fact that the "\2" in the replacement is pointless (neither helping nor hurting) since nothing was captured at that position.

David J.