Обсуждение: Regex help again (sorry, I am bad at these)

Поиск
Список
Период
Сортировка

Regex help again (sorry, I am bad at these)

От
Christopher Molnar
Дата:
Hello all!

Sorry to have to ask the experts here for some regex assistance again. I am admittadly awful with these and could use some help.

Have the following string (this is an example) that needs to be changed. Need to use a search and replace because the links across over 200K records are similar but not the same.

'<p>Complete the attached lab and submit via dropbox</p>\r<p><a href="https://owncloud.porterchester.edu/HVACR/PCI_GasHeat/GasElectrical/HVACR1114_LAB_13A.pdf" title="Lab 13A">Lab 13A<\a>'


Need the final string to separate the "LAB_13A.pdf" from the rest of the URL by inserting a "&file=" in front of it. The final string should look like:


'<p>Complete the attached lab and submit via dropbox</p>\r<p><a href="https://owncloud.porterchester.edu/HVACR/PCI_GasHeat/GasElectrical/&file=HVACR1114_LAB_13A.pdf" title="Lab 13A">Lab 13A<\a>'


I have tried something like:

 update pcilms_assign set intro=regexp_replace(intro, '/([^/]*)\" title=$', '&files=\1') where intro like '%https://owncloud.porterchester.edu%' and course=18 and id=55413;

and the result puts the &file= in the wrong place (at the end of the whole string).


Any suggestions?


Thanks!

-Chris


Re: Regex help again (sorry, I am bad at these)

От
Melvin Davidson
Дата:
Chris,

First, even though it may or may not apply in this instance, it is ALWAYS a good idea (and good manners) to provide the PostgreSQL version and your O/S when posting to this board.

I have also previously replied that the following should work:

UPDATE pcilms_assign
   SET intro = REPLACE (intro, 'HVACR1114_LAB_13A.pdf', '&file=HVACR1114_LAB_13A.pdf')
 WHERE intro like 'https://owncloud.porterchester.edu%'
        AND course=18 and id=55413;


NOTE: DO NOT use % in front of 'https:  It is not needed and will slow query performance.

Your mileage may vary because you have not stated your VERSION of PostgreSQL or your O/S.


On Mon, Dec 28, 2015 at 2:08 PM, Christopher Molnar <cmolnar65@gmail.com> wrote:
Hello all!

Sorry to have to ask the experts here for some regex assistance again. I am admittadly awful with these and could use some help.

Have the following string (this is an example) that needs to be changed. Need to use a search and replace because the links across over 200K records are similar but not the same.

'<p>Complete the attached lab and submit via dropbox</p>\r<p><a href="https://owncloud.porterchester.edu/HVACR/PCI_GasHeat/GasElectrical/HVACR1114_LAB_13A.pdf" title="Lab 13A">Lab 13A<\a>'


Need the final string to separate the "LAB_13A.pdf" from the rest of the URL by inserting a "&file=" in front of it. The final string should look like:


'<p>Complete the attached lab and submit via dropbox</p>\r<p><a href="https://owncloud.porterchester.edu/HVACR/PCI_GasHeat/GasElectrical/&file=HVACR1114_LAB_13A.pdf" title="Lab 13A">Lab 13A<\a>'


I have tried something like:

 update pcilms_assign set intro=regexp_replace(intro, '/([^/]*)\" title=$', '&files=\1') where intro like '%https://owncloud.porterchester.edu%' and course=18 and id=55413;

and the result puts the &file= in the wrong place (at the end of the whole string).


Any suggestions?


Thanks!

-Chris





--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: Regex help again (sorry, I am bad at these)

От
Christoph Moench-Tegeder
Дата:
## Christopher Molnar (cmolnar65@gmail.com):

> I have tried something like:
>
>  update pcilms_assign set intro=regexp_replace(intro, '/([^/]*)\" title=$',
> '&files=\1') where intro like '%https://owncloud.porterchester.edu%' and
> course=18 and id=55413;

http://blog.codinghorror.com/parsing-html-the-cthulhu-way/
But you were close:
  regexp_replace(intro, '/([^/]*)" title=', '/&file=\1" title=')
looks like it's doing the right thing on your example.

Regards,
Christoph

--
Spare Space


Re: Regex help again (sorry, I am bad at these)

От
Christoph Moench-Tegeder
Дата:
## Melvin Davidson (melvin6925@gmail.com):

> UPDATE pcilms_assign
>    SET intro = REPLACE (intro, 'HVACR1114_LAB_13A.pdf',
> '&file=HVACR1114_LAB_13A.pdf')
>  WHERE intro like 'https://owncloud.porterchester.edu%'
>         AND course=18 and id=55413;

Unfortunately, that tries to do the right thing for that example
only but does not account for the 200k documents not named
HVACR1114_LAB_13A.pdf.

> NOTE: DO NOT use % in front of 'https: <https://owncloud.porterchester.edu>
> It is not needed and will slow query performance.

And that will plainly not work if the string does not start with the
URL - as in the example given.
It even won't work regardless of version and OS.

Regards,
Christoph

--
Spare Space


Re: Regex help again (sorry, I am bad at these)

От
Michael Nolan
Дата:


On Mon, Dec 28, 2015 at 2:08 PM, Christopher Molnar <cmolnar65@gmail.com> wrote:
Hello all!

Sorry to have to ask the experts here for some regex assistance again. I am admittadly awful with these and could use some help.

Any suggestions?


I have found over the years that it is far easier to write a short PHP or PERL program to do tasks like this.  Much easier to debug and the speed improvement by using SQL is not important for 200,000 records.
--
Mike Nolan 

Re: Regex help again (sorry, I am bad at these)

От
"David G. Johnston"
Дата:
On Tue, Dec 29, 2015 at 2:26 PM, Michael Nolan <htfoot@gmail.com> wrote:
On Mon, Dec 28, 2015 at 2:08 PM, Christopher Molnar <cmolnar65@gmail.com> wrote:
Hello all!

Sorry to have to ask the experts here for some regex assistance again. I am admittadly awful with these and could use some help.

Any suggestions?


I have found over the years that it is far easier to write a short PHP or PERL program to do tasks like this.  Much easier to debug and the speed improvement by using SQL is not important for 200,000 records.


​If all you end up doing is using regular expressions in Perl then I'm not sure how that solves the "inexperienced at regular expressions" problem...so what kind (if any) of non-regex based solution would​
 
​you implement to accomplish this goal.  I'll admit that, being familiar with regular expressions, I probably tend to resort to them by default now when other solutions - if I stopped to think of them - would be less matrix-y.

Obviously a simple find-replace is unlike to work well though some form of "split-and-rearrange" could work - but URLs seem to want the flexibility since defining split points in one seems challenging.

David J.