Re: Can we make regexp processing more friendly by recognizing "\r\n" as a "newline" for "^$" purposes?

From: Francisco Olarte
Subject: Re: Can we make regexp processing more friendly by recognizing "\r\n" as a "newline" for "^$" purposes?
Date:
Msg-id: CA+bJJbzNHEqufUh=SUGJ_zSXU5TEAgdTgHqpzv_UZ9SVgg6KUg@mail.gmail.com
In reply to: Can we make regexp processing more friendly by recognizing "\r\n" as a "newline" for "^$" purposes? ("David G. Johnston" <david.g.johnston@gmail.com>)
Responses: Re: Can we make regexp processing more friendly by recognizing "\r\n" as a "newline" for "^$" purposes? ("David G. Johnston" <david.g.johnston@gmail.com>)
List: pgsql-general
Hi David:

On Sun, Oct 18, 2015 at 7:49 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> Other implementations of regular expressions handle "newline" mechanics
> related to "^" and "$" semantically instead of literally.  By that I mean
> that both "\r\n" and "\n" are considered "newlines" instead of just "\n".

Which ones? AFAIK this kind of thing is usually done by C (and
related) runtimes when reading text files.

At least on my machine, Perl does not do it:

censored:~$ perl -e 'print( ("A\r\n" =~ /A$/) ? "matched\n" : "NO MATCH\n");'
NO MATCH
censored:~$ perl -e 'print( ("A\r\n" =~ /A.$/) ? "matched\n" : "NO MATCH\n");'
matched
censored:~$ perl -e 'print( ("A\r\n" =~ /A\s$/) ? "matched\n" : "NO MATCH\n");'
matched
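For comparison, Python's re module (my choice for illustration; it is not mentioned in the thread) gives the same three answers as the Perl one-liners. A minimal sketch, assuming no flags are passed: like Perl's $, Python's $ matches only at the end of the string or just before one trailing "\n", so the "\r" is an ordinary character that something in the pattern must consume.

```python
import re

def matches(pattern: str, subject: str) -> bool:
    """Return True if pattern matches anywhere in subject (no regex flags)."""
    return re.search(pattern, subject) is not None

subject = "A\r\n"
# The same three patterns as the Perl one-liners above.
results = {pat: matches(pat, subject) for pat in (r"A$", r"A.$", r"A\s$")}

for pat, ok in results.items():
    # $ may match before the final "\n", but never before "\r".
    print(pat, "matched" if ok else "NO MATCH")
```

Only the patterns that explicitly consume the "\r" (with . or \s) succeed.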

Normally, when reading lines on CP/M and related systems (MS-DOS, Windows),
the CRT does collapse them (and sometimes just zaps \r, or collapses any
run, or considers [\r*]\n[\r*], or...). But I do not normally see that
behaviour in regexes.
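The translation the runtime performs at read time has a close analogue in Python's "universal newlines" text mode. This sketch illustrates the idea in another runtime, not the C runtime itself; the sample bytes are made up for the example.

```python
import io

# Raw bytes as a DOS/Windows tool might write them (hypothetical sample).
raw = b"first\r\nsecond\r\nthird\n"

# newline=None ("universal newlines"): "\r\n", "\r" and "\n" all arrive as "\n",
# so the program (and any regex it runs) never sees the "\r".
text = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=None).read()
print(repr(text))  # 'first\nsecond\nthird\n'

# newline="": line endings are returned untranslated.
untranslated = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline="").read()
print(repr(untranslated))  # 'first\r\nsecond\r\nthird\n'
```

The regex engine stays literal; the collapsing happens in the I/O layer.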

> If changing behavior is not desirable I would be content with another flag
> that would toggle such behavior.
> In code - both of these subqueries should match whereas presently only the
> first one does.
> SELECT regexp_matches(E'123\n',   E'123$', 'w');
> SELECT regexp_matches(E'123\r\n', E'123$', 'w');
> I don't know if this is server O/S dependent...but I would not expect it to
> be so.
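The behaviour described in the SQL example can be reproduced outside PostgreSQL. In the sketch below, Python's re.MULTILINE stands in, roughly, for the ^/$ part of the 'w' flag: $ may match before any "\n", but "\r" is still literal. The "CR-LF aware" end-of-line lookahead is a hypothetical user-level workaround that illustrates the requested semantics; it is not an existing flag in either engine.

```python
import re

# Subjects from the SQL example above.
unix_line = "123\n"
dos_line = "123\r\n"

# MULTILINE: $ matches before any "\n" (or at the end of the string).
plain = re.compile(r"123$", re.MULTILINE)
print(bool(plain.search(unix_line)), bool(plain.search(dos_line)))  # True False

# Hypothetical emulation of the requested behaviour: write "end of line" as a
# lookahead that also accepts "\r\n" (or the end of the string).
EOL = r"(?=\r?\n|\Z)"
crlf_aware = re.compile(r"123" + EOL)
print(bool(crlf_aware.search(unix_line)), bool(crlf_aware.search(dos_line)))  # True True
```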

Neither do I (expect it to be OS dependent), but I find the current
behaviour correct. Newline handling is OS dependent, and you should
convert when ingesting data; by the time you match, the data should
already have been converted to whatever the language uses for newlines.
(In C and Perl that means \n, which need not be \012, BTW. On Unix
\n is \012 on disk, on CP/M it is \015\012, and when I worked with Macs
(before the Unix-like OS X they use now) it was \015. I cannot think
what they use on EBCDIC machines.)
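A minimal sketch of "convert when ingesting", assuming only CP/M-style CR LF and classic-Mac CR line endings need handling (EBCDIC newlines are out of scope). The helper name normalize_newlines is hypothetical. Once the data is normalized, ordinary newline-sensitive anchors work on every line.

```python
import re

def normalize_newlines(text: str) -> str:
    """Convert CR LF (CP/M, DOS) and lone CR (classic Mac) line ends to LF."""
    # "\r\n?" is greedy: a CR followed by LF is replaced as one unit,
    # and a CR on its own is replaced by itself.
    return re.sub(r"\r\n?", "\n", text)

raw = "123\r\n456\r789\n"
clean = normalize_newlines(raw)
print(repr(clean))  # '123\n456\n789\n'

# After conversion, $ only ever needs to look for "\n".
lines = re.findall(r"^\d+$", clean, re.MULTILINE)
print(lines)  # ['123', '456', '789']
```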

Francisco Olarte.

