Обсуждение: Date-Time dangling unit fix

Поиск
Список
Период
Сортировка

Date-Time dangling unit fix

От
Joseph Koshakow
Дата:
Hi all,

Attached is a patch to fix a parsing error for date-time types that
allow dangling units in the input. For example,
`date '1995-08-06 m y d'` was considered a valid date and the dangling
units were ignored.

Intervals also suffer from a similar issue, but the attached patch
doesn't fix that issue. For example,
`interval '1 day second month 6 hours days years ago'` is parsed as a
valid interval with -1 days and -6 hours. I'm hoping to fix that in a
later patch, but it will likely be more complicated than the other
date-time fixes.

- Joe Koshakow

Вложения

Re: Date-Time dangling unit fix

От
Joseph Koshakow
Дата:
On Sun, Dec 11, 2022 at 10:29 AM Joseph Koshakow <koshy44@gmail.com> wrote:
>
> Hi all,
>
> Attached is a patch to fix a parsing error for date-time types that
> allow dangling units in the input. For example,
> `date '1995-08-06 m y d'` was considered a valid date and the dangling
> units were ignored.
>
> Intervals also suffer from a similar issue, but the attached patch
> doesn't fix that issue. For example,
> `interval '1 day second month 6 hours days years ago'` is parsed as a
> valid interval with -1 days and -6 hours. I'm hoping to fix that in a
> later patch, but it will likely be more complicated than the other
> date-time fixes.
>
> - Joe Koshakow

I think I sent that to the wrong email address.

Вложения

Re: Date-Time dangling unit fix

От
Joseph Koshakow
Дата:
I just found another class of this bug that the submitted patch does
not fix. If the units are at the beginning of the string, then they are
also ignored. For example, `date 'm d y2020m11d3'` is also valid. I
think the fix here is to check and make sure that ptype is 0 before
reassigning the value to a non-zero number. I'll send an updated patch
with this tonight.

Re: Date-Time dangling unit fix

От
Joseph Koshakow
Дата:
On Mon, Dec 12, 2022 at 10:55 AM Joseph Koshakow <koshy44@gmail.com> wrote:
>
> I just found another class of this bug that the submitted patch does
> not fix. If the units are at the beginning of the string, then they are
> also ignored. For example, `date 'm d y2020m11d3'` is also valid. I
> think the fix here is to check and make sure that ptype is 0 before
> reassigning the value to a non-zero number. I'll send an updated patch
> with this tonight.

Attached is the described patch.

- Joe Koshakow

Вложения

Re: Date-Time dangling unit fix

От
Tom Lane
Дата:
Joseph Koshakow <koshy44@gmail.com> writes:
> On Mon, Dec 12, 2022 at 10:55 AM Joseph Koshakow <koshy44@gmail.com> wrote:
>> I just found another class of this bug that the submitted patch does
>> not fix. If the units are at the beginning of the string, then they are
>> also ignored. For example, `date 'm d y2020m11d3'` is also valid. I
>> think the fix here is to check and make sure that ptype is 0 before
>> reassigning the value to a non-zero number. I'll send an updated patch
>> with this tonight.

> Attached is the described patch.

I started to look at this, and soon noticed that while we have test cases
matching this sort of date input, there is no documentation for it.  The
code claims it's an "ISO" (presumably ISO 8601) format, and maybe it is
because it looks a lot like the ISO 8601 format for intervals (durations).
But I don't have a copy of ISO 8601, and some googling fails to find any
indication that anybody else believes this is a valid datetime format.
Wikipedia for example documents a lot of variants of ISO 8601 [1],
but nothing that looks like this.

I wonder if we should just rip this code out instead of fixing it.
I suspect its real-world usage is not different from zero.  We'd
have to keep the "Jnnn" Julian-date case, though, so maybe there's
little to be saved.

If we do keep it, there's documentation work to be done.  But the
first bit of doco I'd want to see is a pointer to a standard.

            regards, tom lane

[1] https://en.wikipedia.org/wiki/ISO_8601



Re: Date-Time dangling unit fix

От
Joseph Koshakow
Дата:
On Sat, Mar 4, 2023 at 4:05 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>    I started to look at this, and soon noticed that while we have test cases
>    matching this sort of date input, there is no documentation for it.  The
>    code claims it's an "ISO" (presumably ISO 8601) format, and maybe it is
>    because it looks a lot like the ISO 8601 format for intervals (durations).
>    But I don't have a copy of ISO 8601, and some googling fails to find any
>    indication that anybody else believes this is a valid datetime format.
>    Wikipedia for example documents a lot of variants of ISO 8601 [1],
>    but nothing that looks like this.
>
>    I wonder if we should just rip this code out instead of fixing it.
>    I suspect its real-world usage is not different from zero.  We'd
>    have to keep the "Jnnn" Julian-date case, though, so maybe there's
>    little to be saved.
>
>    If we do keep it, there's documentation work to be done.  But the
>    first bit of doco I'd want to see is a pointer to a standard.

I also don't have a copy of ISO 8601 and wasn't able to find anything
about this variant on Google. I did find this comment in datetime.c

/*
* Was this an "ISO date" with embedded field labels? An
* example is "y2001m02d04" - thomas 2001-02-04
*/

which comes from this commit [1], which was authored by Thomas Lockhart
(presumably the same thomas from the comment). I've CC'ed Thomas in
case the email still exists and they happen to remember. The commit
message mentions ISO, but not the variant mentioned in the comment.
The mailing list thread can be found here [2], but it doesn't provide
much more information. I also found the following thread [3], which
happens to have you in it in case you remember it, which seemed to be
the motivation for commit [1]. It only contains the following line
about ISO:

> o support for "ISO variants" on input, including embedded "T" preceeding
the time fields

All that seems to imply the "y2001m02d04" ISO variant was never really
discussed in much detail and it's probably fine to remove it. Though,
it has been around for 22 years which makes it a bit scary to remove.

- Joe Koshakow

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=6f58115dddfa8ca63004c4784f57ef660422861d
[2] https://www.postgresql.org/message-id/flat/3BB433D5.3CB4164E%40fourpalms.org
[3] https://www.postgresql.org/message-id/flat/3B970FF8.B9990807%40fourpalms.org#c57d83c80d295bfa19887c92122369c3

Re: Date-Time dangling unit fix

От
Alexander Lakhin
Дата:
Hello,

05.03.2023 02:31, Joseph Koshakow wrote:
I also don't have a copy of ISO 8601 and wasn't able to find anything
about this variant on Google. I did find this comment in datetime.c

/*
* Was this an "ISO date" with embedded field labels? An
* example is "y2001m02d04" - thomas 2001-02-04
*/

which comes from this commit [1], which was authored by Thomas Lockhart
(presumably the same thomas from the comment).

I've also seen another interesting comment in datetime.c:
                /*
                 * Was this an "ISO time" with embedded field labels? An
                 * example is "h04mm05s06" - thomas 2001-02-04
                 */
In fact,
SELECT time 'h04mm05s06';
doesn't work for many years, but
SELECT time 'h04mm05s06.0';
still does.

I've just found that I mentioned it some time ago:
https://www.postgresql.org/message-id/dff75442-2468-f74f-568c-6006e141062f%40gmail.com

Best regards,
Alexander

Re: Date-Time dangling unit fix

От
Joseph Koshakow
Дата:
Attached is a patch for removing the discussed format of date-times.
Вложения

Re: Date-Time dangling unit fix

От
Tom Lane
Дата:
[ I removed Lockhart, because he's taken no part in Postgres work for
  more than twenty years; if that address even still works, you're
  just bugging him ]

Alexander Lakhin <exclusion@gmail.com> writes:
> In fact,
> SELECT time 'h04mm05s06';
> doesn't work for many years, but
> SELECT time 'h04mm05s06.0';
> still does.

I traced that down to this in DecodeTimeOnly:

    if ((fmask & DTK_TIME_M) != DTK_TIME_M)
        return DTERR_BAD_FORMAT;

where we have

#define DTK_ALL_SECS_M    (DTK_M(SECOND) | DTK_M(MILLISECOND) | DTK_M(MICROSECOND))
#define DTK_TIME_M    (DTK_M(HOUR) | DTK_M(MINUTE) | DTK_ALL_SECS_M)

So in other words, this test insists on seeing hour, minute, second,
*and* fractional-second fields.  That seems obviously too picky.
It might not matter if we rip out this syntax, but I see other similar
tests so I suspect some of them will still be reachable.

Personally I'd say that hh:mm is a plenty complete enough time, and
whether you write seconds is optional, let alone fractional seconds.
We do accept this:

=> select '12:34'::time;
   time
----------
 12:34:00
(1 row)

so that must be going through a different code path, which I didn't
try to identify yet.

            regards, tom lane



Re: Date-Time dangling unit fix

От
Joseph Koshakow
Дата:
On Sun, Mar 5, 2023 at 12:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> We do accept this:
>
> => select '12:34'::time;
>    time  
> ----------
>  12:34:00
> (1 row)
>
> so that must be going through a different code path, which I didn't
> try to identify yet.

That query will contain a single field of "12:34" with ftype DTK_TIME.
That will call into DecodeTime(), which calls into DecodeTimeCommon(),
where we have:

*tmask = DTK_TIME_M;

- Joe Koshakow

Re: Date-Time dangling unit fix

От
Joseph Koshakow
Дата:
Also I removed some dead code from the previous patch.

- Joe Koshakow
Вложения

Re: Date-Time dangling unit fix

От
Tom Lane
Дата:
Joseph Koshakow <koshy44@gmail.com> writes:
> Also I removed some dead code from the previous patch.

This needs a rebase over bcc704b52, so I did that and made a
couple of other adjustments.

I'm inclined to think that you removed too much from DecodeTimeOnly.
That does accept date specs (at least for timetz), and I see no very
good reason for it not to accept a Julian date spec.  I also wonder
why you removed the UNITS: case there.  Seems like we want these
functions to accept the same syntax as much as possible.

I think the code is still a bit schizophrenic about placement of
ptype specs.  In the UNITS: case, we don't insist that a unit
apply to exactly the very next field; instead it applies to the next
one where it disambiguates.  So for instead this is accepted:

regression=# select 'J PM 1234567 1:23'::timestamp;
       timestamp        
------------------------
 1333-01-11 13:23:00 BC

That's a little weird, or maybe even a lot weird, but it's not
inherently nonsensical so I'm hesitant to stop accepting it.
However, if UNITS acts that way, then why is ISOTIME different?
So I'm inclined to remove ISOTIME's lookahead check

                        if (i >= nf - 1 ||
                            (ftype[i + 1] != DTK_NUMBER &&
                             ftype[i + 1] != DTK_TIME &&
                             ftype[i + 1] != DTK_DATE))
                            return DTERR_BAD_FORMAT;

and rely on the ptype-still-set error at the bottom of the loop
to complain about nonsensical cases.

Also, if we do keep the lookahead checks, the one in DecodeTimeOnly
could be simplified --- it's accepting some cases that actually
aren't supported there.

            regards, tom lane

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index a7558d39a0..13856b06d1 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -983,7 +983,7 @@ DecodeDateTime(char **field, int *ftype, int nf,
     int            fmask = 0,
                 tmask,
                 type;
-    int            ptype = 0;        /* "prefix type" for ISO y2001m02d04 format */
+    int            ptype = 0;        /* "prefix type" for ISO and Julian formats */
     int            i;
     int            val;
     int            dterr;
@@ -1071,6 +1071,9 @@ DecodeDateTime(char **field, int *ftype, int nf,
                     {
                         char       *cp;

+                        /*
+                         * Allow a preceding "t" field, but no other units.
+                         */
                         if (ptype != 0)
                         {
                             /* Sanity check; should not fail this test */
@@ -1175,8 +1178,7 @@ DecodeDateTime(char **field, int *ftype, int nf,
             case DTK_NUMBER:

                 /*
-                 * Was this an "ISO date" with embedded field labels? An
-                 * example is "y2001m02d04" - thomas 2001-02-04
+                 * Deal with cases where previous field labeled this one
                  */
                 if (ptype != 0)
                 {
@@ -1187,85 +1189,11 @@ DecodeDateTime(char **field, int *ftype, int nf,
                     value = strtoint(field[i], &cp, 10);
                     if (errno == ERANGE)
                         return DTERR_FIELD_OVERFLOW;
-
-                    /*
-                     * only a few kinds are allowed to have an embedded
-                     * decimal
-                     */
-                    if (*cp == '.')
-                        switch (ptype)
-                        {
-                            case DTK_JULIAN:
-                            case DTK_TIME:
-                            case DTK_SECOND:
-                                break;
-                            default:
-                                return DTERR_BAD_FORMAT;
-                                break;
-                        }
-                    else if (*cp != '\0')
+                    if (*cp != '.' && *cp != '\0')
                         return DTERR_BAD_FORMAT;

                     switch (ptype)
                     {
-                        case DTK_YEAR:
-                            tm->tm_year = value;
-                            tmask = DTK_M(YEAR);
-                            break;
-
-                        case DTK_MONTH:
-
-                            /*
-                             * already have a month and hour? then assume
-                             * minutes
-                             */
-                            if ((fmask & DTK_M(MONTH)) != 0 &&
-                                (fmask & DTK_M(HOUR)) != 0)
-                            {
-                                tm->tm_min = value;
-                                tmask = DTK_M(MINUTE);
-                            }
-                            else
-                            {
-                                tm->tm_mon = value;
-                                tmask = DTK_M(MONTH);
-                            }
-                            break;
-
-                        case DTK_DAY:
-                            tm->tm_mday = value;
-                            tmask = DTK_M(DAY);
-                            break;
-
-                        case DTK_HOUR:
-                            tm->tm_hour = value;
-                            tmask = DTK_M(HOUR);
-                            break;
-
-                        case DTK_MINUTE:
-                            tm->tm_min = value;
-                            tmask = DTK_M(MINUTE);
-                            break;
-
-                        case DTK_SECOND:
-                            tm->tm_sec = value;
-                            tmask = DTK_M(SECOND);
-                            if (*cp == '.')
-                            {
-                                dterr = ParseFractionalSecond(cp, fsec);
-                                if (dterr)
-                                    return dterr;
-                                tmask = DTK_ALL_SECS_M;
-                            }
-                            break;
-
-                        case DTK_TZ:
-                            tmask = DTK_M(TZ);
-                            dterr = DecodeTimezone(field[i], tzp);
-                            if (dterr)
-                                return dterr;
-                            break;
-
                         case DTK_JULIAN:
                             /* previous field was a label for "julian date" */
                             if (value < 0)
@@ -1519,6 +1447,9 @@ DecodeDateTime(char **field, int *ftype, int nf,

                     case UNITS:
                         tmask = 0;
+                        /* reject consecutive unhandled units */
+                        if (ptype != 0)
+                            return DTERR_BAD_FORMAT;
                         ptype = val;
                         break;

@@ -1576,6 +1507,10 @@ DecodeDateTime(char **field, int *ftype, int nf,
         fmask |= tmask;
     }                            /* end loop over fields */

+    /* reject if prefix type appeared and was never handled */
+    if (ptype != 0)
+        return DTERR_BAD_FORMAT;
+
     /* do additional checking for normal date specs (but not "infinity" etc) */
     if (*dtype == DTK_DATE)
     {
@@ -1943,7 +1878,7 @@ DecodeTimeOnly(char **field, int *ftype, int nf,
     int            fmask = 0,
                 tmask,
                 type;
-    int            ptype = 0;        /* "prefix type" for ISO h04mm05s06 format */
+    int            ptype = 0;        /* "prefix type" for ISO format */
     int            i;
     int            val;
     int            dterr;
@@ -2070,133 +2005,12 @@ DecodeTimeOnly(char **field, int *ftype, int nf,
             case DTK_NUMBER:

                 /*
-                 * Was this an "ISO time" with embedded field labels? An
-                 * example is "h04mm05s06" - thomas 2001-02-04
+                 * Was this an "ISO time", for example "T040506.789"?
                  */
                 if (ptype != 0)
                 {
-                    char       *cp;
-                    int            value;
-
-                    /* Only accept a date under limited circumstances */
                     switch (ptype)
                     {
-                        case DTK_JULIAN:
-                        case DTK_YEAR:
-                        case DTK_MONTH:
-                        case DTK_DAY:
-                            if (tzp == NULL)
-                                return DTERR_BAD_FORMAT;
-                        default:
-                            break;
-                    }
-
-                    errno = 0;
-                    value = strtoint(field[i], &cp, 10);
-                    if (errno == ERANGE)
-                        return DTERR_FIELD_OVERFLOW;
-
-                    /*
-                     * only a few kinds are allowed to have an embedded
-                     * decimal
-                     */
-                    if (*cp == '.')
-                        switch (ptype)
-                        {
-                            case DTK_JULIAN:
-                            case DTK_TIME:
-                            case DTK_SECOND:
-                                break;
-                            default:
-                                return DTERR_BAD_FORMAT;
-                                break;
-                        }
-                    else if (*cp != '\0')
-                        return DTERR_BAD_FORMAT;
-
-                    switch (ptype)
-                    {
-                        case DTK_YEAR:
-                            tm->tm_year = value;
-                            tmask = DTK_M(YEAR);
-                            break;
-
-                        case DTK_MONTH:
-
-                            /*
-                             * already have a month and hour? then assume
-                             * minutes
-                             */
-                            if ((fmask & DTK_M(MONTH)) != 0 &&
-                                (fmask & DTK_M(HOUR)) != 0)
-                            {
-                                tm->tm_min = value;
-                                tmask = DTK_M(MINUTE);
-                            }
-                            else
-                            {
-                                tm->tm_mon = value;
-                                tmask = DTK_M(MONTH);
-                            }
-                            break;
-
-                        case DTK_DAY:
-                            tm->tm_mday = value;
-                            tmask = DTK_M(DAY);
-                            break;
-
-                        case DTK_HOUR:
-                            tm->tm_hour = value;
-                            tmask = DTK_M(HOUR);
-                            break;
-
-                        case DTK_MINUTE:
-                            tm->tm_min = value;
-                            tmask = DTK_M(MINUTE);
-                            break;
-
-                        case DTK_SECOND:
-                            tm->tm_sec = value;
-                            tmask = DTK_M(SECOND);
-                            if (*cp == '.')
-                            {
-                                dterr = ParseFractionalSecond(cp, fsec);
-                                if (dterr)
-                                    return dterr;
-                                tmask = DTK_ALL_SECS_M;
-                            }
-                            break;
-
-                        case DTK_TZ:
-                            tmask = DTK_M(TZ);
-                            dterr = DecodeTimezone(field[i], tzp);
-                            if (dterr)
-                                return dterr;
-                            break;
-
-                        case DTK_JULIAN:
-                            /* previous field was a label for "julian date" */
-                            if (value < 0)
-                                return DTERR_FIELD_OVERFLOW;
-                            tmask = DTK_DATE_M;
-                            j2date(value, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
-                            isjulian = true;
-
-                            if (*cp == '.')
-                            {
-                                double        time;
-
-                                dterr = ParseFraction(cp, &time);
-                                if (dterr)
-                                    return dterr;
-                                time *= USECS_PER_DAY;
-                                dt2time(time,
-                                        &tm->tm_hour, &tm->tm_min,
-                                        &tm->tm_sec, fsec);
-                                tmask |= DTK_TIME_M;
-                            }
-                            break;
-
                         case DTK_TIME:
                             /* previous field was "t" for ISO time */
                             dterr = DecodeNumberField(strlen(field[i]), field[i],
@@ -2376,11 +2190,6 @@ DecodeTimeOnly(char **field, int *ftype, int nf,
                         bc = (val == BC);
                         break;

-                    case UNITS:
-                        tmask = 0;
-                        ptype = val;
-                        break;
-
                     case ISOTIME:
                         tmask = 0;

@@ -2426,6 +2235,10 @@ DecodeTimeOnly(char **field, int *ftype, int nf,
         fmask |= tmask;
     }                            /* end loop over fields */

+    /* reject if prefix type appeared and was never handled */
+    if (ptype != 0)
+        return DTERR_BAD_FORMAT;
+
     /* do final checking/adjustment of Y/M/D fields */
     dterr = ValidateDate(fmask, isjulian, is2digits, bc, tm);
     if (dterr)
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index 4f01131077..d87d0daad8 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -97,30 +97,6 @@ SELECT timestamp with time zone '27/12/2001 04:05:06.789-08';
 (1 row)

 reset datestyle;
-SELECT timestamp with time zone 'Y2001M12D27H04M05S06.789+08';
-           timestamptz
-----------------------------------
- Wed Dec 26 12:05:06.789 2001 PST
-(1 row)
-
-SELECT timestamp with time zone 'Y2001M12D27H04M05S06.789-08';
-           timestamptz
-----------------------------------
- Thu Dec 27 04:05:06.789 2001 PST
-(1 row)
-
-SELECT timestamp with time zone 'Y2001M12D27H04MM05S06.789+08';
-           timestamptz
-----------------------------------
- Wed Dec 26 12:05:06.789 2001 PST
-(1 row)
-
-SELECT timestamp with time zone 'Y2001M12D27H04MM05S06.789-08';
-           timestamptz
-----------------------------------
- Thu Dec 27 04:05:06.789 2001 PST
-(1 row)
-
 SELECT timestamp with time zone 'J2452271+08';
          timestamptz
 ------------------------------
@@ -283,6 +259,25 @@ SELECT date 'J0' AS "Julian Epoch";
  11-24-4714 BC
 (1 row)

+-- test error on dangling Julian units
+SELECT date '1995-08-06  J J J';
+ERROR:  invalid input syntax for type date: "1995-08-06  J J J"
+LINE 1: SELECT date '1995-08-06  J J J';
+                    ^
+SELECT date 'J J 1520447';
+ERROR:  invalid input syntax for type date: "J J 1520447"
+LINE 1: SELECT date 'J J 1520447';
+                    ^
+-- We used to accept this input style, but it was based on a misreading
+-- of ISO8601, and it was never documented anyway
+SELECT timestamp with time zone 'Y2001M12D27H04M05S06.789+08';
+ERROR:  invalid input syntax for type timestamp with time zone: "Y2001M12D27H04M05S06.789+08"
+LINE 1: SELECT timestamp with time zone 'Y2001M12D27H04M05S06.789+08...
+                                        ^
+SELECT timestamp with time zone 'Y2001M12D27H04MM05S06.789-08';
+ERROR:  invalid input syntax for type timestamp with time zone: "Y2001M12D27H04MM05S06.789-08"
+LINE 1: SELECT timestamp with time zone 'Y2001M12D27H04MM05S06.789-0...
+                                        ^
 -- conflicting fields should throw errors
 SELECT date '1995-08-06 epoch';
 ERROR:  invalid input syntax for type date: "1995-08-06 epoch"
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index 0676cac5d1..474f92dcd9 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -25,10 +25,6 @@ SELECT timestamp with time zone '27/12/2001 04:05:06.789-08';
 set datestyle to dmy;
 SELECT timestamp with time zone '27/12/2001 04:05:06.789-08';
 reset datestyle;
-SELECT timestamp with time zone 'Y2001M12D27H04M05S06.789+08';
-SELECT timestamp with time zone 'Y2001M12D27H04M05S06.789-08';
-SELECT timestamp with time zone 'Y2001M12D27H04MM05S06.789+08';
-SELECT timestamp with time zone 'Y2001M12D27H04MM05S06.789-08';
 SELECT timestamp with time zone 'J2452271+08';
 SELECT timestamp with time zone 'J2452271-08';
 SELECT timestamp with time zone 'J2452271.5+08';
@@ -62,6 +58,15 @@ SET DateStyle = 'Postgres, MDY';
 SELECT date 'J1520447' AS "Confucius' Birthday";
 SELECT date 'J0' AS "Julian Epoch";

+-- test error on dangling Julian units
+SELECT date '1995-08-06  J J J';
+SELECT date 'J J 1520447';
+
+-- We used to accept this input style, but it was based on a misreading
+-- of ISO8601, and it was never documented anyway
+SELECT timestamp with time zone 'Y2001M12D27H04M05S06.789+08';
+SELECT timestamp with time zone 'Y2001M12D27H04MM05S06.789-08';
+
 -- conflicting fields should throw errors
 SELECT date '1995-08-06 epoch';
 SELECT date '1995-08-06 infinity';

Re: Date-Time dangling unit fix

От
Alexander Lakhin
Дата:
10.03.2023 03:26, Tom Lane wrote:
> Joseph Koshakow<koshy44@gmail.com>  writes:
>> Also I removed some dead code from the previous patch.
> That's a little weird, or maybe even a lot weird, but it's not
> inherently nonsensical so I'm hesitant to stop accepting it.
> However, if UNITS acts that way, then why is ISOTIME different?
> So I'm inclined to remove ISOTIME's lookahead check
>
>                          if (i >= nf - 1 ||
>                              (ftype[i + 1] != DTK_NUMBER &&
>                               ftype[i + 1] != DTK_TIME &&
>                               ftype[i + 1] != DTK_DATE))
>                              return DTERR_BAD_FORMAT;
>
> and rely on the ptype-still-set error at the bottom of the loop
> to complain about nonsensical cases.

I also wonder how the units affect time zone parsing.
With the patch:
SELECT time with time zone '010203m+3';
ERROR:  invalid input syntax for type time with time zone: "010203m+3"
But without the patch:
SELECT time with time zone '010203m+3';
  01:02:03+03

Though with "non-unit" spec:
SELECT time with time zone '010203mmm+3';
  01:02:03-03
(With or without the patch.)
It seems like "units" were just ignored in a time zone specification,
but now they are rejected.

At the same time, I see that the time zone specification allows for any
letters with the +/- sign following:
SELECT time with time zone '010203anyletters+3';
  01:02:03-03

It's definitely a separate issue, I just want to note a new erroneous
condition.

Best regards,
Alexander



Re: Date-Time dangling unit fix

От
Peter Eisentraut
Дата:
On 04.03.23 22:05, Tom Lane wrote:
> Joseph Koshakow<koshy44@gmail.com>  writes:
>> On Mon, Dec 12, 2022 at 10:55 AM Joseph Koshakow<koshy44@gmail.com>  wrote:
>>> I just found another class of this bug that the submitted patch does
>>> not fix. If the units are at the beginning of the string, then they are
>>> also ignored. For example, `date 'm d y2020m11d3'` is also valid. I
>>> think the fix here is to check and make sure that ptype is 0 before
>>> reassigning the value to a non-zero number. I'll send an updated patch
>>> with this tonight.
>> Attached is the described patch.
> I started to look at this, and soon noticed that while we have test cases
> matching this sort of date input, there is no documentation for it.  The
> code claims it's an "ISO" (presumably ISO 8601) format, and maybe it is
> because it looks a lot like the ISO 8601 format for intervals (durations).
> But I don't have a copy of ISO 8601, and some googling fails to find any
> indication that anybody else believes this is a valid datetime format.
> Wikipedia for example documents a lot of variants of ISO 8601 [1],
> but nothing that looks like this.

There are additional formats in (the lesser known) ISO 8601-2, one of 
which looks like this:

     '1985Y4M12D', calendar year 1985, April 12th

But that is entirely incompatible with the above example, because it has 
the units after the numbers.

Even more reason not to support the earlier example.




Re: Date-Time dangling unit fix

От
Tom Lane
Дата:
Alexander Lakhin <exclusion@gmail.com> writes:
> I also wonder how the units affect time zone parsing.
> With the patch:
> SELECT time with time zone '010203m+3';
> ERROR:  invalid input syntax for type time with time zone: "010203m+3"
> But without the patch:
> SELECT time with time zone '010203m+3';
>   01:02:03+03

Yeah, I think this is the ptype-still-set-at-end-of-loop check.
I'm fine with throwing an error for that.

> Though with "non-unit" spec:
> SELECT time with time zone '010203mmm+3';
>   01:02:03-03
> (With or without the patch.)
> It seems like "units" were just ignored in a time zone specification,
> but now they are rejected.

I think it's reading "mmm+3" as a POSIX timezone spec.  From memory,
POSIX allows any sequence of 3 or more letters as a zone abbreviation.
It looks like we're being lax and not enforcing the "3 or more" part:

regression=# set time zone 'foobar+3';
SET
regression=# select timeofday();
               timeofday                
----------------------------------------
 Fri Mar 10 12:08:24.484853 2023 FOOBAR
(1 row)

regression=# set time zone 'fo+3';
SET
regression=# select timeofday();
             timeofday              
------------------------------------
 Fri Mar 10 12:08:38.207311 2023 FO
(1 row)

            regards, tom lane



Re: Date-Time dangling unit fix

От
Tom Lane
Дата:
Hearing no further comments on this, I adjusted DecodeTimeOnly to
look more like DecodeDateTime as I recommended upthread, and pushed.

            regards, tom lane