Обсуждение: Strange behavior in generate_series(date, date, interval) with DST

Поиск
Список
Период
Сортировка

Strange behavior in generate_series(date, date, interval) with DST

От
Sérgio Saquetim
Дата:
I've noticed a strange behavior in the generate_series functions.

I'm trying to get all days between a start and an end date including the bounds. So naturally I've tried something like the query below
​. 

The real query uses generate_series to join other tables and is much more complicated, but for the sake of brevity, I think that this query is good enough to show the problem.
:

postgres=# SELECT generate_series('2014-10-10'::DATE,
​​
'2014-10-15'::DATE, '1 DAY'::INTERVAL);
    generate_series     
------------------------
 2014-10-10 00:00:00-03
 2014-10-11 00:00:00-03
 2014-10-12 00:00:00-03
 2014-10-13 00:00:00-03
 2014-10-14 00:00:00-03
 2014-10-15 00:00:00-03
​​
(6 rows)

​Please note that the upper bound ​
'2014-10-15' is included in the resulting rows.

Now if I try this same query with slightly different dates I get:

​postgres=# SELECT generate_series('2014-10-15'::DATE, '2014-10-20'::DATE, '1 DAY'::INTERVAL);
    generate_series     
------------------------
 2014-10-15 00:00:00-03
 2014-10-16 00:00:00-03
 2014-10-17 00:00:00-03
 2014-10-18 00:00:00-03
 2014-10-19 01:00:00-02
(5 rows)

-----------------------------------------------
The upper bound is not included in the results!
-----------------------------------------------

Here, in Brazil our DST started on Oct 19. So if I had to guess I would say that this strange behavior is due to the DST, but I'm having a hard time to understand why this is happening!

Is this expected behavior?

I know that I can achieve the results I expect with the following query:

postgres=# WITH RECURSIVE days(d) AS (
    SELECT '2014-10-15'::DATE
  UNION ALL
    SELECT d+1 FROM days WHERE d < '2014-10-20'::DATE
)
SELECT * FROM days;
     d      
------------
 2014-10-15
 2014-10-16
 2014-10-17
 2014-10-18
 2014-10-19
 2014-10-20
(6 rows)

But using that instead of generate_series, just feels wrong so I would like to understand what's happening and if there is a way to overcome that, before changing my queries.

Thank you!

Sérgio Saquetim

Re: Strange behavior in generate_series(date, date, interval) with DST

От
Adrian Klaver
Дата:
On 12/07/2014 12:11 PM, Sérgio Saquetim wrote:
> I've noticed a strange behavior in the generate_series functions.
>
> I'm trying to get all days between a start and an end date including the
> bounds. So naturally I've tried something like the query below
> ​.
>
> The real query uses generate_series to join other tables and is much
> more complicated, but for the sake of brevity, I think that this query
> is good enough to show the problem.
> :
>
> postgres=# SELECT generate_series('2014-10-10'::DATE,
> ​​
> '2014-10-15'::DATE, '1 DAY'::INTERVAL);
>      generate_series
> ------------------------
>   2014-10-10 00:00:00-03
>   2014-10-11 00:00:00-03
>   2014-10-12 00:00:00-03
>   2014-10-13 00:00:00-03
>   2014-10-14 00:00:00-03
>   2014-10-15 00:00:00-03
> ​​
> (6 rows)
>
> ​Please note that the upper bound ​
> ​
> '2014-10-15' is included in the resulting rows.
>
> Now if I try this same query with slightly different dates I get:
>
> ​postgres=# SELECT generate_series('2014-10-15'::DATE,
> '2014-10-20'::DATE, '1 DAY'::INTERVAL);
>      generate_series
> ------------------------
>   2014-10-15 00:00:00-03
>   2014-10-16 00:00:00-03
>   2014-10-17 00:00:00-03
>   2014-10-18 00:00:00-03
>   2014-10-19 01:00:00-02
> (5 rows)
>
> -----------------------------------------------
> The upper bound is not included in the results!
> -----------------------------------------------
>
> Here, in Brazil our DST started on Oct 19. So if I had to guess I would
> say that this strange behavior is due to the DST, but I'm having a hard
> time to understand why this is happening!
>
> Is this expected behavior?
>
> I know that I can achieve the results I expect with the following query:
>
> postgres=# WITH RECURSIVE days(d) AS (
>      SELECT '2014-10-15'::DATE
>    UNION ALL
>      SELECT d+1 FROM days WHERE d < '2014-10-20'::DATE
> )
> SELECT * FROM days;
>       d
> ------------
>   2014-10-15
>   2014-10-16
>   2014-10-17
>   2014-10-18
>   2014-10-19
>   2014-10-20
> (6 rows)
>
> But using that instead of generate_series, just feels wrong so I would
> like to understand what's happening and if there is a way to overcome
> that, before changing my queries.

First generate_series is really looking for a timestamp on input and
returns timestamps:

http://www.postgresql.org/docs/9.3/interactive/functions-srf.html

generate_series(start, stop, step interval)     timestamp or timestamp with
time zone     setof timestamp or setof timestamp with time zone (same as
argument type)     Generate a series of values, from start to stop with a
step size of step

So:

test=# set timezone='Brazil/East';
SET

test=# SELECT generate_series('2014-10-10'::TIMESTAMP,
'2014-10-20'::TIMESTAMP, '1 DAY'::INTERVAL);
    generate_series
---------------------
  2014-10-10 00:00:00
  2014-10-11 00:00:00
  2014-10-12 00:00:00
  2014-10-13 00:00:00
  2014-10-14 00:00:00
  2014-10-15 00:00:00
  2014-10-16 00:00:00
  2014-10-17 00:00:00
  2014-10-18 00:00:00
  2014-10-19 00:00:00
  2014-10-20 00:00:00

test=# SELECT generate_series('2014-10-10'::TIMESTAMP WITH TIME ZONE,
'2014-10-20'::TIMESTAMP WITH TIME ZONE, '1 DAY'::INTERVAL);
     generate_series
------------------------
  2014-10-10 00:00:00-03
  2014-10-11 00:00:00-03
  2014-10-12 00:00:00-03
  2014-10-13 00:00:00-03
  2014-10-14 00:00:00-03
  2014-10-15 00:00:00-03
  2014-10-16 00:00:00-03
  2014-10-17 00:00:00-03
  2014-10-18 00:00:00-03
  2014-10-19 01:00:00-02


Though this part I do not understand:

test=# select '2014-10-19'::timestamp with time zone;
       timestamptz
------------------------
  2014-10-19 01:00:00-02
(1 row)

test=# select '2014-10-20'::timestamp with time zone;
       timestamptz
------------------------
  2014-10-20 00:00:00-02
(1 row)

I thought interval understood 23/25 'day' across DST/ST boundaries. I
will have to think more on this.

>
> Thank you!
>
> Sérgio Saquetim
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Strange behavior in generate_series(date, date, interval) with DST

От
Adrian Klaver
Дата:
On 12/07/2014 12:11 PM, Sérgio Saquetim wrote:
> I've noticed a strange behavior in the generate_series functions.
>
> I'm trying to get all days between a start and an end date including the
> bounds. So naturally I've tried something like the query below
> ​.
>
> The real query uses generate_series to join other tables and is much
> more complicated, but for the sake of brevity, I think that this query
> is good enough to show the problem.

>
> But using that instead of generate_series, just feels wrong so I would
> like to understand what's happening and if there is a way to overcome
> that, before changing my queries.

To follow up, it looks to be a Midnight issue. I live on the US West
Coast so:

test=# show timezone;
   TimeZone
------------
  US/Pacific
(1 row)

Our Spring change happened  March 9th at 2:00 AM:

test=# select '2014-03-09 01:00'::timestamp with time zone ;
       timestamptz
------------------------
  2014-03-09 01:00:00-08
(1 row)

test=# select '2014-03-09 02:00'::timestamp with time zone ;
       timestamptz
------------------------
  2014-03-09 03:00:00-07
(1 row)


When I do a similar generate_series:

test=# SELECT generate_series('2014-03-01'::DATE , '2014-03-10'::DATE ,
'1 DAY'::INTERVAL);
     generate_series
------------------------
  2014-03-01 00:00:00-08
  2014-03-02 00:00:00-08
  2014-03-03 00:00:00-08
  2014-03-04 00:00:00-08
  2014-03-05 00:00:00-08
  2014-03-06 00:00:00-08
  2014-03-07 00:00:00-08
  2014-03-08 00:00:00-08
  2014-03-09 00:00:00-08
  2014-03-10 00:00:00-07
(10 rows)


it works.

So it seems there is some confusion which Midnight is being used for the
DATE to timestamp with time zone conversion.



>
> Thank you!
>
> Sérgio Saquetim
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Strange behavior in generate_series(date, date, interval) with DST

От
Sérgio Saquetim
Дата:
You've nailed it, thank you! 

Finally I'm understanding what's going on.

I wasn't paying attention to the fact that generate_series really expects for timezone inputs. So when I was passing the upper bound as '2014-10-20'::DATE, the value was being cast to 2014-10-20 00:00:00-02.

postgres=# SELECT '2014-10-20'::TIMESTAMPTZ;
      timestamptz       
------------------------
 2014-10-20 00:00:00-02
(1 row)

But after the DST change the generate_series changes the hour in the generated values as in 2014-10-20 01:00:00-02, which is bigger than 2014-10-20 00:00:00-02 and because of that it's not returned. 

Using a larger upper bound solved my problem.

postgres=# SELECT generate_series('2014-10-15 00:00:00'::TIMESTAMPTZ, '2014-10-20 23:59:59'::TIMESTAMPTZ, '1 DAY'::INTERVAL);
    generate_series     
------------------------
 2014-10-15 00:00:00-03
 2014-10-16 00:00:00-03
 2014-10-17 00:00:00-03
 2014-10-18 00:00:00-03
 2014-10-19 01:00:00-02
 2014-10-20 01:00:00-02
(6 rows)

Thank you again!

Sérgio Saquetim

2014-12-07 20:04 GMT-02:00 Adrian Klaver <adrian.klaver@aklaver.com>:
On 12/07/2014 12:11 PM, Sérgio Saquetim wrote:
I've noticed a strange behavior in the generate_series functions.

I'm trying to get all days between a start and an end date including the
bounds. So naturally I've tried something like the query below
​.

The real query uses generate_series to join other tables and is much
more complicated, but for the sake of brevity, I think that this query
is good enough to show the problem.


But using that instead of generate_series, just feels wrong so I would
like to understand what's happening and if there is a way to overcome
that, before changing my queries.

To follow up, it looks to be a Midnight issue. I live on the US West Coast so:

test=# show timezone;
  TimeZone
------------
 US/Pacific
(1 row)

Our Spring change happened  March 9th at 2:00 AM:

test=# select '2014-03-09 01:00'::timestamp with time zone ;
      timestamptz
------------------------
 2014-03-09 01:00:00-08
(1 row)

test=# select '2014-03-09 02:00'::timestamp with time zone ;
      timestamptz
------------------------
 2014-03-09 03:00:00-07
(1 row)


When I do a similar generate_series:

test=# SELECT generate_series('2014-03-01'::DATE , '2014-03-10'::DATE , '1 DAY'::INTERVAL);
    generate_series
------------------------
 2014-03-01 00:00:00-08
 2014-03-02 00:00:00-08
 2014-03-03 00:00:00-08
 2014-03-04 00:00:00-08
 2014-03-05 00:00:00-08
 2014-03-06 00:00:00-08
 2014-03-07 00:00:00-08
 2014-03-08 00:00:00-08
 2014-03-09 00:00:00-08
 2014-03-10 00:00:00-07
(10 rows)


it works.

So it seems there is some confusion which Midnight is being used for the DATE to timestamp with time zone conversion.





Thank you!

Sérgio Saquetim



--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Strange behavior in generate_series(date, date, interval) with DST

От
Andrew Sullivan
Дата:
On Sun, Dec 07, 2014 at 08:25:48PM -0200, Sérgio Saquetim wrote:
>
> I wasn't paying attention to the fact that generate_series really expects
> for timezone inputs. So when I was passing the upper bound
> as '2014-10-20'::DATE, the value was being cast to 2014-10-20 00:00:00-02.
>
> postgres=# SELECT '2014-10-20'::TIMESTAMPTZ;
>       timestamptz
> ------------------------
>  2014-10-20 00:00:00-02
> (1 row)

[…]

> Using a larger upper bound solved my problem.

As a more general lesson, I'd suggest that when you're working with
dates your best bet is to do it with your time zone as UTC.  If you
then want to format the output in the local time zone, you can do that
in the outer SELECT with AT TIME ZONE.  This isn't because Postgres is
going to get this wrong, but because it's far too easy to confuse
yourself with those time changes.  It makes debugging easier,
particularly because the time change only happens twice a year so
nobody _ever_ thinks of it when troubleshooting.

Best regards,

A

--
Andrew Sullivan
ajs@crankycanuck.ca


Re: Strange behavior in generate_series(date, date, interval) with DST

От
Francisco Olarte
Дата:
Hi Sérgio:

On Sun, Dec 7, 2014 at 9:11 PM, Sérgio Saquetim <sergiosaquetim@gmail.com> wrote:
I've noticed a strange behavior in the generate_series functions.

I'm trying to get all days between a start and an end date including the bounds. So naturally I've tried something like the query below
​. 

As both your examples and previous responses highlight your problem is that generate_series is not defined for dates, I'll avoid further comments on this. As you are using dates and your working example is using dates I would suggest rewriting your query around to avoid implicits cast problems. Date difference is integer, generate_series is defined on them, so:


=> select generate_series(0,'2014-10-20'::date - '2014-10-10'::date);
 generate_series
-----------------
               0
               1
               2
               3
               4
               5
               6
               7
               8
               9
              10
(11 rows)

cdrs=> select '2014-10-10'::date+ generate_series(0,'2014-10-20'::date - '2014-10-10'::date);
  ?column? 
------------
 2014-10-10
 2014-10-11
 2014-10-12
 2014-10-13
 2014-10-14
 2014-10-15
 2014-10-16
 2014-10-17
 2014-10-18
 2014-10-19
 2014-10-20
(11 rows)

Or even

=> select ini + generate_series(0, fin-ini) from (select '2014-10-10'::date as ini, '2014-10-20'::date as fin) data ;
  ?column? 
------------
 2014-10-10
 2014-10-11
 2014-10-12
 2014-10-13
 2014-10-14
 2014-10-15
 2014-10-16
 2014-10-17
 2014-10-18
 2014-10-19
 2014-10-20
(11 rows)


Regards
   Francisco Olarte.