Обсуждение: Extract epoch from Interval weird behavior

Поиск
Список
Период
Сортировка

Extract epoch from Interval weird behavior

От
Joseph Koshakow
Дата:
Hi all,

I noticed something odd when going through some
of the Interval code. The DAYS_PER_YEAR constant
is defined in src/include/datatype/timestamp.h.
> #define DAYS_PER_YEAR    365.25    /* assumes leap year every four years */

We execute the EXTRACT and date_part functions in
src/backend/utils/adt/timestamp.c in
> static Datum
> interval_part_common(PG_FUNCTION_ARGS, bool retnumeric)

When executing date_part we multiply the total
years in the Interval by DAYS_PER_YEAR
> result += ((double) DAYS_PER_YEAR * SECS_PER_DAY) * (interval->month / MONTHS_PER_YEAR);
> result += ((double) DAYS_PER_MONTH * SECS_PER_DAY) * (interval->month % MONTHS_PER_YEAR);
> result += ((double) SECS_PER_DAY) * interval->day;


However when executing EXTRACT we first truncate
DAYS_PER_YEAR to an integer, and then multiply it
by the total years in the Interval
/* this always fits into int64 */
> secs_from_day_month = ((int64) DAYS_PER_YEAR * (interval->month / MONTHS_PER_YEAR) +
>                          (int64) DAYS_PER_MONTH * (interval->month % MONTHS_PER_YEAR) +
>                           interval->day) * SECS_PER_DAY;

Is this truncation on purpose? It seems like
EXTRACT is not accounting for leap years in
it's calculation.

- Joe Koshakow



Re: Extract epoch from Interval weird behavior

От
Joseph Koshakow
Дата:
On Wed, Feb 23, 2022 at 7:42 PM Joseph Koshakow <koshy44@gmail.com> wrote:
>
> Hi all,
>
> I noticed something odd when going through some
> of the Interval code. The DAYS_PER_YEAR constant
> is defined in src/include/datatype/timestamp.h.
> > #define DAYS_PER_YEAR    365.25    /* assumes leap year every four years */
>
> We execute the EXTRACT and date_part functions in
> src/backend/utils/adt/timestamp.c in
> > static Datum
> > interval_part_common(PG_FUNCTION_ARGS, bool retnumeric)
>
> When executing date_part we multiply the total
> years in the Interval by DAYS_PER_YEAR
> > result += ((double) DAYS_PER_YEAR * SECS_PER_DAY) * (interval->month / MONTHS_PER_YEAR);
> > result += ((double) DAYS_PER_MONTH * SECS_PER_DAY) * (interval->month % MONTHS_PER_YEAR);
> > result += ((double) SECS_PER_DAY) * interval->day;
>
>
> However when executing EXTRACT we first truncate
> DAYS_PER_YEAR to an integer, and then multiply it
> by the total years in the Interval
> /* this always fits into int64 */
> > secs_from_day_month = ((int64) DAYS_PER_YEAR * (interval->month / MONTHS_PER_YEAR) +
> >                          (int64) DAYS_PER_MONTH * (interval->month % MONTHS_PER_YEAR) +
> >                           interval->day) * SECS_PER_DAY;
>
> Is this truncation on purpose? It seems like
> EXTRACT is not accounting for leap years in
> it's calculation.
>
> - Joe Koshakow

Oops I sent that to the wrong email. If this isn't intented I've created a patch
that fixes it, with the following two open questions
 * DAYS_PER_YEAR_NUM is recalculated every time. Is there anyway
to convert a float directly to a numeric to avoid this?
 * For some reason the change adds a lot of trailing zeros to the result. I'm
not sure why that is.

- Joe Koshakow

Вложения

Re: Extract epoch from Interval weird behavior

От
Aleksander Alekseev
Дата:
Hi Joseph,

> > Is this truncation on purpose? It seems like
> > EXTRACT is not accounting for leap years in
> > it's calculation.

Extracting an epoch from an interval is quite a strange case since intervals are not connected to any specific dates.

For instance:

select extract('epoch' from interval '1 month')

.. returns 2592000 = 30*24*60*60. But what if the month is February? Should we account for the different number of days for intervals like 6 months or 24 months?

Also, leap years don't just happen every 4 years. Here is the actual logic:

bool is_leap_year(int Y) {
  if(Y % 400 == 0) return true;
  else if(Y % 100 == 0) return false;
  else if(Y % 4 == 0) return true;
  else return false;
}

And what about leap seconds?

All in all, I don't think that the benefit of the proposed change outweighs the fact that it will break the previous behavior for the users who may rely on it. I suggest keeping it simple, i.e. the way it is now. What I think we could do instead is explicitly document this behavior in [1].

[1]: https://www.postgresql.org/docs/current/functions-datetime.html

--
Best regards,
Aleksander Alekseev

Re: Extract epoch from Interval weird behavior

От
Joseph Koshakow
Дата:
On Thu, Feb 24, 2022 at 4:47 AM Aleksander Alekseev
<aleksander@timescale.com> wrote:
> Extracting an epoch from an interval is quite a strange case since intervals are not connected to any specific dates.

I agree, I think it's a weird use case and that it's probably not
worth fixing. Though it was fun for me to try.

>
> All in all, I don't think that the benefit of the proposed change outweighs the fact that it will break the previous
behaviorfor the users who may rely on it. I suggest keeping it simple, i.e. the way it is now. What I think we could do
insteadis explicitly document this behavior in [1]. 
>
> [1]: https://www.postgresql.org/docs/current/functions-datetime.html

I do want to briefly mention, if I'm understanding the history of
EXTRACT correctly, that the previous behavior
actually was to multiply by 365.25, not 365. However The commit that
changed the return type from numeric [1]
changed that behavior. Looking through the discussions [2], I don't
see any mention of it, which makes me think
it was a mistake. However there is a lot of discussion around numeric
performance and being able to optimize
numeric division because every divisor was a power of 10. Fixing this
issue would break that assumption and
cause some performance degradations which probably isn't worth it.

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2da77cdb4661826482ebf2ddba1f953bc74afe4
[2]: https://www.postgresql.org/message-id/flat/42b73d2d-da12-ba9f-570a-420e0cce19d9@phystech.edu

- Joe Koshakow



Re: Extract epoch from Interval weird behavior

От
Tom Lane
Дата:
Joseph Koshakow <koshy44@gmail.com> writes:
> I do want to briefly mention, if I'm understanding the history of
> EXTRACT correctly, that the previous behavior
> actually was to multiply by 365.25, not 365. However The commit that
> changed the return type from numeric [1]
> changed that behavior. Looking through the discussions [2], I don't
> see any mention of it, which makes me think
> it was a mistake.

Indeed ... Peter?

            regards, tom lane



Re: Extract epoch from Interval weird behavior

От
Peter Eisentraut
Дата:
On 24.02.22 03:35, Joseph Koshakow wrote:
>> However when executing EXTRACT we first truncate
>> DAYS_PER_YEAR to an integer, and then multiply it
>> by the total years in the Interval
>> /* this always fits into int64 */
>>> secs_from_day_month = ((int64) DAYS_PER_YEAR * (interval->month / MONTHS_PER_YEAR) +
>>>                           (int64) DAYS_PER_MONTH * (interval->month % MONTHS_PER_YEAR) +
>>>                            interval->day) * SECS_PER_DAY;
>> Is this truncation on purpose? It seems like
>> EXTRACT is not accounting for leap years in
>> it's calculation.

This was not intentional.  The cast is only to make the multiplication 
happen in int64; it didn't mean to drop any fractional parts.

> Oops I sent that to the wrong email. If this isn't intented I've created a patch
> that fixes it, with the following two open questions
>   * DAYS_PER_YEAR_NUM is recalculated every time. Is there anyway
> to convert a float directly to a numeric to avoid this?

We really wanted to avoid doing calculations in numeric as much as 
possible.  So we should figure out a different way to write this.  The 
attached patch works for me.  It's a bit ugly since it hardcodes some 
factors.  Maybe we can rephrase it a bit more elegantly.
Вложения

Re: Extract epoch from Interval weird behavior

От
Tom Lane
Дата:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> We really wanted to avoid doing calculations in numeric as much as 
> possible.  So we should figure out a different way to write this.  The 
> attached patch works for me.  It's a bit ugly since it hardcodes some 
> factors.  Maybe we can rephrase it a bit more elegantly.

I think it's fine but needs some commentary.  Maybe about like
"To do this calculation in integer arithmetic even though
DAYS_PER_YEAR is fractional, multiply everything by 4
and then divide by 4 again at the end.  This relies on
DAYS_PER_YEAR being a multiple of 0.25 and on SECS_PER_DAY
being a multiple of 4."

BTW, it might be good to parenthesize as

(... big calculation ...) * (SECS_PER_DAY/4)

to eliminate any question of whether the value could overflow
before the final division by 4.

            regards, tom lane



Re: Extract epoch from Interval weird behavior

От
Peter Eisentraut
Дата:
On 08.04.22 15:10, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
>> We really wanted to avoid doing calculations in numeric as much as
>> possible.  So we should figure out a different way to write this.  The
>> attached patch works for me.  It's a bit ugly since it hardcodes some
>> factors.  Maybe we can rephrase it a bit more elegantly.
> 
> I think it's fine but needs some commentary.  Maybe about like
> "To do this calculation in integer arithmetic even though
> DAYS_PER_YEAR is fractional, multiply everything by 4
> and then divide by 4 again at the end.  This relies on
> DAYS_PER_YEAR being a multiple of 0.25 and on SECS_PER_DAY
> being a multiple of 4."
> 
> BTW, it might be good to parenthesize as
> 
> (... big calculation ...) * (SECS_PER_DAY/4)
> 
> to eliminate any question of whether the value could overflow
> before the final division by 4.

done that way