Обсуждение: min() and NaN

Поиск
Список
Период
Сортировка

min() and NaN

От
"Michael S. Tibbetts"
Дата:
Hi,

I have a table containing a double precision column. That column 
contains at least one judiciously placed NaN.

I'd expect the aggregate function min() to return the minimum, valid 
numeric value.  Instead, it seems to return the minimum value from the 
subset of rows following the 'NaN'.

What's going here?  What should I expect the aggregate function min() to 
return in this case?  And why?

Any help is appreciated,
Mike


testdb=> \d min_with_nan      Table "public.min_with_nan" Column |       Type       | Modifiers
--------+------------------+----------- col1   | double precision |

testdb=> select * from min_with_nan ; col1
------- 3.141 2.718   NaN    10
(4 rows)

testdb=> select min(col1) from min_with_nan ; min
-----  10
(1 row)

testdb=> select min(col1) from min_with_nan where col1 != 'NaN';  min
------- 2.718
(1 row)



Re: min() and NaN

От
Tom Lane
Дата:
"Michael S. Tibbetts" <mtibbetts@head-cfa.cfa.harvard.edu> writes:
> I'd expect the aggregate function min() to return the minimum, valid 
> numeric value.  Instead, it seems to return the minimum value from the 
> subset of rows following the 'NaN'.

Not real surprising given than min() is implemented with float8smaller,
which does this:
result = ((arg1 > arg2) ? arg1 : arg2);

In most C implementations, any comparison involving a NaN will return
"false".  So when we hit the NaN, we have arg1 = min so far, arg2 = NaN,
comparison yields false, result is NaN.  On the next row, we have
arg1 = NaN, arg2 = next value, comparison yields false, result is next
value; and away it goes.

We could probably make it work the way you want with explicit tests for
NaN in float8smaller, arranged to make sure that the result is not NaN
unless both inputs are NaN.  But I'm not entirely convinced that we
should make it work like that.  The other float8 comparison operators
are designed to treat NaN as larger than every other float8 value (so
that it has a well-defined position when sorting), and I'm inclined to
think that float8smaller and float8larger probably should behave
likewise.  (That actually is the same as what you want for MIN(), but
not for MAX() ...)

Comments anyone?
        regards, tom lane


Re: min() and NaN

От
Stephan Szabo
Дата:
On Sun, 20 Jul 2003, Tom Lane wrote:

> "Michael S. Tibbetts" <mtibbetts@head-cfa.cfa.harvard.edu> writes:
> > I'd expect the aggregate function min() to return the minimum, valid
> > numeric value.  Instead, it seems to return the minimum value from the
> > subset of rows following the 'NaN'.
>
> Not real surprising given than min() is implemented with float8smaller,
> which does this:
>
>     result = ((arg1 > arg2) ? arg1 : arg2);
>
> In most C implementations, any comparison involving a NaN will return
> "false".  So when we hit the NaN, we have arg1 = min so far, arg2 = NaN,
> comparison yields false, result is NaN.  On the next row, we have
> arg1 = NaN, arg2 = next value, comparison yields false, result is next
> value; and away it goes.
>
> We could probably make it work the way you want with explicit tests for
> NaN in float8smaller, arranged to make sure that the result is not NaN
> unless both inputs are NaN.  But I'm not entirely convinced that we
> should make it work like that.  The other float8 comparison operators
> are designed to treat NaN as larger than every other float8 value (so
> that it has a well-defined position when sorting), and I'm inclined to
> think that float8smaller and float8larger probably should behave
> likewise.  (That actually is the same as what you want for MIN(), but
> not for MAX() ...)

The spec seems to say that min/max should work the same way as the
comparison operators by saying that it returns the maximum or minimum
value as determined by the comparison rules of the comparison predicate
section.  That'd seem to be asking for the second version.



Re: min() and NaN

От
Stephan Szabo
Дата:
On Tue, 15 Jul 2003, Michael S. Tibbetts wrote:

> Hi,
>
> I have a table containing a double precision column. That column
> contains at least one judiciously placed NaN.
>
> I'd expect the aggregate function min() to return the minimum, valid
> numeric value.  Instead, it seems to return the minimum value from the
> subset of rows following the 'NaN'.

This appears to be a bug in the min (and presumably max) aggregate
function for floats.  AFAICT It should use the same logic as the < (or >)
comparison operator for the type (SQL92 6.5 GR2b iii I believe), but it's
just doing a < (or >) in C on the two arguments which isn't the same.

Changing it to follow those rules would give 2.718 for min and it looks
like NaN for max.



Re: min() and NaN

От
Jean-Luc Lachance
Дата:
If a compare with NaN is always false, how about rewriting it as:
result = ((arg1 < arg2) ? arg2 : arg1).

Or better yet, swap arg1 and arg2 when calling float8smaller.
Use flaost8smaller( current_min, value).

JLL

Tom Lane wrote:
> 
> "Michael S. Tibbetts" <mtibbetts@head-cfa.cfa.harvard.edu> writes:
> > I'd expect the aggregate function min() to return the minimum, valid
> > numeric value.  Instead, it seems to return the minimum value from the
> > subset of rows following the 'NaN'.
> 
> Not real surprising given than min() is implemented with float8smaller,
> which does this:
> 
>         result = ((arg1 > arg2) ? arg1 : arg2);
> 
> In most C implementations, any comparison involving a NaN will return
> "false".  So when we hit the NaN, we have arg1 = min so far, arg2 = NaN,
> comparison yields false, result is NaN.  On the next row, we have
> arg1 = NaN, arg2 = next value, comparison yields false, result is next
> value; and away it goes.
> 
> We could probably make it work the way you want with explicit tests for
> NaN in float8smaller, arranged to make sure that the result is not NaN
> unless both inputs are NaN.  But I'm not entirely convinced that we
> should make it work like that.  The other float8 comparison operators
> are designed to treat NaN as larger than every other float8 value (so
> that it has a well-defined position when sorting), and I'm inclined to
> think that float8smaller and float8larger probably should behave
> likewise.  (That actually is the same as what you want for MIN(), but
> not for MAX() ...)
> 
> Comments anyone?
> 
>                         regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that your
>       message can get through to the mailing list cleanly


Re: min() and NaN

От
Tom Lane
Дата:
Jean-Luc Lachance <jllachan@nsd.ca> writes:
> If a compare with NaN is always false, how about rewriting it as:
> result = ((arg1 < arg2) ? arg2 : arg1).

That just changes the failure mode.
        regards, tom lane


Re: min() and NaN

От
Bruce Momjian
Дата:
Is this a TODO?

---------------------------------------------------------------------------

Tom Lane wrote:
> Jean-Luc Lachance <jllachan@nsd.ca> writes:
> > If a compare with NaN is always false, how about rewriting it as:
> > result = ((arg1 < arg2) ? arg2 : arg1).
> 
> That just changes the failure mode.
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: min() and NaN

От
Tom Lane
Дата:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Is this a TODO?

It'll only take ten minutes to make it a DONE, once we figure out what
the behavior ought to be.  So far I think both Stephan and I argued that
MIN/MAX ought to treat NaN as larger than all ordinary values, for
consistency with the comparison operators.  That was not the behavior
Michael wanted, but I don't see that we have much choice given the
wording of the SQL spec.  Does anyone want to argue against that
definition?
        regards, tom lane


Re: min() and NaN

От
Bruce Momjian
Дата:
Well, my 2 cents is that though we consider NULL when ordering via ORDER
BY, we ignore it in MAX because it really isn't a value, and NaN seems
to be similar to NULL.

When doing ORDER BY, we have to put the NULL value somewhere, so we put
it at the end, but with aggregates, we aren't required to put the NULL
somewhere, so we ignore it.  Should that be the same for NaN?  I just
don't see how we can arbitrarly say it is greater/less than other
values.

---------------------------------------------------------------------------

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Is this a TODO?
> 
> It'll only take ten minutes to make it a DONE, once we figure out what
> the behavior ought to be.  So far I think both Stephan and I argued that
> MIN/MAX ought to treat NaN as larger than all ordinary values, for
> consistency with the comparison operators.  That was not the behavior
> Michael wanted, but I don't see that we have much choice given the
> wording of the SQL spec.  Does anyone want to argue against that
> definition?
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: min() and NaN

От
Stephan Szabo
Дата:
On Tue, 22 Jul 2003, Bruce Momjian wrote:

> Well, my 2 cents is that though we consider NULL when ordering via ORDER
> BY, we ignore it in MAX because it really isn't a value, and NaN seems
> to be similar to NULL.
>
> When doing ORDER BY, we have to put the NULL value somewhere, so we put
> it at the end, but with aggregates, we aren't required to put the NULL
> somewhere, so we ignore it.  Should that be the same for NaN?  I just
> don't see how we can arbitrarly say it is greater/less than other
> values.

But we already do. When doing a less than/greater than comparison, 'NaN'
is considered greater than normal values which is different from NULL
which returns unknown for both.



Re: min() and NaN

От
Jean-Luc Lachance
Дата:
Hey! here is a (stupid maybe) idea.  Why not disallow 'NaN' for a float?

JLL




Stephan Szabo wrote:
> 
> On Tue, 22 Jul 2003, Bruce Momjian wrote:
> 
> > Well, my 2 cents is that though we consider NULL when ordering via ORDER
> > BY, we ignore it in MAX because it really isn't a value, and NaN seems
> > to be similar to NULL.
> >
> > When doing ORDER BY, we have to put the NULL value somewhere, so we put
> > it at the end, but with aggregates, we aren't required to put the NULL
> > somewhere, so we ignore it.  Should that be the same for NaN?  I just
> > don't see how we can arbitrarly say it is greater/less than other
> > values.
> 
> But we already do. When doing a less than/greater than comparison, 'NaN'
> is considered greater than normal values which is different from NULL
> which returns unknown for both.


Re: min() and NaN

От
Bruce Momjian
Дата:
Stephan Szabo wrote:
> 
> On Tue, 22 Jul 2003, Bruce Momjian wrote:
> 
> > Well, my 2 cents is that though we consider NULL when ordering via ORDER
> > BY, we ignore it in MAX because it really isn't a value, and NaN seems
> > to be similar to NULL.
> >
> > When doing ORDER BY, we have to put the NULL value somewhere, so we put
> > it at the end, but with aggregates, we aren't required to put the NULL
> > somewhere, so we ignore it.  Should that be the same for NaN?  I just
> > don't see how we can arbitrarly say it is greater/less than other
> > values.
> 
> But we already do. When doing a less than/greater than comparison, 'NaN'
> is considered greater than normal values which is different from NULL
> which returns unknown for both.

But maybe that logic is the same as ORDER BY, where we have to give it
some location in sorting order, while with aggregates we don't.

I am not strong on this, but just point it out.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: min() and NaN

От
Tom Lane
Дата:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Well, my 2 cents is that though we consider NULL when ordering via ORDER
> BY, we ignore it in MAX because it really isn't a value, and NaN seems
> to be similar to NULL.

Good idea, but I don't think we can get away with it.  The spec says
that MAX/MIN have to be consistent with the comparison operators (and
therefore with ORDER BY):
           iii) If MAX or MIN is specified, then the result is respec-                tively the maximum or minimum
valuein TXA. These results                are determined using the comparison rules specified in
Subclause8.2, "<comparison predicate>".
 

NULL can be special, because it acts specially in comparisons anyway.
But NaN is just a value of the datatype.

I'd be willing to go against the spec if I thought that having
ignore-NaNs behavior was sufficiently important, but I don't think it's
important enough to disregard the spec...
        regards, tom lane


Re: min() and NaN

От
Bruce Momjian
Дата:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Well, my 2 cents is that though we consider NULL when ordering via ORDER
> > BY, we ignore it in MAX because it really isn't a value, and NaN seems
> > to be similar to NULL.
> 
> Good idea, but I don't think we can get away with it.  The spec says
> that MAX/MIN have to be consistent with the comparison operators (and
> therefore with ORDER BY):
> 
>             iii) If MAX or MIN is specified, then the result is respec-
>                  tively the maximum or minimum value in TXA. These results
>                  are determined using the comparison rules specified in
>                  Subclause 8.2, "<comparison predicate>".
> 
> NULL can be special, because it acts specially in comparisons anyway.
> But NaN is just a value of the datatype.
> 
> I'd be willing to go against the spec if I thought that having
> ignore-NaNs behavior was sufficiently important, but I don't think it's
> important enough to disregard the spec...

Yep.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: min() and NaN

От
Michael Tibbetts
Дата:
Treating NaN's as larger(or smaller) than all ordinary values seems a fine way 
to go.

It avoids the situation where you request MIN and get an ordinary value which 
is greater than the minimum ordinary value in the table.  If MIN(or MAX given 
the ordering you're suggesting) returns NaN, the user would stand better odds 
of figuring out that something about the query needs to be changed.

Returning an plausible, though possibly incorrect, ordinary value from MIN or 
MAX if there are NaN's in the column can lead users to make some unfortunate 
mistakes(voice of experience?).

Thanks for your help,
Mike


> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Is this a TODO?
> 
> It'll only take ten minutes to make it a DONE, once we figure out what
> the behavior ought to be.  So far I think both Stephan and I argued that
> MIN/MAX ought to treat NaN as larger than all ordinary values, for
> consistency with the comparison operators.  That was not the behavior
> Michael wanted, but I don't see that we have much choice given the
> wording of the SQL spec.  Does anyone want to argue against that
> definition?
> 
>             regards, tom lane






Re: min() and NaN

От
Tom Lane
Дата:
Greg Stark <gsstark@mit.edu> writes:
> Does postgres intend to support all the different types of NaN? Does you
> intend to have +Inf and -Inf and underflow detection and all the other goodies
> you actually need to make it useful?

We have some of that; it needs work, and it's always going to be
dependent on the platform having proper IEEE support, but that's no
excuse to throw it away.
        regards, tom lane


Re: min() and NaN

От
Greg Stark
Дата:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> NULL can be special, because it acts specially in comparisons anyway.
> But NaN is just a value of the datatype.

Does postgres intend to support all the different types of NaN? Does you
intend to have +Inf and -Inf and underflow detection and all the other goodies
you actually need to make it useful?

If not it seems more useful to just use the handy unknown-value thing SQL
already has and turn NaN into a NULL. 

-- 
greg