Discussion: Do we need multiple forms of the SQL2003 statistics aggregates?
Sergey's recent patch for the SQL2003 binary aggregates implements each of them three times: in float4, float8, and numeric arithmetic. This seems like vast overkill.

The float4 versions certainly ought to go, as they won't perform noticeably better than float8 and will likely be subject to terrible roundoff-error problems. I'm also dubious about the numeric versions. The spec says

    b) Let DTDVE be the declared type of DVE and let DTIVE be the
       declared type of IVE.

    c) Case:

       i) The declared type of REGR_COUNT is an implementation-defined
          exact numeric type with scale of 0 (zero).

       ii) Otherwise, the declared type of the result is an
           implementation-defined approximate numeric type. If DTDVE is
           an approximate numeric type, then the precision of the result
           is not less than the precision of DTDVE. If DTIVE is an
           approximate numeric type, then the precision of the result is
           not less than the precision of DTIVE.

so as far as I can see we are *only* required to provide a float8 implementation.

There is room to argue that the numeric-arithmetic version would be worth having on the grounds of greater precision or range, but it's a big chunk of code and the public demand for the functionality has not exactly been overwhelming.

Comments?

			regards, tom lane
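The roundoff concern about float4 accumulation can be illustrated with a small Python sketch (not from the thread; the data are made up, and binary32 arithmetic is emulated by round-tripping every intermediate through struct):

```python
import struct

def to_f32(x):
    # Round a Python float (binary64) to binary32, as float4 storage
    # and arithmetic would
    return struct.unpack('f', struct.pack('f', x))[0]

# Accumulate a sum of squares over 200,000 values near 1000.0. Once the
# running float4 sum is large, each new ~1e6 term loses most of its low
# bits against the 24-bit mantissa, so error accumulates rapidly.
vals = [1000.0 + (i % 7) * 0.1 for i in range(200_000)]

sum64 = 0.0  # float8-style accumulator
sum32 = 0.0  # emulated float4 accumulator
for v in vals:
    sum64 += v * v
    sum32 = to_f32(sum32 + to_f32(v) * to_f32(v))

# Relative error of the float4-style accumulation versus float8
rel_err = abs(sum32 - sum64) / sum64
```

The float8 accumulator stays accurate to roughly one part in 1e15 here, while the emulated float4 accumulator drifts by orders of magnitude more, which is the "terrible roundoff-error problems" point in a nutshell.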
I wrote:
> There is room to argue that the numeric-arithmetic version would be
> worth having on the grounds of greater precision or range, but it's a
> big chunk of code and the public demand for the functionality has not
> exactly been overwhelming.

> Comments?

Since no one's even bothered to respond, I take it there's insufficient interest in the numeric versions of these aggregates. I've committed just the float8 versions.

I added some very trivial regression tests, which we'll have to keep an eye on to see if they have any portability problems. We may need to back off the number of displayed fraction digits to get them to pass everywhere.

If anyone wants to do better tests, feel free...

			regards, tom lane
Tom, thank you for reviewing/correcting/applying my patches...

On Fri, 28 Jul 2006, Tom Lane wrote:
> I wrote:
>> There is room to argue that the numeric-arithmetic version would be
>> worth having on the grounds of greater precision or range, but it's a
>> big chunk of code and the public demand for the functionality has not
>> exactly been overwhelming.
>
>> Comments?
>
> Since no one's even bothered to respond, I take it there's insufficient
> interest in the numeric versions of these aggregates. I've committed
> just the float8 versions.

My opinion on that is:

1) I agree that the float4 versions really are useless, because they
   don't offer anything new compared to float8.

2) But I don't think the numeric versions of these functions are
   useless (if somebody has numeric columns...). I don't think that
   10-15 additional functions in pg_proc is too much. Concerning the
   amount of code in numeric.c, I think it is large but not complicated
   at all, so I don't think it will be a problem to support that code.

I think that since we support the numeric type as a special high-precision type, Postgres must have high-precision versions of all computational functions. Just my opinion.

> I added some very trivial regression tests, which we'll have to keep an
> eye on to see if they have any portability problems. We may need to
> back off the number of displayed fraction digits to get them to pass
> everywhere.
>
> If anyone wants to do better tests, feel free...

I will try to write some better tests and send a patch.

Regards,
	Sergey

*******************************************************************
Sergey E. Koposov
Max Planck Institute for Astronomy/Sternberg Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: math@sai.msu.ru
Sergey E. Koposov wrote:
> I think since we are supporting the numeric type as a special
> high-precision type, Postgres must have the high-precision
> versions of all computational functions. Just my opinion.

Another way to look at it is whether you want to have accurate computations (numeric) or approximate computations (float). I'm not a statistician, so I don't know what most of these functions are used for. From a mathematician's point of view, however, some of these functions normally produce irrational numbers anyway, so it seems unlikely that numeric will be useful. But looking at the definition of, say, regr_avgx(Y, X), if all the input values are integers, it might be useful to get an exact integer or rational number as output, instead of a float.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
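Peter's regr_avgx(Y, X) point can be sketched in Python, with fractions.Fraction standing in for an exact rational result type (the sample rows are invented; regr_avgx averages X over the rows where both arguments are non-null):

```python
from fractions import Fraction

# (y, x) rows; None plays the role of SQL NULL. regr_avgx(Y, X) averages
# x over rows where both y and x are non-null, so the (None, 2) row is
# excluded from the aggregate.
pairs = [(3, 1), (None, 2), (5, 2), (7, 4)]

xs = [x for y, x in pairs if y is not None and x is not None]

exact = Fraction(sum(xs), len(xs))  # exact rational: 7/3
approx = sum(xs) / len(xs)          # float8-style approximation
```

With integer inputs the exact answer is the rational 7/3, which a float can only approximate; that is the case where a numeric (or rational) result type would add something a float8 result cannot.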
On Sat, 29 Jul 2006, Peter Eisentraut wrote:
> Sergey E. Koposov wrote:
>> I think since we are supporting the numeric type as a special
>> high-precision type, Postgres must have the high-precision
>> versions of all computational functions. Just my opinion.
>
> Another way to look at it is whether you want to have accurate
> computations (numeric) or approximate computations (float). I'm not a
> statistician, so I don't know what most of these functions are used
> for. From a mathematician's point of view, however, some of these
> functions normally produce irrational numbers anyway, so it seems
> unlikely that numeric will be useful. But looking at the definition
> of, say, regr_avgx(Y, X), if all the input values are integers, it
> might be useful if I could get an exact integer or rational number as
> output, instead of a float, that is.

From the statistical point of view, there is no need for integer output from those 2-argument aggregates. For example, corr() and the regr_*() functions are by definition not integer-valued; they just don't make sense as integers (-1 <= corr(Y,X) <= 1). (Similarly, stddev(int) does not produce an int either, because that would not make sense.)

So it's perfectly fine that they produce only floating-point numbers...

Regards,
	Sergey
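Sergey's point that corr() is inherently non-integer, bounded in [-1, 1], follows from the Pearson definition; a minimal Python sketch (invented sample data, None standing in for SQL NULL):

```python
import math

def corr(pairs):
    # Pearson correlation of (y, x) pairs, skipping rows with a NULL,
    # mirroring how the SQL binary aggregates ignore null inputs
    rows = [(y, x) for y, x in pairs if y is not None and x is not None]
    n = len(rows)
    my = sum(y for y, _ in rows) / n
    mx = sum(x for _, x in rows) / n
    sxy = sum((y - my) * (x - mx) for y, x in rows)
    sxx = sum((x - mx) ** 2 for _, x in rows)
    syy = sum((y - my) ** 2 for y, _ in rows)
    # Cauchy-Schwarz guarantees |sxy| <= sqrt(sxx * syy), hence |r| <= 1
    return sxy / math.sqrt(sxx * syy)

r = corr([(2, 1), (4, 2), (5, 3), (4, 4), (5, 5)])
```

Even with all-integer inputs the result is an irrational number in general (a ratio involving a square root), which is why a floating-point result type is the natural fit here.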
Peter Eisentraut <peter_e@gmx.net> writes:
> ... From a mathematician's point of view, however, some of these
> functions normally produce irrational numbers anyway, so it seems
> unlikely that numeric will be useful. But looking at the definition
> of, say, regr_avgx(Y, X), if all the input values are integers, it
> might be useful if I could get an exact integer or rational number as
> output, instead of a float, that is.

The question is whether this is useful enough to justify adding a thousand lines to numeric.c.

I believe also that the numeric forms of the aggregates will be enormously slower than the float forms, at least on most modern machines with decent floating-point performance. I don't have time to do any performance testing though.

			regards, tom lane
On Sat, 29 Jul 2006, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> ... From a mathematician's point of view, however, some of these
>> functions normally produce irrational numbers anyway, so it seems
>> unlikely that numeric will be useful. But looking at the definition
>> of, say, regr_avgx(Y, X), if all the input values are integers, it
>> might be useful if I could get an exact integer or rational number as
>> output, instead of a float, that is.
>
> The question is whether this is useful enough to justify adding a
> thousand lines to numeric.c.
>
> I believe also that the numeric forms of the aggregates will be
> enormously slower than the float forms, at least on most modern
> machines with decent floating-point performance. I don't have time
> to do any performance testing though.

Based on just simple tests on my laptop (1.8 GHz Centrino), the numeric versions are 5-10 times slower (depending on which 2-argument function is used).

The example:

template1=# \d test
     Table "public.test"
 Column |       Type       | Modifiers
--------+------------------+-----------
 na     | numeric          |
 nb     | numeric          |
 da     | double precision |
 db     | double precision |

template1=# select count(*) from test;
  count
---------
 1000000
(1 row)

template1=# select regr_avgx(da,db) from test;
    regr_avgx
-----------------
 0.5002412120227
(1 row)

Time: 1052,893 ms

template1=# select regr_avgx(na,nb) from test;
        regr_avgx
-------------------------
 0.500040167263887822939
(1 row)

Time: 4459,739 ms

template1=# select regr_sxx(da,db) from test;
     regr_sxx
------------------
 83303.6317359119
(1 row)

Time: 1043,891 ms

template1=# select regr_sxx(na,nb) from test;
                     regr_sxx
--------------------------------------------------
 83342.044294954140912267902323775495680113567986
(1 row)

Time: 8514,843 ms

template1=# select corr(da,db) from test;
         corr
----------------------
 0.000527588261283456
(1 row)

Time: 1074,948 ms

template1=# select corr(na,nb) from test;
                                          corr
----------------------------------------------------------------------------------------
 0.000759857150984988517883855238363403977440313567465424735082001422354119457463407737
(1 row)

Time: 18327,376 ms

Yes, that's significantly slower, but I don't think it makes the functions unusable...

Regards,
	Sergey
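The float-versus-numeric gap measured above can be sketched outside the database; here decimal.Decimal is only a rough stand-in for Postgres numeric (different implementation, so the ratio is illustrative, not a measurement of the Postgres code path), and the data are invented:

```python
import time
from decimal import Decimal, getcontext

# Work at a numeric-style precision well beyond binary64
getcontext().prec = 40

# Same values, once as binary floats and once as Decimals
xs_f = [0.5 + (i % 100) / 1000.0 for i in range(200_000)]
xs_d = [Decimal(x) for x in xs_f]

t0 = time.perf_counter()
avg_f = sum(xs_f) / len(xs_f)          # hardware floating point
t1 = time.perf_counter()
avg_d = sum(xs_d) / len(xs_d)          # software arbitrary precision
t2 = time.perf_counter()

float_ms = (t1 - t0) * 1000.0
decimal_ms = (t2 - t1) * 1000.0
```

On typical hardware the Decimal aggregation runs several times slower than the float one, which matches the shape (if not the exact ratios) of the psql timings above: software multi-precision arithmetic simply cannot keep up with hardware floating point.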