Обсуждение: Reduce NUMERIC size by 2 bytes, reduce max length to 508 digits

Поиск
Список
Период
Сортировка

Reduce NUMERIC size by 2 bytes, reduce max length to 508 digits

От
Bruce Momjian
Дата:
[ Moved to hackers for patch discussion.]

John D. Burger wrote:
> >> There are practical applications, eg, 1024-bit keys are fairly common
> >> objects in cryptography these days, and that equates to about 10^308.
> >> I don't really foresee anyone trying to run crypto algorithms with SQL
> >> NUMERIC arithmetic, though ...
> >
> > 2046 bit keys are becoming more common. However, math using these keys 
> > is
> > usually done modulo a product of two primes and there are ways of 
> > doing the
> > calculations that are going to be much faster than doing them the way
> > Postgres does. So it is unlikely that anyone would be using Postgres' 
> > numeric
> > type to do this in any case.
> 
> Nonetheless, the fact that people can think of practical applications 
> for numbers whose length is easily within a factor of two of the 
> proposed limitation makes me squeamish about it being shrunk.  Also, I 
> would say the same arguments about doing math with NUMERICs suggest 
> that saving a few byes in representation is not a big deal.  On the few 
> occasions where I have used NUMERICs, I didn't care about stuff like 
> that.
> 
> For what it's worth.

Good point, but I am not 100% sure on the limitation.  Look at this:
test=> CREATE TABLE test(x NUMERIC);CREATE TABLEtest=> INSERT INTO test SELECT pow(10::numeric, 10000) + 1;INSERT 0
1test=>SELECT log(x) FROM test;          log------------------------ 10000.0000000000000000(1 row)test=> SELECT x % 10
FROMtest;      ?column?-------------------- 1.0000000000000000(1 row)
 

And this seems to work too:
test=> INSERT INTO test SELECT pow(10::numeric, 120000) + 1;INSERT 0 1

The limit seems to be around 150k digits:
test=> INSERT INTO test SELECT pow(10::numeric, 150000) + 1;ERROR:  value overflows numeric format

With current code, you can not define a NUMERIC column with greater than
1000 digits because we just placed an arbitrary limit on the length, but
the computational length was obviously much larger than the storage
limit.  And I suppose you could exceed 1000 if you stored the result as
text and converted it to NUMERIC just for computations.

In fact we have this TODO, but I wonder if it is still an open issue:
* Change NUMERIC to enforce the maximum precision

We seem to enforce things just fine.
Now, with the new patch, I see a _much_ lower limit:
test=> SELECT pow(10::NUMERIC, 511) + 1;...(1 row)test=> SELECT pow(10::NUMERIC, 512) + 1;ERROR:  value overflows
numericformattest=> SELECT pow(10::NUMERIC, 512);ERROR:  value overflows numeric format
 

I thought maybe I could do the computations at least and then convert
into text, but seeing the above it seems higher precision computation is
just not possible --- it is more than just storage in a table that is
changed.

So, with the patch, the storage length is going from 1000 digits to 508,
but the computational length is reduced from around 150k digits to 508. 
Now, because no one has complained about the 1000-digit limit, it is
unlikely that anyone is doing calculations over 1000 or the would have
had problems with storing the value, but I felt I should point out that
we are dramatically changing the computational length.

In fact, for the tests we have been running to debug the *printf
problem, none of those queries will work with the patch:
stest=> SELECT factorial(4000);ERROR:  value overflows numeric formattest=> SELECT factorial(400);ERROR:  value
overflowsnumeric format
 

Not only does 4000! not work, but 400! doesn't even work.  I just lost
demo "wow" factor points!

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Reduce NUMERIC size by 2 bytes, reduce max length to 508 digits

От
Tom Lane
Дата:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> The limit seems to be around 150k digits:

It's exactly 10^(128K), as I've mentioned more than once.

> So, with the patch, the storage length is going from 1000 digits to 508,
> but the computational length is reduced from around 150k digits to 508. 
> Now, because no one has complained about the 1000-digit limit, it is
> unlikely that anyone is doing calculations over 1000 or the would have
> had problems with storing the value,

Only if they declared their columns as numeric(N) and not just plain
unconstrained numeric.  Not to mention the possibility that they're
doing the same thing you just did, ie computing values and returning
them to the client without ever storing them in a table.  So I don't
think the above reasoning is defensible.

> Not only does 4000! not work, but 400! doesn't even work.  I just lost
> demo "wow" factor points!

It looks like the limit would be about factorial(256).

The question remains, though, is this computational range good for
anything except demos?
        regards, tom lane


Re: Reduce NUMERIC size by 2 bytes, reduce max length to 508 digits

От
Gregory Maxwell
Дата:
On 12/5/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Not only does 4000! not work, but 400! doesn't even work.  I just lost
> > demo "wow" factor points!
>
> It looks like the limit would be about factorial(256).
>
> The question remains, though, is this computational range good for
> anything except demos?

I've hesitated commenting, because I think it might be a silly reason,
but perhaps it's one other people share.  ...  I use PG as a
calculator for big numbers because it's the only user friendly thing
on my system that can do factorial(300) - factorial(280). I'd rather
use something like octave, but I've found its pretty easy to escape
its range.   If the range for computation is changed, then I'll
probably keep an old copy around just for this, though I'm not quite
sure how much I'd be affected..


Re: Reduce NUMERIC size by 2 bytes, reduce max length to 508

От
Bruce Momjian
Дата:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > The limit seems to be around 150k digits:
> 
> It's exactly 10^(128K), as I've mentioned more than once.
> 
> > So, with the patch, the storage length is going from 1000 digits to 508,
> > but the computational length is reduced from around 150k digits to 508. 
> > Now, because no one has complained about the 1000-digit limit, it is
> > unlikely that anyone is doing calculations over 1000 or the would have
> > had problems with storing the value,
> 
> Only if they declared their columns as numeric(N) and not just plain
> unconstrained numeric.  Not to mention the possibility that they're
> doing the same thing you just did, ie computing values and returning
> them to the client without ever storing them in a table.  So I don't
> think the above reasoning is defensible.
> 
> > Not only does 4000! not work, but 400! doesn't even work.  I just lost
> > demo "wow" factor points!
> 
> It looks like the limit would be about factorial(256).
> 
> The question remains, though, is this computational range good for
> anything except demos?

I can say that the extended range is good for finding *printf problems.  ;-)

Let me also add that as far as saving disk space, this is the _big_
improvement on the TODO list:
* Merge xmin/xmax/cmin/cmax back into three header fields  Before subtransactions, there used to be only three fields
neededto  store these four values. This was possible because only the current  transaction looks at the cmin/cmax
values.If the current transaction  created and expired the row the fields stored where xmin (same as  xmax), cmin,
cmax,and if the transaction was expiring a row from a  another transaction, the fields stored were xmin (cmin was not
needed),xmax, and cmax. Such a system worked because a transaction  could only see rows from another completed
transaction.However,  subtransactions can see rows from outer transactions, and once the  subtransaction completes, the
outertransaction continues, requiring  the storage of all four fields. With subtransactions, an outer  transaction can
createa row, a subtransaction expire it, and when the  subtransaction completes, the outer transaction still has to
have proper visibility of the row's cmin, for example, for cursors.  One possible solution is to create a phantom cid
whichrepresents a  cmin/cmax pair and is stored in local memory.  Another idea is to  store both cmin and cmax only in
localmemory.
 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Reduce NUMERIC size by 2 bytes, reduce max length to 508

От
Michael Fuhr
Дата:
On Mon, Dec 05, 2005 at 11:59:10PM -0500, Bruce Momjian wrote:
> Tom Lane wrote:
> > It looks like the limit would be about factorial(256).
> > 
> > The question remains, though, is this computational range good for
> > anything except demos?
> 
> I can say that the extended range is good for finding *printf problems.  ;-)

Might anybody be calculating permutations or combinations with the
textbook functions that use factorials?  Not a show-stopper since
those calculations can be optimized (at least the basic formulas I
know), but somebody might get bit by the change.  Maybe the release
notes could mention the new upper limit of factorial().

-- 
Michael Fuhr


Re: Reduce NUMERIC size by 2 bytes, reduce max length to 508

От
Tom Lane
Дата:
Michael Fuhr <mike@fuhr.org> writes:
> On Mon, Dec 05, 2005 at 11:59:10PM -0500, Bruce Momjian wrote:
>> Tom Lane wrote:
>>> The question remains, though, is this computational range good for
>>> anything except demos?
>> 
>> I can say that the extended range is good for finding *printf problems.  ;-)

> Might anybody be calculating permutations or combinations with the
> textbook functions that use factorials?

Hm ... between that, the possible crypto connection, and John's personal
testimony that he actually uses PG for calculations in this range, I'm
starting to lean to the idea that we shouldn't cut the range.

We could get the same 2-byte savings (in fact 3 bytes on average,
considering alignment issues) by implementing a 2-byte length word
format for numeric.  I had originally hoped to do both things to save
an average 5 bytes per numeric, which is starting to get to the point of
actually being interesting ;-).  But maybe we should just do the part
that we can do without removing any user-visible functionality.
        regards, tom lane


Re: Reduce NUMERIC size by 2 bytes, reduce max length to 508

От
"John D. Burger"
Дата:
Tom Lane wrote:

> Hm ... between that, the possible crypto connection, and John's 
> personal
> testimony that he actually uses PG for calculations in this range, I'm
> starting to lean to the idea that we shouldn't cut the range.

Just to be clear, this John has yet to use NUMERIC for any 
calculations, let alone in that range.  (I've only used NUMERIC for 
importing real-valued data where I didn't want to lose precision with a 
floating point representation, for instance, decimal latitude-longitude 
values.)

There was this post, though:

Gregory Maxwell wrote:

> I've hesitated commenting, because I think it might be a silly reason,
> but perhaps it's one other people share.  ...  I use PG as a
> calculator for big numbers because it's the only user friendly thing
> on my system that can do factorial(300) - factorial(280). I'd rather
> use something like octave, but I've found its pretty easy to escape
> its range.   If the range for computation is changed, then I'll
> probably keep an old copy around just for this, though I'm not quite
> sure how much I'd be affected..

- John D. Burger  MITRE



Re: Reduce NUMERIC size by 2 bytes, reduce max length to 508

От
Tom Lane
Дата:
"John D. Burger" <john@mitre.org> writes:
> Tom Lane wrote:
>> Hm ... between that, the possible crypto connection, and John's 
>> personal testimony

> Just to be clear, this John has yet to use NUMERIC for any 
> calculations, let alone in that range.

My mistake, got confused as to who had said what.

The point remains though: in discussing this proposed patch, we were
assuming that 10^508 would still be far beyond what people actually
needed.  Even one or two reports from the list membership of actual
use of larger values casts a pretty big shadow on that assumption.
        regards, tom lane


Re: Reduce NUMERIC size by 2 bytes, reduce max length to 508

От
Bruce Momjian
Дата:
Tom Lane wrote:
> "John D. Burger" <john@mitre.org> writes:
> > Tom Lane wrote:
> >> Hm ... between that, the possible crypto connection, and John's 
> >> personal testimony
> 
> > Just to be clear, this John has yet to use NUMERIC for any 
> > calculations, let alone in that range.
> 
> My mistake, got confused as to who had said what.
> 
> The point remains though: in discussing this proposed patch, we were
> assuming that 10^508 would still be far beyond what people actually
> needed.  Even one or two reports from the list membership of actual
> use of larger values casts a pretty big shadow on that assumption.

Agreed.  I would like to see us hit the big savings first, like merging
cmin/cmax (4 bytes per row) and reducing the varlena header size (2-3
bytes for short values), before we start going after disk savings that
actually limit our capabilites.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073