Discussion: Problems on NUMERIC
Hi,

sometimes it's good not to spend too much effort implementing the final solution first. So it is for NUMERIC.

First, I wonder why the can_coerce... stuff is #if'd out of parse_relation.c? For the NUMERIC type, numeric(num,typmod) must be called if someone does an

    INSERT INTO ... SELECT * FROM ...

But it isn't. It is only called when there are calculations done on the columns. I also checked that for the BPCHAR type, and it simply throws an ERROR if the target's length doesn't match.

This might be easy to fix, but the other problem I have is a bit more difficult. When binary operators (add, sub, mul, div) are called, the required precision of the result isn't known, and the coerce function numeric(num,typmod) will only be called for the final result. Now take the following situation:

    CREATE TABLE t1 (id int4, annual_val numeric(20,4));
    CREATE TABLE t2 (id int4, monthly_val numeric(24,8));

    INSERT INTO t2 SELECT id, annual_val / '12' FROM t1;

A multiplication has a maximum number of digits that can appear after the decimal point: the sum of the number of digits present in the two operands. But not so for a division. If we want to implement NUMERIC with really high precision (maybe 4000 or more digits), there would currently be no other choice than to do the division with the full possible precision and then throw away most of the digits when the result is assigned to the target column. Wasted effort and, more important, MUCH WASTED CPU.

I can think of something like this:

    On add/subtract, the result's precision after the decimal point is the higher of the two operands.

    On multiply, the result's precision after the decimal point is the sum of the precisions of the two operands.

    On divide, the result's precision after the decimal point is as for multiply, or double the higher precision of the two operands.

Any other suggestions?

On the other hand, it is possible to do it as

    INSERT INTO t2 SELECT id, ROUND(annual_val,8) / '12' FROM t1;

How do other databases handle this problem? How is the precision of a numeric result defined?

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #
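
The scale rules proposed in the message above can be written down as a small Python sketch. This is an illustration only, not code from the NUMERIC implementation: the function name `proposed_result_scale` is hypothetical, and for division the message offers two candidates, of which the sketch arbitrarily picks "double the higher scale of the two operands".

```python
def proposed_result_scale(op: str, s1: int, s2: int) -> int:
    """Digits after the decimal point for `a op b` under the proposal.

    s1 and s2 are the scales (digits after the decimal point) of the
    two operands. The name and signature are hypothetical.
    """
    if op in ("+", "-"):
        # Add/subtract: the higher of the two operand scales.
        return max(s1, s2)
    if op == "*":
        # Multiply: the sum of the two operand scales.
        return s1 + s2
    if op == "/":
        # Divide: the message suggests either "as for multiply" or
        # "double the higher scale"; this sketch picks the latter.
        return 2 * max(s1, s2)
    raise ValueError(f"unsupported operator: {op!r}")


# annual_val is numeric(20,4); the literal '12' has scale 0.
print(proposed_result_scale("/", 4, 0))
```

For the example tables, dividing a scale-4 column by a scale-0 constant would be computed with 8 digits after the decimal point under this choice, which happens to match the numeric(24,8) target column.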
> First I wonder why the can_coerce... stuff is #if'd out of
> parse_relation.c?

Oh! That looks like my style of #if FALSE, but I can't recall why it is that way. Will look at it. Does it work to just substitute an #if TRUE? Perhaps I had it disabled during debugging, but...

> How do other databases handle this problem. How is the
> precision of a numeric result defined?

I've enclosed some snippets from my SQL92 2nd Draft Standard doc. It gives you a lot of latitude :)

- Tom

Syntax Rules

1) If the data type of both operands of a dyadic arithmetic operator is exact numeric, then the data type of the result is exact numeric, with precision and scale determined as follows:

   a) Let S1 and S2 be the scale of the first and second operands respectively.

   b) The precision of the result of addition and subtraction is implementation-defined, and the scale is the maximum of S1 and S2.

   c) The precision of the result of multiplication is implementation-defined, and the scale is S1 + S2.

   d) The precision and scale of the result of division is implementation-defined.

<snip large amounts>

Whenever an exact or approximate numeric value is assigned to a data item or parameter representing an exact numeric value, an approximation of its value that preserves leading significant digits after rounding or truncating is represented in the data type of the target. The value is converted to have the precision and scale of the target. The choice of whether to truncate or round is implementation-defined.

An approximation obtained by truncation of a numerical value N for an <exact numeric type> T is a value V representable in T such that N is not closer to zero than the numerical value of V and such that the absolute value of the difference between N and the numerical value of V is less than the absolute value of the difference between two successive numerical values representable in T.

An approximation obtained by rounding of a numerical value N for an <exact numeric type> T is a value V representable in T such that the absolute value of the difference between N and the numerical value of V is not greater than half the absolute value of the difference between two successive numerical values representable in T. If there are more than one such values V, then it is implementation-defined which one is taken.

All numerical values between the smallest and the largest value, inclusive, representable in a given exact numeric type have an approximation obtained by rounding or truncation for that type; it is implementation-defined which other numerical values have such approximations.
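
The difference between the two approximations defined in the excerpt can be seen with Python's standard `decimal` module. This is only an illustrative sketch: the helper name `to_scale` is hypothetical, `ROUND_DOWN` stands in for the standard's "truncation" (never farther from zero than N), and `ROUND_HALF_UP` is one of several tie-breaking choices the standard leaves implementation-defined.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def to_scale(value: Decimal, scale: int, rounding: str) -> Decimal:
    """Approximate `value` with exactly `scale` digits after the point."""
    # Decimal(1).scaleb(-2) is Decimal("0.01"), the step between two
    # successive values representable at scale 2.
    quantum = Decimal(1).scaleb(-scale)
    return value.quantize(quantum, rounding=rounding)


n = Decimal("2.675")
print(to_scale(n, 2, ROUND_DOWN))      # truncation: 2.67
print(to_scale(n, 2, ROUND_HALF_UP))   # rounding:   2.68
print(to_scale(-n, 2, ROUND_DOWN))     # truncation moves toward zero: -2.67
```

The value 2.675 sits exactly halfway between 2.67 and 2.68, so it is an instance of the "more than one such value V" case, where the choice is left to the implementation.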
> > First I wonder why the can_coerce... stuff is #if'd out of
> > parse_relation.c?

It looks like the routine where the code appears cannot return a modified expression, so I just placed this code in there as a marker. It should be possible to add some code to get this feature working.

- Tom
>
> > First I wonder why the can_coerce... stuff is #if'd out of
> > parse_relation.c?
>
> Oh! That looks like my style of #if FALSE, but I can't recall why it is
> that way. Will look at it. Does it work to just substitute an #if TRUE?
> Perhaps I had it disabled during debugging, but...

Hmmm - elog(ERROR, "Type %s(%d) can be coerced to... looks like debugging code to me. Maybe you wanted to elog(DEBUG... ?

>
> > How do other databases handle this problem. How is the
> > precision of a numeric result defined?
>
> I've enclosed some snippets from my SQL92 2nd Draft Standard doc. It
> gives you a lot of latitude :)
>

Thanks! That helps a lot!

>
> d) The precision and scale of the result of division is
> implementation-defined.

I love those definitions :-)

So I'll make the display scale of a division

    min( max(S1, S2), SLIMIT)

and the internal result scale

    min( RMINIMUM, max(R1, R2) + 2, RLIMIT)

where S1 and S2 are the display scales of the two operands, R1 and R2 are the internal present scales, and SLIMIT and RLIMIT are the implementation-defined maximum allowed scales (what about 4000 for SLIMIT?). RMINIMUM is 8, so that anything is computed internally with at least 8 digits after the decimal point (because the defaults for NUMERIC are precision 30, scale 6).

If the result is then assigned to another tuple's attribute, numeric(num,typmod) will be called and do the rounding with the scale defined in typmod. If numeric_out(num) is called for it, it will be output rounded to the above display scale.

With 'round(att1, 500) / att2' someone can then get the result with a scale of 500 digits. This way it is flexible enough, but not too much wasted computing is done.

Jan
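
The two division scales described in the message above can be sketched in Python as follows. Two things here are assumptions rather than statements from the thread: the value of `RLIMIT` (only 4000 for SLIMIT is floated), and the reading of RMINIMUM as a floor. The formula as written takes a plain `min` over all three terms, but the accompanying prose says RMINIMUM exists so that at least 8 digits are always computed, so the sketch applies it with `max`. The function names are hypothetical.

```python
SLIMIT = 4000    # maximum display scale; 4000 is the value floated in the message
RLIMIT = 4000    # maximum internal scale; assumed here, no value is given
RMINIMUM = 8     # always compute at least 8 digits after the decimal point


def div_display_scale(s1: int, s2: int) -> int:
    """Display scale of a / b: min(max(S1, S2), SLIMIT)."""
    return min(max(s1, s2), SLIMIT)


def div_result_scale(r1: int, r2: int) -> int:
    """Internal scale of a / b, treating RMINIMUM as a lower bound.

    This is an interpretation of the prose, not of the literal formula.
    """
    return min(max(RMINIMUM, max(r1, r2) + 2), RLIMIT)


# annual_val numeric(20,4) divided by a scale-0 constant:
print(div_display_scale(4, 0), div_result_scale(4, 0))

# round(att1, 500) / att2 with att2 at scale 6:
print(div_display_scale(500, 6), div_result_scale(500, 6))
```

Under this reading, the division is carried out with only a couple of guard digits beyond the larger operand scale, and a caller who wants more digits raises the operand scale explicitly with round().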
> First I wonder why the can_coerce... stuff is #if'd out of
> parse_relation.c? For the NUMERIC type the
> numeric(num,typmod) must be called if someone does an
>
> INSERT INTO ... SELECT * FROM ...
>
> But it isn't. It is only called when there are calculations
> done on the columns. I also checked that for BPCHAR type and
> it simply throws an ERROR if the target's length doesn't
> match.

Sorry, I'm having trouble thinking of a case which does not behave properly with the existing types. I've tried inserting varchar(10) columns into a varchar(1) column, I've tried inserting int columns into float columns, etc etc. How are you getting handleTargetColname() / checkTargetTypes() called where it is rejecting things?

It may be that splitting that attribute field into two pieces for NUMERIC is opening a can of worms, since there are specific assumptions about what that field means throughout the code :(

Maybe we should think about how to isolate the type-specific interpretation of that attribute field into a type-specific handler routine? Ooh, that sounds like a pain...

- Tom
>
> > First I wonder why the can_coerce... stuff is #if'd out of
> > parse_relation.c? For the NUMERIC type the
> > numeric(num,typmod) must be called if someone does an
> >
> > INSERT INTO ... SELECT * FROM ...
> >
> > But it isn't. It is only called when there are calculations
> > done on the columns. I also checked that for BPCHAR type and
> > it simply throws an ERROR if the target's length doesn't
> > match.
>
> Sorry, I'm having trouble thinking of a case which does not behave
> properly with the existing types. I've tried inserting varchar(10)
> columns into a varchar(1) column, I've tried inserting int columns into
> float columns, etc etc. How are you getting handleTargetColname() /
> checkTargetTypes() called where it is rejecting things?

    pgsql=> create table t1 (a char(10));
    CREATE
    pgsql=> create table t2 (a char(4));
    CREATE
    pgsql=> insert into t2 select * from t1;
    ERROR: Length of a is not equal to the length of target column a
    pgsql=>

>
> It may be that splitting that attribute field into two pieces for
> NUMERIC is opening a can of worms, since there are specific assumptions
> about what that field means throughout the code :(

It doesn't produce any problems so far, only that the function numeric(num,typmod) isn't called when doing a plain INSERT ... SELECT. It is only called when comparisons were performed on the numeric attributes in the SELECT clause of the INSERT.

But I need that call to force the rounding and range check at INSERT time. Otherwise, the values in the target table will later be output with the scale of their original source table, and that's wrong. It would also be possible to insert 1000.0 into a numeric(5,2) attribute, and that shouldn't be.

Maybe I have to hook in for NUMERIC there in parse_relation too. Up to now I'm compiling the whole thing as a loadable module. I'll check if that's possible when moving it to the builtins.

But in general I think that if there is a function with the same name as a type, which takes this type plus another int4 argument, this must be a range checker/padder/truncator or the like, and it should be called before values are assigned to attributes.

>
> Maybe we should think about how to isolate the type-specific
> interpretation of that attribute field into a type-specific handler
> routine? Ooh, that sounds like a pain...

Noooooooooo

Jan
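
What the message above expects from numeric(num,typmod) at INSERT time, rounding to the target scale and then range-checking against the target precision, can be sketched with Python's `decimal` module. This is a hedged illustration, not the actual function: the name `coerce_numeric` is hypothetical, the typmod is passed as a separate precision and scale instead of a packed integer, and half-up rounding is an assumption (the thread only says "rounding").

```python
from decimal import Decimal, ROUND_HALF_UP


def coerce_numeric(value: Decimal, precision: int, scale: int) -> Decimal:
    """Round `value` to `scale` digits, then check it fits numeric(precision, scale).

    The default decimal context (28 significant digits) is enough for
    the small values used in this example.
    """
    rounded = value.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
    # numeric(5,2) leaves 5 - 2 = 3 digits before the decimal point,
    # so every stored value must be below 10**3 in absolute value.
    limit = Decimal(10) ** (precision - scale)
    if abs(rounded) >= limit:
        raise ValueError(
            f"value {value} does not fit numeric({precision},{scale})"
        )
    return rounded


print(coerce_numeric(Decimal("12.3456"), 5, 2))   # stored with the target scale

try:
    coerce_numeric(Decimal("1000.0"), 5, 2)
except ValueError as exc:
    print("rejected:", exc)
```

Checking the range after rounding matters: a value such as 999.995 rounds up to 1000.00 and must then be rejected for numeric(5,2), even though its unrounded form looks small enough.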
> > How are you getting handleTargetColname() /
> > checkTargetTypes() called where it is rejecting things?

OK, I'm not sure why the behavior is different if I explicitly specify the columns (which of course I had for testing):

    postgres=> insert into t2 select a from t1;
    INSERT 0 0
    postgres=> insert into t2 select * from t1;
    ERROR: Length of 'a' is not equal to the length of target column 'a'

> It doesn't produce any problems so far, only that the
> function numeric(num,typmod) isn't called when doing a plain
> INSERT ... SELECT.

Hmm. Even when you explicitly specify the columns as I did in my example above? I should be able to get the wildcard example to work sometime before v6.5, and I *think* that the explicit cases should do what you want.

As a loadable module, your data type will only match itself for type coercion, but that's what you want for now. When it is built in, you will be able to specify that it is higher or lower in a hierarchy with, for example, int4 and float8.

> Maybe I have to hook for NUMERIC there in parse_relation too.
> Up to now I'm compiling the whole thing as loadable module.
> I'll check it that's possible when moving it to the builtins.

Unless you can't find a test case which does work for you, don't bother looking at it; I'll pick it up some time soon.

> But in general I think if there is a function with the same
> name as a type, that take this type plus another int4
> argument, this must be a range checker/padder/truncator or
> the like and it should be called before values are assigned
> to attributes.

That's how it should work afaik, at least for variable-length types. Not all types are checked for this conversion function...

- Tom
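
The convention discussed in the last exchange, looking for a function named after the type that takes the type plus an int4 and calling it before assignment, can be sketched as a lookup in a toy registry. Everything here is hypothetical and heavily simplified: `FUNCTIONS` stands in for the function catalog, `bpchar_sizer` is an illustrative padder/truncator, the modifier is a plain length, and a negative typmod is assumed to mean "no modifier". Types without such a function pass through unchanged, mirroring the remark that not all types are checked.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for the function catalog:
# (function name, argument type names) -> callable.
FUNCTIONS: Dict[Tuple[str, Tuple[str, ...]], Callable[[str, int], str]] = {}


def bpchar_sizer(value: str, length: int) -> str:
    """Illustrative padder/truncator for a fixed-length character type."""
    return value[:length].ljust(length)


# Registered under the type's own name, taking (type, int4).
FUNCTIONS[("bpchar", ("bpchar", "int4"))] = bpchar_sizer


def coerce_for_assignment(type_name: str, value: str, typmod: int) -> str:
    """Apply the type's sizing function, if one exists, before storing."""
    sizer = FUNCTIONS.get((type_name, (type_name, "int4")))
    if sizer is None or typmod < 0:
        # No matching function, or no modifier on the target column.
        return value
    return sizer(value, typmod)


print(repr(coerce_for_assignment("bpchar", "abcdefghij", 4)))
print(repr(coerce_for_assignment("bpchar", "ab", 4)))
print(repr(coerce_for_assignment("int4", "42", 4)))
```

The point of keying the lookup on the type's own name is that a new type such as NUMERIC would get its check applied without any type-specific code in the assignment path.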