Обсуждение: Collation order for btree-indexable datatypes
To avoid getting into states where a btree index is corrupt (or appears that way), it is absolutely critical that the datatype provide a unique, consistent sort order. In particular, the operators = <> < <= > >= had better all agree with each other and with the 3-way-comparison support function about the ordering of any two non-NULL data values. After tracing some Assert failures in the new planner statistics code I'm working on, I have realized that several of our existing datatypes fail to meet this fundamental requirement, and therefore are prone to serious misbehavior when trying to index "weird" values. In particular, type NUMERIC does not return consistent results for comparisons involving "NaN" values, and several of the date/time types do not return consistent results for comparisons involving "INVALID" values. (Example: numeric_cmp will assert that two NaNs are equal, whereas numeric_eq will assert that they aren't. Worse, numeric_cmp will assert that a NaN is equal to any non-NaN, too. The date/time routines avoid the latter mistake but make the former one.) I am planning to fix this by ensuring that all these operations agree on an (arbitrarily chosen) sort order for the "weird" values of these types. What I'm wondering about is whether to insert the fixes into 7.1.1 or wait for 7.2. In theory changing the sort order might break existing user indexes, and should therefore be avoided until an initdb is needed. But: any indexes that contain these values are likely broken already, since in fact we don't have a well-defined sort order right now for these values. A closely related problem is that the "current time" special value supported by several of the date/time datatypes is inherently not compatible with being indexed, since its sort order relative to ordinary time values keeps changing. We had discussed removing this special case, and I think agreed to do so, but it hasn't happened yet. What I'm inclined to do is force consistency of the comparison operators now (for 7.1.1) and then remove "current time" for 7.2, but perhaps it'd be better to leave the whole can of worms alone until 7.2. Comments anyone? regards, tom lane
> I am planning to fix this by ensuring that all these operations agree > on an (arbitrarily chosen) sort order for the "weird" values of these > types. What I'm wondering about is whether to insert the fixes into > 7.1.1 or wait for 7.2. In theory changing the sort order might break > existing user indexes, and should therefore be avoided until an initdb > is needed. But: any indexes that contain these values are likely broken > already, since in fact we don't have a well-defined sort order right now > for these values. > What I'm inclined to do is force consistency of the comparison operators > now (for 7.1.1) and then remove "current time" for 7.2, but perhaps it'd > be better to leave the whole can of worms alone until 7.2. Comments > anyone? Assuming that the changes are reasonably safe, I think you're right. What parts of the changes would require an initdb, would new functions need to be added or the index ops need to change or would it be fixes to the existing functions (if the latter, wouldn't a recompile and dropping/recreating the indexes be enough?)
> A closely related problem is that the "current time" special value > supported by several of the date/time datatypes is inherently not > compatible with being indexed, since its sort order relative to > ordinary time values keeps changing. We had discussed removing this > special case, and I think agreed to do so, but it hasn't happened yet. > > What I'm inclined to do is force consistency of the comparison operators > now (for 7.1.1) and then remove "current time" for 7.2, but perhaps it'd > be better to leave the whole can of worms alone until 7.2. Comments > anyone? Comparing NaN/Invalid seems so off the beaten path that we would just wait for 7.2. That and no one has reported a problem with it so far. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
On Wed, 2 May 2001, Tom Lane wrote: > Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > > What parts of the changes would require an initdb, would new > > functions need to be added or the index ops need to change or would > > it be fixes to the existing functions (if the latter, wouldn't a recompile > > and dropping/recreating the indexes be enough?) > > Yes, dropping and recreating any user indexes that contain the problematic > values would be sufficient to get you out of trouble. We don't need any > system catalog changes for this, AFAICS. Looking back, I misread the original message. I thought you were saying that it needed to wait for an initdb and so would be bad in a dot release, but it was just the breaking of indexes thing, but since they're already pretty broken, I don't see much of a loss by fixing it.
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > What parts of the changes would require an initdb, would new > functions need to be added or the index ops need to change or would > it be fixes to the existing functions (if the latter, wouldn't a recompile > and dropping/recreating the indexes be enough?) Yes, dropping and recreating any user indexes that contain the problematic values would be sufficient to get you out of trouble. We don't need any system catalog changes for this, AFAICS. regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Comparing NaN/Invalid seems so off the beaten path that we would just > wait for 7.2. That and no one has reported a problem with it so far. Do you consider "vacuum analyze" on the regression database to be off the beaten path? How about creating an index on a numeric column that contains NaNs, or a timestamp column that contains Invalid? Unless you believe these values are not being used in the field at all, there's a problem. (And if you do believe that, you shouldn't be worried about my changing their behavior ;-)) regards, tom lane
If you feel strongly about it, go ahead. I didn't see any problem reports on it, and it seemed kind of iffy, so I thought we should hold it. > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Comparing NaN/Invalid seems so off the beaten path that we would just > > wait for 7.2. That and no one has reported a problem with it so far. > > Do you consider "vacuum analyze" on the regression database to be > off the beaten path? How about creating an index on a numeric column > that contains NaNs, or a timestamp column that contains Invalid? > > Unless you believe these values are not being used in the field at all, > there's a problem. (And if you do believe that, you shouldn't be > worried about my changing their behavior ;-)) > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026