Thread: regproc's lack of certainty is dangerous
Deepak Bhole of Red Hat just pointed out to me a failure that he got after some 7.3 stress testing:

> [ "`~!@#$%^&*()''| \final_db\n,.;:'" ]=# SELECT n.nspname, p.proname,
> o.oprcode::oid FROM pg_operator o, pg_proc p, pg_namespace n WHERE
> o.oid=270447::oid AND p.oid=o.oprcode::oid AND p.pronamespace=n.oid;
> ERROR: There is more than one procedure named "[ ""`~!@#$%^&*()''|
> \final_schema\n,.;:'"" ]"."[ ""`~!@#$%^&*''| \ {func_for_op}\n,.;:'"" ]"

This error comes out of regprocin() when it finds multiple candidate functions with the same name (and, presumably, different argument lists or different schemas). No big surprise, since that's what he had. But it's a bit odd that regprocin() is being invoked, when there's no regproc literal in the given query.

After some digging, it turns out that the error is appearing because that function name is present in the pg_statistic entry for pg_operator.oprcode. *Any* query that causes the optimizer to become interested in pg_operator.oprcode will fail under these circumstances :-(. And the user can't readily avoid this, since there's no way to be sure which function names will happen to end up in the histogram. "Never ANALYZE the table" isn't going to fly as a workaround.

I am not real sure what we should do about it. Clearly there is more risk than I'd realized in datatypes whose input routines may reject strings that their output routines had produced in good faith.

One possible route is to try to eliminate the ambiguity, but I doubt that that will work very effectively for regproc and friends --- the whole point of those types is to resolve ambiguous input, and so the possibility of failures in the input routine can't easily be removed. regproc in particular needs its special behavior to be useful for bootstrapping.

Another approach is to try to fix pg_statistic to avoid the problem by not doing I/O conversions.
For scalar datatypes (those that have associated array types) it'd probably be feasible to store the histogram and most-common-value arrays as arrays of the datatype itself, not arrays of text; that should be a win for performance as well as avoiding risky conversions. I am not sure what to do about columns that have datatypes without matching array types, though (that would include array columns and domains, IIRC). Maybe use array of bytea to hold the internal representation of the type?

Any comments or better ideas out there?

			regards, tom lane
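[Editor's note: the ambiguity described above is easy to reproduce directly. The following is a hypothetical psql session, not from the original thread; the function name "f" is illustrative, and the exact error wording varies by release. regprocedure, which takes an argument list, is the documented way to disambiguate.]

regression=# create function f(int) returns int as 'select $1' language sql;
CREATE FUNCTION
regression=# create function f(text) returns text as 'select $1' language sql;
CREATE FUNCTION
regression=# select 'f'::regproc;       -- regprocin() resolves by name alone
ERROR:  there is more than one function named f
regression=# select 'f(int)'::regprocedure;  -- signature removes the ambiguity

The point of the failure report above is that the user never wrote such a cast: the same regprocin() call happens implicitly when the planner reads back a text-form histogram entry for an oprcode column.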
Tom Lane wrote:
> Another approach is to try to fix pg_statistic to avoid the problem by
> not doing I/O conversions. For scalar datatypes (those that have
> associated array types) it'd probably be feasible to store the
> histogram and most-common-value arrays as arrays of the datatype itself,
> not arrays of text; that should be a win for performance as well as
> avoiding risky conversions. I am not sure what to do about columns that
> have datatypes without matching array types, though (that would include
> array columns and domains, IIRC). Maybe use array of bytea to hold the
> internal representation of the type?

ISTM that the best (if not the only feasible) approach is using an array of bytea to hold the internal representation of the type.

Joe
Joe Conway <mail@joeconway.com> writes:
> Tom Lane wrote:
>> I am not sure what to do about columns that
>> have datatypes without matching array types, though (that would include
>> array columns and domains, IIRC). Maybe use array of bytea to hold the
>> internal representation of the type?

> ISTM that the best (if not the only feasible) approach is using array of
> bytea to hold the internal representation of the type.

I'd like "select * from pg_statistic" to still produce readable output whenever possible, though. The bytea approach falls down badly on that score, so I don't want to resort to it except where I absolutely must.

I think that we can actually get away (from an implementation point of view) with a column containing arrays of different base types; array_out will still work AFAIR. It's an interesting question though how such a column could reasonably be declared. This ties into your recent investigations into polymorphic array functions, perhaps.

Maybe "anyarray" shouldn't be quite so pseudo a pseudotype?

			regards, tom lane
Tom Lane wrote:
> I think that we can actually get away (from an implementation point of
> view) with a column containing arrays of different base types; array_out
> will still work AFAIR. It's an interesting question though how such a
> column could reasonably be declared. This ties into your recent
> investigations into polymorphic array functions, perhaps.
>
> Maybe "anyarray" shouldn't be quite so pseudo a pseudotype?

I was having similar thoughts when you first posted this, but I wasn't sure you'd want to go there. I wonder what changes are required other than promoting the typtype from a 'p' to a 'b' and the I/O functions to array_out/array_in?

Joe
Tom Lane wrote:
> I think that we can actually get away (from an implementation point of
> view) with a column containing arrays of different base types; array_out
> will still work AFAIR. It's an interesting question though how such a
> column could reasonably be declared. This ties into your recent
> investigations into polymorphic array functions, perhaps.
>
> Maybe "anyarray" shouldn't be quite so pseudo a pseudotype?

More on this idea; here is a simple experiment:

regression=# update pg_type set typtype = 'b', typinput = 'array_in', typoutput = 'array_out' where oid = 2277;
UPDATE 1
regression=# create table bar(f1 int, f2 anyarray);
CREATE TABLE
regression=# insert into bar values (1,'{1,2}'::integer[]);
INSERT 744428 1
regression=# insert into bar values (2,'{a,b}'::text[]);
INSERT 744429 1
regression=# select * from bar;
 f1 |  f2
----+-------
  1 | {1,2}
  2 | {a,b}
(2 rows)

Interesting ... but then there is:

regression=# select f1, f2[2] from bar;
ERROR:  transformArraySubscripts: type anyarray is not an array

A bit more to do I guess.

Joe
I wrote:
>> I think that we can actually get away (from an implementation point of
>> view) with a column containing arrays of different base types; array_out
>> will still work AFAIR. It's an interesting question though how such a
>> column could reasonably be declared. This ties into your recent
>> investigations into polymorphic array functions, perhaps.
>>
>> Maybe "anyarray" shouldn't be quite so pseudo a pseudotype?

I have committed a fix for these problems that makes pg_statistic's columns into "anyarray" columns. It turns out that my original concerns about datatypes without associated array types don't matter --- we can physically build such an array, regardless of whether we can point to a pg_type entry that describes it.

This is certainly something of a kluge at the moment, because pg_statistic is making use of facilities that don't exist at the SQL level. It gets the job done, but I'd like to see some fuller support for "anyarray" in future.

			regards, tom lane
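[Editor's note: a brief illustration of the committed scheme, as a hypothetical psql session rather than output from the thread. The stavalues columns of pg_statistic are declared anyarray, so each row's value arrays carry the element type of the analyzed column, and array_out renders them readably --- the original requirement that ruled out bytea:]

regression=# select staattnum, stavalues1
regression-# from pg_statistic
regression-# where starelid = 'pg_operator'::regclass;

A function-name histogram for pg_operator.oprcode now comes back as an array of regproc OIDs in the column's own type, never passing through regprocin(), so the ambiguous-name failure that started this thread cannot recur during planning.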