Обсуждение: When do we lose column names?
Consider the following query: select (x).key as k, (x).value as v from (select each(hstore(q)) as x from (select oid, c.* from pg_class c where oid = 'parent'::regclass)q) t; This gives results like this: k | v -----+--------f1 | 16410f2 | parentf3 | 2200f4 | 16412f5 | 0f6 | 10 .... Now add a limit clause to the innermost query: select (x).key as k, (x).value as v from (select each(hstore(q)) as x from (select oid, c.* from pg_class c where oid = 'parent'::regclass limit 99999999) q) t; Now the result looks like this: k | v ----------------+--------oid | 16410relam | 0relacl |relkind | rrelname | parentreltype | 16412 What I'm having difficulty understanding is why the limit clause should make any difference. Is this a bug? If not, is it documented. cheers andrew
"Andrew Dunstan" <andrew@dunslane.net> writes: > What I'm having difficulty understanding is why the limit clause should > make any difference. Without the LIMIT, the query gets flattened to something like this: Index Scan using pg_class_oid_index on pg_catalog.pg_class c (cost=0.00..8.27 rows=1 width=202) Output: ROW(c.oid, c.relname, c.relnamespace, c.reltype, c.reloftype, c.relowner, c.relam,c.relfilenode, c.reltablespace, c.relpages, c.reltuples, c.relallvisible, c.reltoastrelid, c.reltoastidxid, c.relhasindex,c.relisshared, c.relpersistence, c.relkind, c.relnatts, c.relchecks, c.relhasoids, c.relhaspkey, c.relhasrules,c.relhastriggers, c.relhassubclass, c.relfrozenxid, c.relacl, c.reloptions) Index Cond: (c.oid = 53532::oid) and the issue seems to be that in execution of a RowExpr, the executor doesn't pay any attention to preserving the column names in the generated tupledesc --- see the ExecTypeFromExprList call in execQual.c. We could certainly make it do that --- it wouldn't even be terribly hard, since RowExpr already does store the column names. The only downside I can see is that this might lead to more transient rowtypes being kept around in a backend, since RowExprs with distinct field names would now lead to different "blessed" rowtypes. But that doesn't seem like a big deal. It was just never apparent before that we should care about field names in a tupledesc at execution time. I'm disinclined to consider this a back-patchable bug fix; it seems possible that somebody out there is depending on the current behavior. But we could think about changing it in HEAD. (wanders off to look at whether the only other caller of ExecTypeFromExprList could be taught to provide useful field names...) regards, tom lane
I wrote: > and the issue seems to be that in execution of a RowExpr, the > executor doesn't pay any attention to preserving the column names > in the generated tupledesc --- see the ExecTypeFromExprList call > in execQual.c. ... > We could certainly make it do that --- it wouldn't even be terribly > hard, since RowExpr already does store the column names. ... > (wanders off to look at whether the only other caller of > ExecTypeFromExprList could be taught to provide useful field names...) PFC, a patch that does this. This seems to fix Andrew's issue with respect to the RowExpr case. It's not really ideal with respect to the ValuesScan case, because what you get seems to always be the hard-wired "columnN" names for VALUES columns, even if you try to override that with an alias: regression=# select each(hstore(q)) as x from (values (1,2,3),(4,5,6) limit 2) as q(x,y,z); x ------------- (column1,1) (column2,2) (column3,3) (column1,4) (column2,5) (column3,6) (6 rows) I think this happens because VALUES in a FROM item is treated like a sub-select, and the aliases are getting applied at the "wrong" level. Don't know if that's worth trying to fix, or how hard it would be. Curiously, it works just fine if the VALUES can be folded: regression=# select each(hstore(q)) as x from (values (1,2,3),(4,5,6)) as q(x,y,z); x ------- (x,1) (y,2) (z,3) (x,4) (y,5) (z,6) (6 rows) regards, tom lane diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c index 887e5ce82a0b9011627450fdd1046d5529beb110..8f0b5979ba589a1e3aa10405671d4c0793361c5e 100644 *** a/src/backend/executor/execQual.c --- b/src/backend/executor/execQual.c *************** ExecInitExpr(Expr *node, PlanState *pare *** 4627,4633 **** if (rowexpr->row_typeid == RECORDOID) { /* generic record, use runtime type assignment */ ! rstate->tupdesc = ExecTypeFromExprList(rowexpr->args); BlessTupleDesc(rstate->tupdesc); /* we won't need to redo this at runtime */ } --- 4627,4634 ---- if (rowexpr->row_typeid == RECORDOID) { /* generic record, use runtime type assignment */ ! rstate->tupdesc = ExecTypeFromExprList(rowexpr->args, ! rowexpr->colnames); BlessTupleDesc(rstate->tupdesc); /* we won't need to redo this at runtime */ } diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c index 3f44ef0186ace8434ceaaf2ef37c14ca0fec632d..3ed90da7a52e2d0e982cfe0ef29022ac1ce1a4a5 100644 *** a/src/backend/executor/execTuples.c --- b/src/backend/executor/execTuples.c *************** ExecTypeFromTLInternal(List *targetList, *** 954,980 **** /* * ExecTypeFromExprList - build a tuple descriptor from a list of Exprs * ! * Here we must make up an arbitrary set of field names. */ TupleDesc ! ExecTypeFromExprList(List *exprList) { TupleDesc typeInfo; ! ListCell *l; int cur_resno = 1; ! char fldname[NAMEDATALEN]; typeInfo = CreateTemplateTupleDesc(list_length(exprList), false); ! foreach(l, exprList) { ! Node *e = lfirst(l); ! ! sprintf(fldname, "f%d", cur_resno); TupleDescInitEntry(typeInfo, cur_resno, ! fldname, exprType(e), exprTypmod(e), 0); --- 954,981 ---- /* * ExecTypeFromExprList - build a tuple descriptor from a list of Exprs * ! * Caller must also supply a list of field names (String nodes). */ TupleDesc ! ExecTypeFromExprList(List *exprList, List *namesList) { TupleDesc typeInfo; ! ListCell *le; ! ListCell *ln; int cur_resno = 1; ! ! Assert(list_length(exprList) == list_length(namesList)); typeInfo = CreateTemplateTupleDesc(list_length(exprList), false); ! forboth(le, exprList, ln, namesList) { ! Node *e = lfirst(le); ! char *n = strVal(lfirst(ln)); TupleDescInitEntry(typeInfo, cur_resno, ! n, exprType(e), exprTypmod(e), 0); diff --git a/src/backend/executor/nodeValuesscan.c b/src/backend/executor/nodeValuesscan.c index d5260e40b32abe5dc5ff437cda8e037f6cdbe12a..0089e501fa2e43c10a1d530110d1fa5f38144cd9 100644 *** a/src/backend/executor/nodeValuesscan.c --- b/src/backend/executor/nodeValuesscan.c *************** *** 25,30 **** --- 25,31 ---- #include "executor/executor.h" #include "executor/nodeValuesscan.h" + #include "parser/parsetree.h" static TupleTableSlot *ValuesNext(ValuesScanState *node); *************** ValuesScanState * *** 188,193 **** --- 189,196 ---- ExecInitValuesScan(ValuesScan *node, EState *estate, int eflags) { ValuesScanState *scanstate; + RangeTblEntry *rte = rt_fetch(node->scan.scanrelid, + estate->es_range_table); TupleDesc tupdesc; ListCell *vtl; int i; *************** ExecInitValuesScan(ValuesScan *node, ESt *** 239,245 **** /* * get info about values list */ ! tupdesc = ExecTypeFromExprList((List *) linitial(node->values_lists)); ExecAssignScanType(&scanstate->ss, tupdesc); --- 242,249 ---- /* * get info about values list */ ! tupdesc = ExecTypeFromExprList((List *) linitial(node->values_lists), ! rte->eref->colnames); ExecAssignScanType(&scanstate->ss, tupdesc); diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h index bdd499bea662d405fc6522c036c53955c0a2e3f9..0d69a34e12bd224396019c37b20cdcac85d73339 100644 *** a/src/include/executor/executor.h --- b/src/include/executor/executor.h *************** extern TupleTableSlot *ExecInitNullTuple *** 256,262 **** TupleDesc tupType); extern TupleDesc ExecTypeFromTL(List *targetList, bool hasoid); extern TupleDesc ExecCleanTypeFromTL(List *targetList, bool hasoid); ! extern TupleDesc ExecTypeFromExprList(List *exprList); extern void UpdateChangedParamSet(PlanState *node, Bitmapset *newchg); typedef struct TupOutputState --- 256,262 ---- TupleDesc tupType); extern TupleDesc ExecTypeFromTL(List *targetList, bool hasoid); extern TupleDesc ExecCleanTypeFromTL(List *targetList, bool hasoid); ! extern TupleDesc ExecTypeFromExprList(List *exprList, List *namesList); extern void UpdateChangedParamSet(PlanState *node, Bitmapset *newchg); typedef struct TupOutputState
I wrote: > PFC, a patch that does this. Upon further review, this patch would need some more work even for the RowExpr case, because there are several places that build RowExprs without bothering to build a valid colnames list. It's clearly soluble if anyone cares to put in the work, but I'm not personally excited enough to pursue it ... regards, tom lane
On 11/16/2011 10:38 PM, Tom Lane wrote: > I wrote: >> PFC, a patch that does this. > Upon further review, this patch would need some more work even for the > RowExpr case, because there are several places that build RowExprs > without bothering to build a valid colnames list. It's clearly soluble > if anyone cares to put in the work, but I'm not personally excited > enough to pursue it ... The patch itself causes a core dump with the current regression tests. This test: SELECT array_to_json(array_agg(q),false) FROM ( SELECT $$a$$ || x AS b, y AS c, ARRAY[ROW(x.*,ARRAY[1,2,3]), ROW(y.*,ARRAY[4,5,6])] AS z FROM generate_series(1,2) x, generate_series(4,5) y) q; causes a failure at line 967 of execTuples.c: Assert(list_length(exprList) == list_length(namesList)); I'm investigating further. I've been looking at the other places that build RowExprs. Most of them look OK and none seem clearly in need of fixing at first glance. Which do you think need attention? cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 11/16/2011 10:38 PM, Tom Lane wrote: >> Upon further review, this patch would need some more work even for the >> RowExpr case, because there are several places that build RowExprs >> without bothering to build a valid colnames list. It's clearly soluble >> if anyone cares to put in the work, but I'm not personally excited >> enough to pursue it ... > The patch itself causes a core dump with the current regression tests. Yeah, observing that was what made me write the above. > I've been looking at the other places that build RowExprs. Most of them > look OK and none seem clearly in need of fixing at first glance. Which > do you think need attention? In general I think we'd have to require that colnames be supplied in all RowExprs if we go this way. Anyplace that's trying to slide by without will have to be fixed. I don't recall how many places that is. regards, tom lane
On 02/07/2012 12:47 PM, Tom Lane wrote: > Andrew Dunstan<andrew@dunslane.net> writes: >> On 11/16/2011 10:38 PM, Tom Lane wrote: >>> Upon further review, this patch would need some more work even for the >>> RowExpr case, because there are several places that build RowExprs >>> without bothering to build a valid colnames list. It's clearly soluble >>> if anyone cares to put in the work, but I'm not personally excited >>> enough to pursue it ... >> The patch itself causes a core dump with the current regression tests. > Yeah, observing that was what made me write the above. > >> I've been looking at the other places that build RowExprs. Most of them >> look OK and none seem clearly in need of fixing at first glance. Which >> do you think need attention? > In general I think we'd have to require that colnames be supplied in all > RowExprs if we go this way. Anyplace that's trying to slide by without > will have to be fixed. I don't recall how many places that is. I just had a thought that maybe we could make this simpler by dummying up a list of colnames if we don't have one, instead of that assertion. Or am I on the wrong track. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 02/07/2012 12:47 PM, Tom Lane wrote: >> In general I think we'd have to require that colnames be supplied in all >> RowExprs if we go this way. Anyplace that's trying to slide by without >> will have to be fixed. I don't recall how many places that is. > I just had a thought that maybe we could make this simpler by dummying > up a list of colnames if we don't have one, instead of that assertion. > Or am I on the wrong track. Well, if there are more than one or two RowExpr creators for which a dummy set of colnames is the best we can do anyway, that might be a reasonable answer. But I think it would encourage people to be lazy and let the dummy colnames be used even when they can do better, so I'd rather not take this as a first-choice solution. regards, tom lane
On 02/07/2012 03:03 PM, Tom Lane wrote: > Andrew Dunstan<andrew@dunslane.net> writes: >> On 02/07/2012 12:47 PM, Tom Lane wrote: >>> In general I think we'd have to require that colnames be supplied in all >>> RowExprs if we go this way. Anyplace that's trying to slide by without >>> will have to be fixed. I don't recall how many places that is. >> I just had a thought that maybe we could make this simpler by dummying >> up a list of colnames if we don't have one, instead of that assertion. >> Or am I on the wrong track. > Well, if there are more than one or two RowExpr creators for which a > dummy set of colnames is the best we can do anyway, that might be a > reasonable answer. But I think it would encourage people to be lazy > and let the dummy colnames be used even when they can do better, so > I'd rather not take this as a first-choice solution. > > OK, the one place that needs to be fixed to avoid the crash caused by the json regression tests with the original patch is in src/backend/parser/parse_expr.c:transformRowExpr(). Other candidates I have found that don't set colnames and should probably use dummy names are: * src/backend/parser/gram.y (row: production) * src/backend/optimizer/prep/prepunion.c:adjust_appendrel_attrs_mutator()* src/backend/optimizer/util/var.c:flatten_join_alias_vars_mutator() Given a function: extern List *makeDummyColnames(List * args); what's the best place to put it? I couldn't see a terribly obvious place in the source. cheers andrew
Excerpts from Andrew Dunstan's message of jue feb 09 16:38:27 -0300 2012: > OK, the one place that needs to be fixed to avoid the crash caused by > the json regression tests with the original patch is in > > src/backend/parser/parse_expr.c:transformRowExpr(). > > Other candidates I have found that don't set colnames and should > probably use dummy names are: > > * src/backend/parser/gram.y (row: production) > * src/backend/optimizer/prep/prepunion.c:adjust_appendrel_attrs_mutator() > * src/backend/optimizer/util/var.c:flatten_join_alias_vars_mutator() > > > Given a function: > > extern List *makeDummyColnames(List * args); > > > what's the best place to put it? I couldn't see a terribly obvious place > in the source. parse_relation.c? -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Andrew Dunstan <andrew@dunslane.net> writes: > OK, the one place that needs to be fixed to avoid the crash caused by > the json regression tests with the original patch is in > src/backend/parser/parse_expr.c:transformRowExpr(). > Other candidates I have found that don't set colnames and should > probably use dummy names are: > * src/backend/parser/gram.y (row: production) > * src/backend/optimizer/prep/prepunion.c:adjust_appendrel_attrs_mutator() > * src/backend/optimizer/util/var.c:flatten_join_alias_vars_mutator() Hm, can't the last two get real column names from somewhere? Also, why would it be the responsibility of the grammar production to fill in the list, rather than transformRowExpr? > Given a function: > extern List *makeDummyColnames(List * args); > what's the best place to put it? I couldn't see a terribly obvious place > in the source. If there are enough potential users to justify having such a global function at all, then we might as well not bother but just let execQual fill in dummy names on the fly. Providing such a function means that there is nothing whatever pushing writers of new RowExpr constructors to not be lazy --- the path of least resistance will be to call makeDummyColnames. I was hoping that there would be few enough places where no information is available that we'd not need to have a global function like that. regards, tom lane
On 02/09/2012 02:54 PM, Tom Lane wrote: > Andrew Dunstan<andrew@dunslane.net> writes: >> OK, the one place that needs to be fixed to avoid the crash caused by >> the json regression tests with the original patch is in >> src/backend/parser/parse_expr.c:transformRowExpr(). >> Other candidates I have found that don't set colnames and should >> probably use dummy names are: >> * src/backend/parser/gram.y (row: production) >> * src/backend/optimizer/prep/prepunion.c:adjust_appendrel_attrs_mutator() >> * src/backend/optimizer/util/var.c:flatten_join_alias_vars_mutator() > Hm, can't the last two get real column names from somewhere? Possibly. I'll dig a bit deeper. > Also, why > would it be the responsibility of the grammar production to fill in the > list, rather than transformRowExpr? > > Sure. I'll just comment the source accordingly. cheers andrew
On 02/09/2012 03:06 PM, Andrew Dunstan wrote: > > > On 02/09/2012 02:54 PM, Tom Lane wrote: >> Andrew Dunstan<andrew@dunslane.net> writes: >>> OK, the one place that needs to be fixed to avoid the crash caused by >>> the json regression tests with the original patch is in >>> src/backend/parser/parse_expr.c:transformRowExpr(). >>> Other candidates I have found that don't set colnames and should >>> probably use dummy names are: >>> * src/backend/parser/gram.y (row: production) >>> * >>> src/backend/optimizer/prep/prepunion.c:adjust_appendrel_attrs_mutator() >>> * src/backend/optimizer/util/var.c:flatten_join_alias_vars_mutator() >> Hm, can't the last two get real column names from somewhere? > > Possibly. I'll dig a bit deeper. > I've had a look at these two. It's at least not obvious to me how to do this simply, if at all. In the last case it looks like we'd need to process the object recursively just like we do to extract the field values, and I don't know where to get them in the appendrel case at all, possibly because I'm not very familiar with this code. Do we actually need to bother with these cases? The regression tests pass without touching them, either because they don't matter or because we don't have a test for these cases that would tickle the assertion that was failing. If they don't matter, would it not be better just to note that in the code rather than building a list of field names for no good purpose? cheers andrew
On Sat, Feb 11, 2012 at 11:11 AM, Andrew Dunstan <adunstan@postgresql.org> wrote: >>>> Other candidates I have found that don't set colnames and should >>>> probably use dummy names are: >>>> * src/backend/parser/gram.y (row: production) >>>> * >>>> src/backend/optimizer/prep/prepunion.c:adjust_appendrel_attrs_mutator() >>>> * src/backend/optimizer/util/var.c:flatten_join_alias_vars_mutator() >>> >>> Hm, can't the last two get real column names from somewhere? >> >> Possibly. I'll dig a bit deeper. > > I've had a look at these two. It's at least not obvious to me how to do this > simply, if at all. In the last case it looks like we'd need to process the > object recursively just like we do to extract the field values, and I don't > know where to get them in the appendrel case at all, possibly because I'm > not very familiar with this code. > > Do we actually need to bother with these cases? The regression tests pass > without touching them, either because they don't matter or because we don't > have a test for these cases that would tickle the assertion that was > failing. If they don't matter, would it not be better just to note that in > the code rather than building a list of field names for no good purpose? In flatten_join_alias_vars_mutator(), we've got a RangeTblEntry to work with. I think the column names are to be found in the alias and/or eref attributes of the RangeTblEntry. Each of those is an Alias object, which is defined like this: typedef struct Alias { NodeTag type; char *aliasname; /* aliased rel name (never qualified) */ List *colnames; /* optional list of column aliases */ } Alias; I'm not sure whether we should look at rte->eref.colnames, rte->alias.colnames, or both. In adjust_appendrel_attrs_mutator(), we have a list, translated_vars, whose order matches the column order of the parent rel. If we had the parent's RangeTblEntry, we could probably precede as in the previous case. But the AppendRelInfo only contains the index of the RT. Maybe we can figure out a way to use rt_fetch to get the RangeTblEntry itself, but that requires a pointer to the range table itself, which we haven't got. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Sat, Feb 11, 2012 at 11:11 AM, Andrew Dunstan > <adunstan@postgresql.org> wrote: >> Do we actually need to bother with these cases? > In flatten_join_alias_vars_mutator(), we've got a RangeTblEntry to > work with. I think the column names are to be found in the alias > and/or eref attributes of the RangeTblEntry. The eref names are the ones to use. alias just records the original AS clause (if any) attached to the RTE, which is mostly useful only for reverse-listing the query. > In adjust_appendrel_attrs_mutator(), we have a list, translated_vars, > whose order matches the column order of the parent rel. If we had the > parent's RangeTblEntry, we could probably precede as in the previous > case. But the AppendRelInfo only contains the index of the RT. Maybe > we can figure out a way to use rt_fetch to get the RangeTblEntry > itself, but that requires a pointer to the range table itself, which > we haven't got. This is surely fixable by passing a bit more information down. If you (Andrew) have something that covers everything but this issue, pass it over and I'll take a whack at it. regards, tom lane
On 02/13/2012 11:00 AM, Tom Lane wrote: > Robert Haas<robertmhaas@gmail.com> writes: >> On Sat, Feb 11, 2012 at 11:11 AM, Andrew Dunstan >> <adunstan@postgresql.org> wrote: >>> Do we actually need to bother with these cases? >> In flatten_join_alias_vars_mutator(), we've got a RangeTblEntry to >> work with. I think the column names are to be found in the alias >> and/or eref attributes of the RangeTblEntry. > The eref names are the ones to use. alias just records the original AS > clause (if any) attached to the RTE, which is mostly useful only for > reverse-listing the query. > >> In adjust_appendrel_attrs_mutator(), we have a list, translated_vars, >> whose order matches the column order of the parent rel. If we had the >> parent's RangeTblEntry, we could probably precede as in the previous >> case. But the AppendRelInfo only contains the index of the RT. Maybe >> we can figure out a way to use rt_fetch to get the RangeTblEntry >> itself, but that requires a pointer to the range table itself, which >> we haven't got. > This is surely fixable by passing a bit more information down. If you > (Andrew) have something that covers everything but this issue, pass it > over and I'll take a whack at it. What I have fixes one of the three cases I have identified that appear to need fixing, the one that caused the json tests to crash with your original partial patch. See <https://bitbucket.org/adunstan/pgdevel/changesets/tip/rowexprs>. I won't have time to return to this for a few days. The two remaining cases should be fairly independent I think. If you don't get to this before then I'll look at the flatten_join_alias_vars_mutator case again and try to fix it based on the above, and then give you what I've got. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 02/13/2012 11:00 AM, Tom Lane wrote: >> This is surely fixable by passing a bit more information down. If you >> (Andrew) have something that covers everything but this issue, pass it >> over and I'll take a whack at it. > What I have fixes one of the three cases I have identified that appear > to need fixing, the one that caused the json tests to crash with your > original partial patch. See > <https://bitbucket.org/adunstan/pgdevel/changesets/tip/rowexprs>. I > won't have time to return to this for a few days. The two remaining > cases should be fairly independent I think. If you don't get to this > before then I'll look at the flatten_join_alias_vars_mutator case again > and try to fix it based on the above, and then give you what I've got. OK, I fixed this up and committed it. I made some cosmetic changes (the most notable being that the definition of RowExpr is really changing here, and so should its comment). The adjust_appendrel_attrs situation was fixed by passing in the PlannerInfo, which is something that probably should have been made available all along --- there are very few nontrivial functions in the planner that don't need it. I'm still a bit annoyed by the behavior I mentioned here, http://archives.postgresql.org/pgsql-hackers/2011-11/msg01031.php that we don't get "real" column names from an unflattened VALUES RTE. Might be worth looking into that, but I don't have time for it. regards, tom lane
On 02/14/2012 05:39 PM, Tom Lane wrote: > OK, I fixed this up and committed it. I made some cosmetic changes > (the most notable being that the definition of RowExpr is really > changing here, and so should its comment). The adjust_appendrel_attrs > situation was fixed by passing in the PlannerInfo, which is something > that probably should have been made available all along --- there are > very few nontrivial functions in the planner that don't need it. Great, many thanks for finishing this up. > > I'm still a bit annoyed by the behavior I mentioned here, > http://archives.postgresql.org/pgsql-hackers/2011-11/msg01031.php > that we don't get "real" column names from an unflattened VALUES RTE. > Might be worth looking into that, but I don't have time for it. > > A TODO maybe? cheers andrew