Обсуждение: Something seems weird inside tts_virtual_copyslot()
Looking at the Assert inside tts_virtual_copyslot(), it does: Assert(srcdesc->natts <= dstslot->tts_tupleDescriptor->natts); So, that seems to indicate that it's ok for the src slot to have fewer attributes than the destination. The code then calls tts_virtual_clear(dstslot), then slot_getallattrs(srcslot); then does the following loop: for (int natt = 0; natt < srcdesc->natts; natt++) { dstslot->tts_values[natt] = srcslot->tts_values[natt]; dstslot->tts_isnull[natt] = srcslot->tts_isnull[natt]; } Seems ok so far. If the srcslot has fewer attributes then that'll leave the extra dstslot array elements untouched. Where it gets weird is inside tts_virtual_materialize(). In that function, we materialize *all* of the dstslot attributes, even the extra ones that were left alone in the for loop shown above. Why do we need to materialize all of those attributes? We only need to materialize up to srcslot->natts. Per the following code, only up to the srcdesc->natts would be accessible anyway: dstslot->tts_nvalid = srcdesc->natts; Virtual slots don't need any further deforming and tts_virtual_getsomeattrs() is coded in a way that we'll find out if anything tries to deform a virtual slot. I changed the Assert in tts_virtual_copyslot() to check the natts match in each of the slots and all of the regression tests still pass, so it seems we have no tests where there's an attribute number mismatch... I wondered if there are any other cases that try to handle mismatching attribute numbers. On a quick scan of git grep -E "^\s*Assert\(.*natts.*\);" I don't see any other Asserts that allow mismatching attribute numbers. I think if we are going to support copying slots where the source and destination don't have the same number of attributes then the following comment should explain what's allowed and what's not allowed: /* * Copy the contents of the source slot into the destination slot's own * context. Invoked using callback of the destination slot. */ void (*copyslot) (TupleTableSlot *dstslot, TupleTableSlot *srcslot); I also tried adding the following to ExecCopySlot() to see if there is any other slot copying going on with other slot types where the natts don't match. All tests pass still. Assert(srcslot->tts_tupleDescriptor->natts == dstslot->tts_tupleDescriptor->natts); Is the Assert() in tts_virtual_copyslot() wrong? David
Hi, On 2023-11-01 11:35:50 +1300, David Rowley wrote: > Looking at the Assert inside tts_virtual_copyslot(), it does: > > Assert(srcdesc->natts <= dstslot->tts_tupleDescriptor->natts); I think that assert was intended to be the other way round. > So, that seems to indicate that it's ok for the src slot to have fewer > attributes than the destination. The code then calls > tts_virtual_clear(dstslot), then slot_getallattrs(srcslot); then does > the following loop: > for (int natt = 0; natt < srcdesc->natts; natt++) > { > dstslot->tts_values[natt] = srcslot->tts_values[natt]; > dstslot->tts_isnull[natt] = srcslot->tts_isnull[natt]; > } > > Seems ok so far. > > If the srcslot has fewer attributes then that'll leave the extra dstslot > array elements untouched. It is not ok even up to just here! Any access to dstslot->tts_{values,isnull} for an attribute bigger than srcdesc->natts would now be stale, potentially pointing to another attribute. > Where it gets weird is inside tts_virtual_materialize(). In that > function, we materialize *all* of the dstslot attributes, even the > extra ones that were left alone in the for loop shown above. Why do > we need to materialize all of those attributes? We only need to > materialize up to srcslot->natts. > > Per the following code, only up to the srcdesc->natts would be > accessible anyway: > > dstslot->tts_nvalid = srcdesc->natts; > > Virtual slots don't need any further deforming and > tts_virtual_getsomeattrs() is coded in a way that we'll find out if > anything tries to deform a virtual slot. > > I changed the Assert in tts_virtual_copyslot() to check the natts > match in each of the slots and all of the regression tests still pass, > so it seems we have no tests where there's an attribute number > mismatch... If we want to prohibit that, I think we ought to assert this in ExecCopySlot(), rather than just tts_virtual_copyslot. Even that does survive the test - but I don't think it'd be really wrong to copy from a slot with more columns into one with fewer. And it seems plausible that that could happen somewhere, e.g. when copying from a slot in a child partition with additional columns into a slot from the parent, where the column types/order otherwise matches, so we don't have to use the attribute mapping infrastructure. > I think if we are going to support copying slots where the source and > destination don't have the same number of attributes then the > following comment should explain what's allowed and what's not > allowed: > > /* > * Copy the contents of the source slot into the destination slot's own > * context. Invoked using callback of the destination slot. > */ > void (*copyslot) (TupleTableSlot *dstslot, TupleTableSlot *srcslot); Arguably the more relevant place would be document this in ExecCopySlot(), as that's what "users" of ExecCopySlot() would presumably would look at. I dug a bit in the history, and we used to say The caller must ensure the slots have compatible tupdescs. whatever that precisely means. > Is the Assert() in tts_virtual_copyslot() wrong? Yes, it's inverted. Greetings, Andres Freund
On Sat, 4 Nov 2023 at 15:15, Andres Freund <andres@anarazel.de> wrote: > > On 2023-11-01 11:35:50 +1300, David Rowley wrote: > > I changed the Assert in tts_virtual_copyslot() to check the natts > > match in each of the slots and all of the regression tests still pass, > > so it seems we have no tests where there's an attribute number > > mismatch... > > If we want to prohibit that, I think we ought to assert this in > ExecCopySlot(), rather than just tts_virtual_copyslot. > > Even that does survive the test - but I don't think it'd be really wrong to > copy from a slot with more columns into one with fewer. And it seems plausible > that that could happen somewhere, e.g. when copying from a slot in a child > partition with additional columns into a slot from the parent, where the > column types/order otherwise matches, so we don't have to use the attribute > mapping infrastructure. Do you have any examples of when this could happen? I played around with partitioned tables and partitions with various combinations of dropped columns and can't see any cases of this. Given the assert's condition has been backwards for 5 years now (4da597edf1b), it just seems a bit unlikely that we have cases where the source slot can have more attributes than the destination. Given the Assert has been that way around for this long without any complaints, I think we should just insist that the natts must match in each slot. If we later discover some reason that there's some corner case where they don't match, we can adjust the code then. I played around with the attached patch which removes the Assert and puts some additional Assert checks inside ExecCopySlot() which additionally checks the attribute types also match. There are valid cases where they don't match and that seems to be limited to cases where we're performing DML on a table with a dropped column. expand_insert_targetlist() will add NULL::int4 constants to the targetlist in place of dropped columns but the tupledesc of the table will have the atttypid set to InvalidOid per what that gets set to when a column is dropped in RemoveAttributeById(). > > I think if we are going to support copying slots where the source and > > destination don't have the same number of attributes then the > > following comment should explain what's allowed and what's not > > allowed: > > > > /* > > * Copy the contents of the source slot into the destination slot's own > > * context. Invoked using callback of the destination slot. > > */ > > void (*copyslot) (TupleTableSlot *dstslot, TupleTableSlot *srcslot); > > Arguably the more relevant place would be document this in ExecCopySlot(), as > that's what "users" of ExecCopySlot() would presumably would look at. I dug a > bit in the history, and we used to say I think it depends on what you're documenting. Writing comments above the copyslot API function declaration is useful to define the API standard for what new implementations of that interface must abide by. Comments in ExecCopySlot() are useful to users of that function. It seems to me that both locations are relevant. New implementations of copyslot need to know what they must handle, else they're left just to look at what other implementations do and guess the rest. David
Вложения
Hi, On 2023-11-06 11:16:26 +1300, David Rowley wrote: > On Sat, 4 Nov 2023 at 15:15, Andres Freund <andres@anarazel.de> wrote: > > > > On 2023-11-01 11:35:50 +1300, David Rowley wrote: > > > I changed the Assert in tts_virtual_copyslot() to check the natts > > > match in each of the slots and all of the regression tests still pass, > > > so it seems we have no tests where there's an attribute number > > > mismatch... > > > > If we want to prohibit that, I think we ought to assert this in > > ExecCopySlot(), rather than just tts_virtual_copyslot. > > > > Even that does survive the test - but I don't think it'd be really wrong to > > copy from a slot with more columns into one with fewer. And it seems plausible > > that that could happen somewhere, e.g. when copying from a slot in a child > > partition with additional columns into a slot from the parent, where the > > column types/order otherwise matches, so we don't have to use the attribute > > mapping infrastructure. > > Do you have any examples of when this could happen? > I played around with partitioned tables and partitions with various > combinations of dropped columns and can't see any cases of this. Given > the assert's condition has been backwards for 5 years now > (4da597edf1b), it just seems a bit unlikely that we have cases where > the source slot can have more attributes than the destination. I think my concerns might be unfounded - I was worried about stuff like attribute mapping deciding that it's safe to copy without an attribute mapping, because all the types match. But it looks like we do check that the attnums match as well. There's similar code in a bunch of other places, e.g. ExecEvalWholeRowVar(), but that also verifies ->natts matches. So I think adding an assert to ExecCopySlot(), perhaps with a comment saying that the restriction could be lifted with a bit of work, would be fine. Greetings, Andres Freund
On Fri, 1 Dec 2023 at 13:14, Andres Freund <andres@anarazel.de> wrote: > So I think adding an assert to ExecCopySlot(), perhaps with a comment saying > that the restriction could be lifted with a bit of work, would be fine. Thanks for looking at this again. How about the attached? I wrote the comment you mentioned and also removed the Assert from tts_virtual_copyslot(). I also noted in the copyslot callback declaration that implementers can assume the number of attributes in the source and destination slots match. David
Вложения
On Fri, 1 Dec 2023 at 14:30, David Rowley <dgrowleyml@gmail.com> wrote: > > On Fri, 1 Dec 2023 at 13:14, Andres Freund <andres@anarazel.de> wrote: > > So I think adding an assert to ExecCopySlot(), perhaps with a comment saying > > that the restriction could be lifted with a bit of work, would be fine. > > How about the attached? I wrote the comment you mentioned and also > removed the Assert from tts_virtual_copyslot(). I looked over this again and didn't see any issues, so pushed the patch. Thanks for helping with this. David