Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()

Поиск
Список
Период
Сортировка
От Alexander Lakhin
Тема Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Дата
Msg-id b8f458d6-f0d5-3b0b-1f53-a51aec8aa15b@gmail.com
обсуждение исходный текст
Ответ на Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()  (Andrew Dunstan <andrew@dunslane.net>)
Ответы Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()  (Andrew Dunstan <andrew@dunslane.net>)
Список pgsql-bugs
Hello,

28.06.2023 21:49, Tom Lane wrote:
> The bad news is that while investigating this I came across
> another hazard that seems much messier to fix.  To wit, if
> you have a tuple with "missing" pass-by-ref columns, then
> some of the columns output by heap_deform_tuple() will
> contain pointers into the tupdesc (cf. getmissingattr()).
> That's not sufficient lifespan in the scenario we're dealing
> with here: if the tupdesc gets trashed anytime before the
> eventual ExecEvalFieldStoreForm(), we'll have garbage in the
> result tuple.

Please excuse me, as I'm definitely late to the party, but may I ask some
questions to understand the issue discussed? I still can't see a practical
way to get garbage data in a tuple with missing columns, so maybe I'm
missing something (except for columns).

If we talk about ExecEvalFieldStoreDeForm() -> heap_deform_tuple() ->
getmissingattr(), then I can't find a way to add a non-null default for a
record type to exploit this path. The rowtypes test contains:
-- at the moment this will not work due to ALTER TABLE inadequacy:
alter table fullname add column suffix text default '';
ERROR:  cannot alter table "fullname" because column "people.fn" uses its row type

So it seems to me, that ExecEvalFieldStoreDeForm() can't be called for a
record/tuple having missing non-null attributes. And even if ALTER TABLE was
"adequate", does it mean that ExecEvalFieldStoreDeForm() would return default
values from the underlying table type?

> Worse, I fear there may be many other places with latent bugs of the
> same ilk.  Before the attmissingval code was added, one could assume
> that the result of heap_deform_tuple would hold good as long as you
> had a pin on the source tuple.  But now it depends on metadata as
> well, and I don't think we have a coherent story about that.

But is it true, that having a pin on tupdesc is enough to make sure that
the result of heap_deform_tuple() holds good?

If so, then probably we should find places, where tupdesc gets unpinned
before that result is saved somewhere.
I see the following callers of the heap_deform_tuple():
src/backend/replication/logical/reorderbuffer.c
     ReorderBufferToastReplace(): heap_deform_tuple() followed by heap_form_tuple()
src/backend/executor/execTuples.c
     ExecForceStoreHeapTuple(), ExecForceStoreMinimalTuple(), ExecStoreHeapTupleDatum(): use long-living and hopefully

pinned slot->tts_tupleDescriptor
src/backend/executor/execExprInterp.c
     ExecEvalFieldStoreDeForm(): the above question applies here
src/backend/executor/spi.c
     SPI_modifytuple(): heap_deform_tuple() followed by heap_form_tuple()
src/backend/utils/adt/rowtypes.c
     record_out(), hash_record(), hash_record_extended(): tupdesc released after extracting/processing data
     record_send(): tupdesc released after sending data
     record_cmp(), record_eq(), record_image_cmp(), record_image_eq(): tupdesc1. tupdesc2 released after comparing
data
src/backend/utils/adt/jsonfuncs.c
     populate_record(): heap_deform_tuple() followed by heap_form_tuple()
src/backend/utils/adt/expandedrecord.c
     deconstruct_expanded_record() uses long-living (pinned) erh->er_tupdesc
src/backend/utils/adt/ri_triggers.c
     RI_Initial_Check(), RI_PartitionRemove_Check(): use long-living? SPI_tuptable->tupdesc
src/backend/access/heap/heapam_handler.c
     reform_and_rewrite_tuple(): uses RelationGetDescr(OldHeap) (pinned, I suppose)
src/backend/access/heap/heaptoast.c
     heap_toast_delete(), heap_toast_insert_or_update() use long-living rel->rd_att;
     toast_flatten_tuple(): heap_deform_tuple(..., tupleDesc, ...) followed by heap_form_tuple(..., tupleDesc, ...)
     toast_flatten_tuple_to_datum(): heap_deform_tuple followed by detoasting attrs
src/backend/access/heap/heapam.c
     ExtractReplicaIdentity(): heap_deform_tuple(..., desc, ...) followed by heap_form_tuple(..., desc, ...)
src/backend/access/common/heaptuple.c
     heap_modify_tuple(): heap_deform_tuple(..., tupleDesc, ...) followed by heap_form_tuple(..., tupleDesc, ...)
     heap_modify_tuple_by_cols(): heap_deform_tuple(..., tupleDesc, ...) followed by heap_form_tuple(..., tupleDesc,
...)
src/backend/access/common/tupconvert.c
     execute_attr_map_tuple(): heap_deform_tuple() followed by heap_form_tuple()
src/test/regress/regress.c
     make_tuple_indirect(): heap_deform_tuple() followed by heap_form_tuple()
src/pl/plpgsql/src/pl_exec.c
     exec_move_row() called with tupdesc = NULL, SPI_tuptable->tupdesc, or tupdesc released after the call
contrib/hstore/hstore_io.c
     hstore_from_record(), hstore_populate_record(): tupdesc released after extracting data

And even if I missed some possibly problematic calls, maybe it's worth to
consider fixing exactly that places...

Also, besides heap_deform_tuple(), getmissingattr() is called from
heap_getattr(). And heap_getattr() is used in many places, but most of them
are protected too due to tupdesc pinned.
Two suspicious places that I found are GetAttributeByName() and
GetAttributeByNum(), so when using these functions you can really get the
missing attribute value with the tupdesc released. But what circumstances do
we need to end up with an invalid data?
1) Write a function that will call GetAttributeByName() and hold it's
  result inside for some time.
2) Add a passed-by-ref column with a default value to a table.
3) Pass to that function a tuple without the "missing" column (just
  "SELECT func(table) FROM table" won't do, but func(OLD) in a trigger should).
4) Trigger a relcache invalidation.
5) Perform a database access inside that function to process the invalidation.
6) Access the result of GetAttributeByName() after that.

Maybe we could construct such test case, e.g. add to pg_regress one more
SQL-accessible C function, bu9ccvhhihvcnb·t I wonder, is this the way to go?

On the other hand, probably there are extensions, that use
GetAttributeByName() in the non-safe way (as the function documentation
doesn't warn about such issues), but maybe just perform datumCopy inside
that function (and similar one(s))?
Though, perhaps I just don't notice the elephant in the core...

Best regards,
Alexander



В списке pgsql-bugs по дате отправления:

Предыдущее
От: Daniel Gustafsson
Дата:
Сообщение: Re: pg_basebackup: errors on macOS on directories with ".DS_Store" files
Следующее
От: PG Bug reporting form
Дата:
Сообщение: BUG #18021: Loading Error