Обсуждение: Badly broken logic in plpython composite-type handling

Поиск
Список
Период
Сортировка

Badly broken logic in plpython composite-type handling

От
Tom Lane
Дата:
I looked into the crash reported in bug #13579.  The proximate cause
of the crash is that PLyString_ToComposite does this:
  PLy_output_datum_func2(&info->out.d, typeTup,                         exec_ctx->curr_proc->langid,
    exec_ctx->curr_proc->trftypes);
 

without any thought for the possibility that info->is_rowtype is 1,
in which case it's stomping on the info->out.r data that will be
needed later.  I have serious doubts that making PLyTypeOutput a
union was a wise idea, as it means that C offers you exactly zero
protection against this type of data clobber.

I tried changing that to
  PLy_output_datum_func(info, typeTup,                        exec_ctx->curr_proc->langid,
exec_ctx->curr_proc->trftypes);

which unsurprisingly results in "ERROR:  PLyTypeInfo struct is initialized
for a Tuple" in the case reported in the bug ... and also results in quite
a few of the plpython regression tests failing that way, which makes it a
bit astonishing that we've not tripped over this bug already.

I'm inclined to think that maybe PLyString_ToComposite needs to be doing
something more like what PLyObject_ToComposite does, ie doing its own
lookup in a private descriptor; but I'm not sure if that's right, and
anyway it would just be doubling down on a bad design.  Not being able
to cache these I/O function lookups is really horrid.

I wonder if that could be fixed by changing PLyTypeInput and PLyTypeOutput
from unions to structs and getting rid of is_rowtype in favor of allowing
the d and r fields to be valid or not independently; then you could treat
the object as either scalar or rowtype at need.

Does whoever designed this code in the first place want to fix it?
        regards, tom lane



Re: Badly broken logic in plpython composite-type handling

От
Tom Lane
Дата:
I wrote:
> I looked into the crash reported in bug #13579.  The proximate cause
> of the crash is that PLyString_ToComposite does this:
> ...
> I'm inclined to think that maybe PLyString_ToComposite needs to be doing
> something more like what PLyObject_ToComposite does, ie doing its own
> lookup in a private descriptor; but I'm not sure if that's right, and
> anyway it would just be doubling down on a bad design.  Not being able
> to cache these I/O function lookups is really horrid.

I wrote a draft patch that does it that way.  It indeed stops the crash,
and even better, makes PLyString_ToComposite actually work for RECORD,
which AFAICT it's never done before.  We'd conveniently omitted to test
the case in the plpython regression tests, thus masking the fact that
it would crash horribly if you invoked such a case more than once.

To make it work, I had to modify record_in to allow inputting a value
of type RECORD as long as it's given a correct typmod.  While that would
not happen in ordinary SQL, it's possible in this and probably other
usages, and I can't really see any downside.  The emitted record will
be properly marked with type RECORDOID and typmod whatever the internal
anonymous-type identifier is, and it's no more or less type-safe than
any other such value.

This version of PLyString_ToComposite is no better than
PLyObject_ToComposite as far as leaking memory goes.  We could probably
fix that in a crude fashion by using plpython's "scratch context" for the
transient type-lookup data, but I'd rather see a proper fix wherein the
lookup results stay cached.  In any case, that's a separate and less
critical matter.

Barring objections, I propose to commit and back-patch this.  The crash
can be demonstrated back to 9.1.

            regards, tom lane

diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c
index a65e18d..ec4f71e 100644
*** a/src/backend/utils/adt/rowtypes.c
--- b/src/backend/utils/adt/rowtypes.c
*************** record_in(PG_FUNCTION_ARGS)
*** 73,84 ****
  {
      char       *string = PG_GETARG_CSTRING(0);
      Oid            tupType = PG_GETARG_OID(1);
!
! #ifdef NOT_USED
!     int32        typmod = PG_GETARG_INT32(2);
! #endif
      HeapTupleHeader result;
-     int32        tupTypmod;
      TupleDesc    tupdesc;
      HeapTuple    tuple;
      RecordIOData *my_extra;
--- 73,80 ----
  {
      char       *string = PG_GETARG_CSTRING(0);
      Oid            tupType = PG_GETARG_OID(1);
!     int32        tupTypmod = PG_GETARG_INT32(2);
      HeapTupleHeader result;
      TupleDesc    tupdesc;
      HeapTuple    tuple;
      RecordIOData *my_extra;
*************** record_in(PG_FUNCTION_ARGS)
*** 91,106 ****
      StringInfoData buf;

      /*
!      * Use the passed type unless it's RECORD; we can't support input of
!      * anonymous types, mainly because there's no good way to figure out which
!      * anonymous type is wanted.  Note that for RECORD, what we'll probably
!      * actually get is RECORD's typelem, ie, zero.
       */
!     if (tupType == InvalidOid || tupType == RECORDOID)
          ereport(ERROR,
                  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("input of anonymous composite types is not implemented")));
-     tupTypmod = -1;                /* for all non-anonymous types */

      /*
       * This comes from the composite type's pg_type.oid and stores system oids
--- 87,103 ----
      StringInfoData buf;

      /*
!      * Give a friendly error message if we did not get enough info to identify
!      * the target record type.  (lookup_rowtype_tupdesc would fail anyway, but
!      * with a non-user-facing message.)  Note that for RECORD, what we'll
!      * usually actually get is RECORD's typelem, ie, zero.  However there are
!      * cases where we do get a valid typmod and can do something useful.
       */
!     if (tupType == InvalidOid ||
!         (tupType == RECORDOID && tupTypmod < 0))
          ereport(ERROR,
                  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("input of anonymous composite types is not implemented")));

      /*
       * This comes from the composite type's pg_type.oid and stores system oids
*************** record_recv(PG_FUNCTION_ARGS)
*** 449,460 ****
  {
      StringInfo    buf = (StringInfo) PG_GETARG_POINTER(0);
      Oid            tupType = PG_GETARG_OID(1);
!
! #ifdef NOT_USED
!     int32        typmod = PG_GETARG_INT32(2);
! #endif
      HeapTupleHeader result;
-     int32        tupTypmod;
      TupleDesc    tupdesc;
      HeapTuple    tuple;
      RecordIOData *my_extra;
--- 446,453 ----
  {
      StringInfo    buf = (StringInfo) PG_GETARG_POINTER(0);
      Oid            tupType = PG_GETARG_OID(1);
!     int32        tupTypmod = PG_GETARG_INT32(2);
      HeapTupleHeader result;
      TupleDesc    tupdesc;
      HeapTuple    tuple;
      RecordIOData *my_extra;
*************** record_recv(PG_FUNCTION_ARGS)
*** 466,481 ****
      bool       *nulls;

      /*
!      * Use the passed type unless it's RECORD; we can't support input of
!      * anonymous types, mainly because there's no good way to figure out which
!      * anonymous type is wanted.  Note that for RECORD, what we'll probably
!      * actually get is RECORD's typelem, ie, zero.
       */
!     if (tupType == InvalidOid || tupType == RECORDOID)
          ereport(ERROR,
                  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("input of anonymous composite types is not implemented")));
!     tupTypmod = -1;                /* for all non-anonymous types */
      tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);
      ncolumns = tupdesc->natts;

--- 459,476 ----
      bool       *nulls;

      /*
!      * Give a friendly error message if we did not get enough info to identify
!      * the target record type.  (lookup_rowtype_tupdesc would fail anyway, but
!      * with a non-user-facing message.)  Note that for RECORD, what we'll
!      * usually actually get is RECORD's typelem, ie, zero.  However there are
!      * cases where we do get a valid typmod and can do something useful.
       */
!     if (tupType == InvalidOid ||
!         (tupType == RECORDOID && tupTypmod < 0))
          ereport(ERROR,
                  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("input of anonymous composite types is not implemented")));
!
      tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);
      ncolumns = tupdesc->natts;

diff --git a/src/pl/plpython/expected/plpython_composite.out b/src/pl/plpython/expected/plpython_composite.out
index ad454f3..0ef0c21 100644
*** a/src/pl/plpython/expected/plpython_composite.out
--- b/src/pl/plpython/expected/plpython_composite.out
*************** elif typ == 'obj':
*** 65,70 ****
--- 65,72 ----
      type_record.first = first
      type_record.second = second
      return type_record
+ elif typ == 'str':
+     return "('%s',%r)" % (first, second)
  $$ LANGUAGE plpythonu;
  SELECT * FROM multiout_record_as('dict', 'foo', 1, 'f');
   first | second
*************** SELECT multiout_record_as('dict', 'foo',
*** 78,83 ****
--- 80,217 ----
   (foo,1)
  (1 row)

+ SELECT * FROM multiout_record_as('dict', null, null, false);
+  first | second
+ -------+--------
+        |
+ (1 row)
+
+ SELECT * FROM multiout_record_as('dict', 'one', null, false);
+  first | second
+ -------+--------
+  one   |
+ (1 row)
+
+ SELECT * FROM multiout_record_as('dict', null, 2, false);
+  first | second
+ -------+--------
+        |      2
+ (1 row)
+
+ SELECT * FROM multiout_record_as('dict', 'three', 3, false);
+  first | second
+ -------+--------
+  three |      3
+ (1 row)
+
+ SELECT * FROM multiout_record_as('dict', null, null, true);
+  first | second
+ -------+--------
+        |
+ (1 row)
+
+ SELECT * FROM multiout_record_as('tuple', null, null, false);
+  first | second
+ -------+--------
+        |
+ (1 row)
+
+ SELECT * FROM multiout_record_as('tuple', 'one', null, false);
+  first | second
+ -------+--------
+  one   |
+ (1 row)
+
+ SELECT * FROM multiout_record_as('tuple', null, 2, false);
+  first | second
+ -------+--------
+        |      2
+ (1 row)
+
+ SELECT * FROM multiout_record_as('tuple', 'three', 3, false);
+  first | second
+ -------+--------
+  three |      3
+ (1 row)
+
+ SELECT * FROM multiout_record_as('tuple', null, null, true);
+  first | second
+ -------+--------
+        |
+ (1 row)
+
+ SELECT * FROM multiout_record_as('list', null, null, false);
+  first | second
+ -------+--------
+        |
+ (1 row)
+
+ SELECT * FROM multiout_record_as('list', 'one', null, false);
+  first | second
+ -------+--------
+  one   |
+ (1 row)
+
+ SELECT * FROM multiout_record_as('list', null, 2, false);
+  first | second
+ -------+--------
+        |      2
+ (1 row)
+
+ SELECT * FROM multiout_record_as('list', 'three', 3, false);
+  first | second
+ -------+--------
+  three |      3
+ (1 row)
+
+ SELECT * FROM multiout_record_as('list', null, null, true);
+  first | second
+ -------+--------
+        |
+ (1 row)
+
+ SELECT * FROM multiout_record_as('obj', null, null, false);
+  first | second
+ -------+--------
+        |
+ (1 row)
+
+ SELECT * FROM multiout_record_as('obj', 'one', null, false);
+  first | second
+ -------+--------
+  one   |
+ (1 row)
+
+ SELECT * FROM multiout_record_as('obj', null, 2, false);
+  first | second
+ -------+--------
+        |      2
+ (1 row)
+
+ SELECT * FROM multiout_record_as('obj', 'three', 3, false);
+  first | second
+ -------+--------
+  three |      3
+ (1 row)
+
+ SELECT * FROM multiout_record_as('obj', null, null, true);
+  first | second
+ -------+--------
+        |
+ (1 row)
+
+ SELECT * FROM multiout_record_as('str', 'one', 1, false);
+  first | second
+ -------+--------
+  'one' |      1
+ (1 row)
+
+ SELECT * FROM multiout_record_as('str', 'one', 2, false);
+  first | second
+ -------+--------
+  'one' |      2
+ (1 row)
+
  SELECT *, s IS NULL AS snull FROM multiout_record_as('tuple', 'xxx', NULL, 'f') AS f(f, s);
    f  | s | snull
  -----+---+-------
*************** elif typ == 'dict':
*** 179,188 ****
--- 313,326 ----
      d = {'first': n * 2, 'second': n * 3, 'extra': 'not important'}
  elif typ == 'tuple':
      d = (n * 2, n * 3)
+ elif typ == 'list':
+     d = [ n * 2, n * 3 ]
  elif typ == 'obj':
      class d: pass
      d.first = n * 2
      d.second = n * 3
+ elif typ == 'str':
+     d = "(%r,%r)" % (n * 2, n * 3)
  for i in range(n):
      yield (i, d)
  $$ LANGUAGE plpythonu;
*************** SELECT * FROM multiout_table_type_setof(
*** 200,205 ****
--- 338,355 ----
   2 | (6,9)
  (3 rows)

+ SELECT * FROM multiout_table_type_setof('dict', 'f', 7);
+  n | column2
+ ---+---------
+  0 | (14,21)
+  1 | (14,21)
+  2 | (14,21)
+  3 | (14,21)
+  4 | (14,21)
+  5 | (14,21)
+  6 | (14,21)
+ (7 rows)
+
  SELECT * FROM multiout_table_type_setof('tuple', 'f', 2);
   n | column2
  ---+---------
*************** SELECT * FROM multiout_table_type_setof(
*** 207,212 ****
--- 357,385 ----
   1 | (4,6)
  (2 rows)

+ SELECT * FROM multiout_table_type_setof('tuple', 'f', 3);
+  n | column2
+ ---+---------
+  0 | (6,9)
+  1 | (6,9)
+  2 | (6,9)
+ (3 rows)
+
+ SELECT * FROM multiout_table_type_setof('list', 'f', 2);
+  n | column2
+ ---+---------
+  0 | (4,6)
+  1 | (4,6)
+ (2 rows)
+
+ SELECT * FROM multiout_table_type_setof('list', 'f', 3);
+  n | column2
+ ---+---------
+  0 | (6,9)
+  1 | (6,9)
+  2 | (6,9)
+ (3 rows)
+
  SELECT * FROM multiout_table_type_setof('obj', 'f', 4);
   n | column2
  ---+---------
*************** SELECT * FROM multiout_table_type_setof(
*** 216,221 ****
--- 389,427 ----
   3 | (8,12)
  (4 rows)

+ SELECT * FROM multiout_table_type_setof('obj', 'f', 5);
+  n | column2
+ ---+---------
+  0 | (10,15)
+  1 | (10,15)
+  2 | (10,15)
+  3 | (10,15)
+  4 | (10,15)
+ (5 rows)
+
+ SELECT * FROM multiout_table_type_setof('str', 'f', 6);
+  n | column2
+ ---+---------
+  0 | (12,18)
+  1 | (12,18)
+  2 | (12,18)
+  3 | (12,18)
+  4 | (12,18)
+  5 | (12,18)
+ (6 rows)
+
+ SELECT * FROM multiout_table_type_setof('str', 'f', 7);
+  n | column2
+ ---+---------
+  0 | (14,21)
+  1 | (14,21)
+  2 | (14,21)
+  3 | (14,21)
+  4 | (14,21)
+  5 | (14,21)
+  6 | (14,21)
+ (7 rows)
+
  SELECT * FROM multiout_table_type_setof('dict', 't', 3);
   n | column2
  ---+---------
diff --git a/src/pl/plpython/plpy_typeio.c b/src/pl/plpython/plpy_typeio.c
index 7a45b22..05add6e 100644
*** a/src/pl/plpython/plpy_typeio.c
--- b/src/pl/plpython/plpy_typeio.c
*************** PLySequence_ToArray(PLyObToDatum *arg, i
*** 912,931 ****
  static Datum
  PLyString_ToComposite(PLyTypeInfo *info, TupleDesc desc, PyObject *string)
  {
      HeapTuple    typeTup;
      PLyExecutionContext *exec_ctx = PLy_current_execution_context();

      typeTup = SearchSysCache1(TYPEOID, ObjectIdGetDatum(desc->tdtypeid));
      if (!HeapTupleIsValid(typeTup))
          elog(ERROR, "cache lookup failed for type %u", desc->tdtypeid);

!     PLy_output_datum_func2(&info->out.d, typeTup,
                             exec_ctx->curr_proc->langid,
                             exec_ctx->curr_proc->trftypes);

      ReleaseSysCache(typeTup);

!     return PLyObject_ToDatum(&info->out.d, info->out.d.typmod, string);
  }


--- 912,941 ----
  static Datum
  PLyString_ToComposite(PLyTypeInfo *info, TupleDesc desc, PyObject *string)
  {
+     Datum        result;
      HeapTuple    typeTup;
+     PLyTypeInfo locinfo;
      PLyExecutionContext *exec_ctx = PLy_current_execution_context();

+     /* Create a dummy PLyTypeInfo */
+     MemSet(&locinfo, 0, sizeof(PLyTypeInfo));
+     PLy_typeinfo_init(&locinfo);
+
      typeTup = SearchSysCache1(TYPEOID, ObjectIdGetDatum(desc->tdtypeid));
      if (!HeapTupleIsValid(typeTup))
          elog(ERROR, "cache lookup failed for type %u", desc->tdtypeid);

!     PLy_output_datum_func2(&locinfo.out.d, typeTup,
                             exec_ctx->curr_proc->langid,
                             exec_ctx->curr_proc->trftypes);

      ReleaseSysCache(typeTup);

!     result = PLyObject_ToDatum(&locinfo.out.d, desc->tdtypmod, string);
!
!     PLy_typeinfo_dealloc(&locinfo);
!
!     return result;
  }


diff --git a/src/pl/plpython/sql/plpython_composite.sql b/src/pl/plpython/sql/plpython_composite.sql
index cb5fffe..473342c 100644
*** a/src/pl/plpython/sql/plpython_composite.sql
--- b/src/pl/plpython/sql/plpython_composite.sql
*************** elif typ == 'obj':
*** 32,41 ****
--- 32,71 ----
      type_record.first = first
      type_record.second = second
      return type_record
+ elif typ == 'str':
+     return "('%s',%r)" % (first, second)
  $$ LANGUAGE plpythonu;

  SELECT * FROM multiout_record_as('dict', 'foo', 1, 'f');
  SELECT multiout_record_as('dict', 'foo', 1, 'f');
+
+ SELECT * FROM multiout_record_as('dict', null, null, false);
+ SELECT * FROM multiout_record_as('dict', 'one', null, false);
+ SELECT * FROM multiout_record_as('dict', null, 2, false);
+ SELECT * FROM multiout_record_as('dict', 'three', 3, false);
+ SELECT * FROM multiout_record_as('dict', null, null, true);
+
+ SELECT * FROM multiout_record_as('tuple', null, null, false);
+ SELECT * FROM multiout_record_as('tuple', 'one', null, false);
+ SELECT * FROM multiout_record_as('tuple', null, 2, false);
+ SELECT * FROM multiout_record_as('tuple', 'three', 3, false);
+ SELECT * FROM multiout_record_as('tuple', null, null, true);
+
+ SELECT * FROM multiout_record_as('list', null, null, false);
+ SELECT * FROM multiout_record_as('list', 'one', null, false);
+ SELECT * FROM multiout_record_as('list', null, 2, false);
+ SELECT * FROM multiout_record_as('list', 'three', 3, false);
+ SELECT * FROM multiout_record_as('list', null, null, true);
+
+ SELECT * FROM multiout_record_as('obj', null, null, false);
+ SELECT * FROM multiout_record_as('obj', 'one', null, false);
+ SELECT * FROM multiout_record_as('obj', null, 2, false);
+ SELECT * FROM multiout_record_as('obj', 'three', 3, false);
+ SELECT * FROM multiout_record_as('obj', null, null, true);
+
+ SELECT * FROM multiout_record_as('str', 'one', 1, false);
+ SELECT * FROM multiout_record_as('str', 'one', 2, false);
+
  SELECT *, s IS NULL AS snull FROM multiout_record_as('tuple', 'xxx', NULL, 'f') AS f(f, s);
  SELECT *, f IS NULL AS fnull, s IS NULL AS snull FROM multiout_record_as('tuple', 'xxx', 1, 't') AS f(f, s);
  SELECT * FROM multiout_record_as('obj', NULL, 10, 'f');
*************** elif typ == 'dict':
*** 92,109 ****
--- 122,150 ----
      d = {'first': n * 2, 'second': n * 3, 'extra': 'not important'}
  elif typ == 'tuple':
      d = (n * 2, n * 3)
+ elif typ == 'list':
+     d = [ n * 2, n * 3 ]
  elif typ == 'obj':
      class d: pass
      d.first = n * 2
      d.second = n * 3
+ elif typ == 'str':
+     d = "(%r,%r)" % (n * 2, n * 3)
  for i in range(n):
      yield (i, d)
  $$ LANGUAGE plpythonu;

  SELECT * FROM multiout_composite(2);
  SELECT * FROM multiout_table_type_setof('dict', 'f', 3);
+ SELECT * FROM multiout_table_type_setof('dict', 'f', 7);
  SELECT * FROM multiout_table_type_setof('tuple', 'f', 2);
+ SELECT * FROM multiout_table_type_setof('tuple', 'f', 3);
+ SELECT * FROM multiout_table_type_setof('list', 'f', 2);
+ SELECT * FROM multiout_table_type_setof('list', 'f', 3);
  SELECT * FROM multiout_table_type_setof('obj', 'f', 4);
+ SELECT * FROM multiout_table_type_setof('obj', 'f', 5);
+ SELECT * FROM multiout_table_type_setof('str', 'f', 6);
+ SELECT * FROM multiout_table_type_setof('str', 'f', 7);
  SELECT * FROM multiout_table_type_setof('dict', 't', 3);

  -- check what happens if a type changes under us

Re: Badly broken logic in plpython composite-type handling

От
"Joshua D. Drake"
Дата:
On 08/19/2015 05:05 PM, Tom Lane wrote:

> Barring objections, I propose to commit and back-patch this.  The crash
> can be demonstrated back to 9.1.
>
>             regards, tom lane

+1


-- 
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.