Обсуждение: BUG #7516: PL/Perl crash

Поиск
Список
Период
Сортировка

BUG #7516: PL/Perl crash

От
pgmail@joh.to
Дата:
The following bug has been logged on the website:

Bug reference:      7516
Logged by:          Marko Tiikkaja
Email address:      pgmail@joh.to
PostgreSQL version: 9.1.5
Operating system:   64-bit linux
Description:        =


Hi,

We had a segmentation fault in PostgreSQL 9.1.5 with PL/PerlU.

The backtrace looks like this:

#0  0x00007f80be88cb04 in plperl_spi_exec_prepared (query=3D0x1671eb8
"0x168cd70", attr=3D0x0, argc=3D2, argv=3D0x19b4e40) at
/build/buildd/postgresql-9.1-9.1.5/build/../src/pl/plperl/plperl.c:3373
#1  0x00007f80be88e4f0 in XS__spi_exec_prepared (my_perl=3D<optimized out>,
cv=3D<optimized out>) at
/build/buildd/postgresql-9.1-9.1.5/build/../src/pl/plperl/SPI.xs:141
#2  0x00007f80be5b67ff in Perl_pp_entersub () from /usr/lib/libperl.so.5.14
#3  0x00007f80be5adc96 in Perl_runops_standard () from
/usr/lib/libperl.so.5.14
#4  0x00007f80be54959e in Perl_call_sv () from /usr/lib/libperl.so.5.14
#5  0x00007f80be885c55 in plperl_call_perl_func (desc=3D0x189a010,
fcinfo=3D0x188c490) at
/build/buildd/postgresql-9.1-9.1.5/build/../src/pl/plperl/plperl.c:2017
#6  0x00007f80be88a17c in plperl_func_handler (fcinfo=3D0x188c490) at
/build/buildd/postgresql-9.1-9.1.5/build/../src/pl/plperl/plperl.c:2156
#7  plperl_call_handler (fcinfo=3D0x188c490) at
/build/buildd/postgresql-9.1-9.1.5/build/../src/pl/plperl/plperl.c:1671
#8  0x0000000000707147 in fmgr_security_definer (fcinfo=3D0x188c490) at
/build/buildd/postgresql-9.1-9.1.5/build/../src/backend/utils/fmgr/fmgr.c:9=
75
#9  0x0000000000575e45 in ExecMakeFunctionResult (fcache=3D0x188c420,
econtext=3D0x188c230, isNull=3D0x188cec8 "id", isDone=3D0x188cfe0)
    at
/build/buildd/postgresql-9.1-9.1.5/build/../src/backend/executor/execQual.c=
:1917
#10 0x0000000000578512 in ExecTargetList (isDone=3D0x7fff051891ec,
itemIsDone=3D0x188cfe0, isnull=3D0x188cec8 "id", values=3D0x188ceb0,
econtext=3D0x188c230, targetlist=3D0x188cfb0)
    at
/build/buildd/postgresql-9.1-9.1.5/build/../src/backend/executor/execQual.c=
:5210
#11 ExecProject (projInfo=3D<optimized out>, isDone=3D0x7fff051891ec) at
/build/buildd/postgresql-9.1-9.1.5/build/../src/backend/executor/execQual.c=
:5425
#12 0x0000000000588b4a in ExecResult (node=3D0x188c120) at
/build/buildd/postgresql-9.1-9.1.5/build/../src/backend/executor/nodeResult=
.c:155
#13 0x00000000005713e8 in ExecProcNode (node=3D0x188c120) at
/build/buildd/postgresql-9.1-9.1.5/build/../src/backend/executor/execProcno=
de.c:367
#14 0x000000000056e662 in ExecutePlan (dest=3D0x15ae900, direction=3D<optim=
ized
out>, numberTuples=3D0, sendTuples=3D1 '\001', operation=3DCMD_SELECT,
planstate=3D0x188c120, estate=3D0x188c010)
    at
/build/buildd/postgresql-9.1-9.1.5/build/../src/backend/executor/execMain.c=
:1440
15 standard_ExecutorRun (queryDesc=3D0x1464830, direction=3D<optimized out>,
count=3D0) at
/build/buildd/postgresql-9.1-9.1.5/build/../src/backend/executor/execMain.c=
:314
#16 0x0000000000642ee7 in PortalRunSelect (portal=3D0x1421710,
forward=3D<optimized out>, count=3D0, dest=3D0x15ae900) at
/build/buildd/postgresql-9.1-9.1.5/build/../src/backend/tcop/pquery.c:943
#17 0x00000000006443d0 in PortalRun (portal=3D0x1421710,
count=3D9223372036854775807, isTopLevel=3D1 '\001', dest=3D0x15ae900,
altdest=3D0x15ae900, completionTag=3D0x7fff05189670 "")
    at
/build/buildd/postgresql-9.1-9.1.5/build/../src/backend/tcop/pquery.c:787
#18 0x0000000000640fba in exec_execute_message (max_rows=3D<optimized out>,
portal_name=3D0x15ae4f0 "") at
/build/buildd/postgresql-9.1-9.1.5/build/../src/backend/tcop/postgres.c:1965
#19 PostgresMain (argc=3D<optimized out>, argv=3D<optimized out>,
username=3D<optimized out>) at
/build/buildd/postgresql-9.1-9.1.5/build/../src/backend/tcop/postgres.c:4025
#20 0x0000000000602813 in BackendRun (port=3D0x145b990) at
/build/buildd/postgresql-9.1-9.1.5/build/../src/backend/postmaster/postmast=
er.c:3617
#21 BackendStartup (port=3D0x145b990) at
/build/buildd/postgresql-9.1-9.1.5/build/../src/backend/postmaster/postmast=
er.c:3302
#22 ServerLoop () at
/build/buildd/postgresql-9.1-9.1.5/build/../src/backend/postmaster/postmast=
er.c:1466
#23 0x0000000000603281 in PostmasterMain (argc=3D<optimized out>,
argv=3D<optimized out>) at
/build/buildd/postgresql-9.1-9.1.5/build/../src/backend/postmaster/postmast=
er.c:1127
#24 0x000000000045a440 in main (argc=3D5, argv=3D0x1416170) at
/build/buildd/postgresql-9.1-9.1.5/build/../src/backend/main/main.c:199

It seems to have happened when a PL/PerlU executed a prepared statement
which calls another PL/PerlU function.

So far we have not been able to reproduce the problem.  Any thoughts? =

Someone seen something similar?


Regards,
Marko Tiikkaja

Re: BUG #7516: PL/Perl crash

От
Tom Lane
Дата:
pgmail@joh.to writes:
> We had a segmentation fault in PostgreSQL 9.1.5 with PL/PerlU.
> ...
> It seems to have happened when a PL/PerlU executed a prepared statement
> which calls another PL/PerlU function.

Hm.  Is it possible that the prepared statement recursively called the
*same* plperl function?  And that somebody did a CREATE OR REPLACE on
that function meanwhile?  It looks to me like validate_plperl_function()
for the inner invocation could pull the rug out from under the outer
invocation's prodesc pointer.  Although even granting all that, you'd
have to be quite unlucky to get a crash right there, because free'ing
the prodesc wouldn't ordinarily cause the memory to disappear.  This
might be the wrong theory.  You seem to have debug symbols for that
core dump --- can you tell which of the dereferences in line 3373
caused the crash?

            regards, tom lane

Re: BUG #7516: PL/Perl crash

От
Marko Tiikkaja
Дата:
Hi,

On 03/09/2012 18:06, Tom Lane wrote:
> pgmail@joh.to writes:
>> We had a segmentation fault in PostgreSQL 9.1.5 with PL/PerlU.
>> ...
>> It seems to have happened when a PL/PerlU executed a prepared statement
>> which calls another PL/PerlU function.
>
> Hm.  Is it possible that the prepared statement recursively called the
> *same* plperl function?  And that somebody did a CREATE OR REPLACE on
> that function meanwhile?

No, that doesn't seem possible in this case.  The function calls some
prepared statements repeatedly, but no recursion should occur.

> This
> might be the wrong theory.  You seem to have debug symbols for that
> core dump --- can you tell which of the dereferences in line 3373
> caused the crash?

The current_call_data->prodesc->fn_readonly one.

Please let me know if I can be of more assistance.


Regards,
Marko Tiikkaja

Re: BUG #7516: PL/Perl crash

От
Tom Lane
Дата:
Marko Tiikkaja <pgmail@joh.to> writes:
> On 03/09/2012 18:06, Tom Lane wrote:
>> This might be the wrong theory.  You seem to have debug symbols for that
>> core dump --- can you tell which of the dereferences in line 3373
>> caused the crash?

> The current_call_data->prodesc->fn_readonly one.
> Please let me know if I can be of more assistance.

Hm, can you show us the contents of the *current_call_data struct?

Another thing that seems a bit risky is that plperl_func_handler sets up
current_call_data as a pointer to a mostly-zeroed struct and then does
assorted things that could throw error.  That would leave a dangling
value of current_call_data ...

            regards, tom lane

Re: BUG #7516: PL/Perl crash

От
Tom Lane
Дата:
I wrote:
> Another thing that seems a bit risky is that plperl_func_handler sets up
> current_call_data as a pointer to a mostly-zeroed struct and then does
> assorted things that could throw error.  That would leave a dangling
> value of current_call_data ...

Meh, scratch that --- I missed that the caller of plperl_func_handler is
where there's a PG_TRY block that's responsible for restoring the old
value of current_call_data.  It would still be interesting to nose
around and see what's in *current_call_data, as well as any of the
pointed-to data structures that exist.

            regards, tom lane

Re: BUG #7516: PL/Perl crash

От
Tom Lane
Дата:
Marko Tiikkaja <pgmail@joh.to> writes:
> On 03/09/2012 18:06, Tom Lane wrote:
>> Hm.  Is it possible that the prepared statement recursively called the
>> *same* plperl function?  And that somebody did a CREATE OR REPLACE on
>> that function meanwhile?

> No, that doesn't seem possible in this case.  The function calls some
> prepared statements repeatedly, but no recursion should occur.

Hm.  Well, I can definitely reproduce a segfault using this example:

CREATE OR REPLACE FUNCTION self_modify(INTEGER) RETURNS INTEGER AS $$
   spi_exec_query('CREATE OR REPLACE FUNCTION self_modify(INTEGER) RETURNS INTEGER AS \'return $_[0] * 3;\' LANGUAGE
plperl;');
   spi_exec_query('select self_modify(42) AS a');
   return $_[0] * 2;
$$ LANGUAGE plperl;

select self_modify(42);

The crash is 100% reproducible if you tweak plperl like this:

diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c
index bfa805d..307874c 100644
--- a/src/pl/plperl/plperl.c
+++ b/src/pl/plperl/plperl.c
@@ -2393,7 +2393,7 @@ validate_plperl_function(plperl_proc_ptr *proc_ptr, HeapTu
            activate_interpreter(oldinterp);
        }
        free(prodesc->proname);
-       free(prodesc);
+       memset(prodesc, 0, sizeof(plperl_proc_desc));
    }

    return false;

Without that it's probabilistic, since the subsequent malloc in
compile_plperl_function is likely to realloc the exact same space.

So I think we need to institute a reference counting scheme for
the plperl_proc_desc records ...

            regards, tom lane

Re: BUG #7516: PL/Perl crash

От
Tom Lane
Дата:
I wrote:
> Hm.  Well, I can definitely reproduce a segfault using this example:

> CREATE OR REPLACE FUNCTION self_modify(INTEGER) RETURNS INTEGER AS $$
>    spi_exec_query('CREATE OR REPLACE FUNCTION self_modify(INTEGER) RETURNS INTEGER AS \'return $_[0] * 3;\' LANGUAGE
plperl;');
>    spi_exec_query('select self_modify(42) AS a');
>    return $_[0] * 2;
> $$ LANGUAGE plperl;

> select self_modify(42);

> So I think we need to institute a reference counting scheme for
> the plperl_proc_desc records ...

The attached patch fixes the problem I'm seeing.  I am not sure whether
it fixes what you saw; the crash you showed is in the right place, but
unless there was a recursive call to a pl/perl function, I don't see
how the existing code could have freed the prodesc too soon.

            regards, tom lane

diff --git a/src/pl/plperl/expected/plperl.out b/src/pl/plperl/expected/plperl.out
index 906dc15e0ca097ec962c6dce9a08b29cb31d35b5..29c1d11c447e6087fee85565c0bf458cbc21486d 100644
*** a/src/pl/plperl/expected/plperl.out
--- b/src/pl/plperl/expected/plperl.out
*************** $$ LANGUAGE plperl;
*** 693,695 ****
--- 693,713 ----
  SELECT text_scalarref();
  ERROR:  PL/Perl function must return reference to hash or array
  CONTEXT:  PL/Perl function "text_scalarref"
+ -- check safe behavior when a function body is replaced during execution
+ CREATE OR REPLACE FUNCTION self_modify(INTEGER) RETURNS INTEGER AS $$
+    spi_exec_query('CREATE OR REPLACE FUNCTION self_modify(INTEGER) RETURNS INTEGER AS \'return $_[0] * 3;\' LANGUAGE
plperl;');
+    spi_exec_query('select self_modify(42) AS a');
+    return $_[0] * 2;
+ $$ LANGUAGE plperl;
+ SELECT self_modify(42);
+  self_modify
+ -------------
+           84
+ (1 row)
+
+ SELECT self_modify(42);
+  self_modify
+ -------------
+          126
+ (1 row)
+
diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c
index bfa805d944f7b309428b6818b7160215c1b8e9fc..2c39edac5bd93ca8b446aafb88a17e0b7a5088db 100644
*** a/src/pl/plperl/plperl.c
--- b/src/pl/plperl/plperl.c
*************** PG_MODULE_MAGIC;
*** 70,75 ****
--- 70,76 ----
   *
   * The plperl_interp_desc structs are kept in a Postgres hash table indexed
   * by userid OID, with OID 0 used for the single untrusted interpreter.
+  * Once created, an interpreter is kept for the life of the process.
   *
   * We start out by creating a "held" interpreter, which we initialize
   * only as far as we can do without deciding if it will be trusted or
*************** typedef struct plperl_interp_desc
*** 95,122 ****

  /**********************************************************************
   * The information we cache about loaded procedures
   **********************************************************************/
  typedef struct plperl_proc_desc
  {
      char       *proname;        /* user name of procedure */
!     TransactionId fn_xmin;
      ItemPointerData fn_tid;
      plperl_interp_desc *interp; /* interpreter it's created in */
!     bool        fn_readonly;
!     bool        lanpltrusted;
      bool        fn_retistuple;    /* true, if function returns tuple */
      bool        fn_retisset;    /* true, if function returns set */
      bool        fn_retisarray;    /* true if function returns array */
      Oid            result_oid;        /* Oid of result type */
      FmgrInfo    result_in_func; /* I/O function and arg for result type */
      Oid            result_typioparam;
      int            nargs;
      FmgrInfo    arg_out_func[FUNC_MAX_ARGS];
      bool        arg_is_rowtype[FUNC_MAX_ARGS];
      Oid            arg_arraytype[FUNC_MAX_ARGS];    /* InvalidOid if not an array */
-     SV           *reference;
  } plperl_proc_desc;

  /**********************************************************************
   * For speedy lookup, we maintain a hash table mapping from
   * function OID + trigger flag + user OID to plperl_proc_desc pointers.
--- 96,139 ----

  /**********************************************************************
   * The information we cache about loaded procedures
+  *
+  * The refcount field counts the struct's reference from the hash table shown
+  * below, plus one reference for each function call level that is using the
+  * struct.  We can release the struct, and the associated Perl sub, when the
+  * refcount goes to zero.
   **********************************************************************/
  typedef struct plperl_proc_desc
  {
      char       *proname;        /* user name of procedure */
!     TransactionId fn_xmin;        /* xmin/TID of procedure's pg_proc tuple */
      ItemPointerData fn_tid;
+     int            refcount;        /* reference count of this struct */
+     SV           *reference;        /* CODE reference for Perl sub */
      plperl_interp_desc *interp; /* interpreter it's created in */
!     bool        fn_readonly;    /* is function readonly (not volatile)? */
!     bool        lanpltrusted;    /* is it plperl, rather than plperlu? */
      bool        fn_retistuple;    /* true, if function returns tuple */
      bool        fn_retisset;    /* true, if function returns set */
      bool        fn_retisarray;    /* true if function returns array */
+     /* Conversion info for function's result type: */
      Oid            result_oid;        /* Oid of result type */
      FmgrInfo    result_in_func; /* I/O function and arg for result type */
      Oid            result_typioparam;
+     /* Conversion info for function's argument types: */
      int            nargs;
      FmgrInfo    arg_out_func[FUNC_MAX_ARGS];
      bool        arg_is_rowtype[FUNC_MAX_ARGS];
      Oid            arg_arraytype[FUNC_MAX_ARGS];    /* InvalidOid if not an array */
  } plperl_proc_desc;

+ #define increment_prodesc_refcount(prodesc)  \
+     ((prodesc)->refcount++)
+ #define decrement_prodesc_refcount(prodesc)  \
+     do { \
+         if (--((prodesc)->refcount) <= 0) \
+             free_plperl_function(prodesc); \
+     } while(0)
+
  /**********************************************************************
   * For speedy lookup, we maintain a hash table mapping from
   * function OID + trigger flag + user OID to plperl_proc_desc pointers.
*************** static void set_interp_require(bool trus
*** 238,243 ****
--- 255,262 ----
  static Datum plperl_func_handler(PG_FUNCTION_ARGS);
  static Datum plperl_trigger_handler(PG_FUNCTION_ARGS);

+ static void free_plperl_function(plperl_proc_desc *prodesc);
+
  static plperl_proc_desc *compile_plperl_function(Oid fn_oid, bool is_trigger);

  static SV  *plperl_hash_from_tuple(HeapTuple tuple, TupleDesc tupdesc);
*************** plperl_call_handler(PG_FUNCTION_ARGS)
*** 1689,1694 ****
--- 1708,1714 ----

      PG_TRY();
      {
+         current_call_data = NULL;
          if (CALLED_AS_TRIGGER(fcinfo))
              retval = PointerGetDatum(plperl_trigger_handler(fcinfo));
          else
*************** plperl_call_handler(PG_FUNCTION_ARGS)
*** 1696,1707 ****
--- 1716,1731 ----
      }
      PG_CATCH();
      {
+         if (current_call_data && current_call_data->prodesc)
+             decrement_prodesc_refcount(current_call_data->prodesc);
          current_call_data = save_call_data;
          activate_interpreter(oldinterp);
          PG_RE_THROW();
      }
      PG_END_TRY();

+     if (current_call_data && current_call_data->prodesc)
+         decrement_prodesc_refcount(current_call_data->prodesc);
      current_call_data = save_call_data;
      activate_interpreter(oldinterp);
      return retval;
*************** plperl_inline_handler(PG_FUNCTION_ARGS)
*** 1753,1766 ****
      desc.nargs = 0;
      desc.reference = NULL;

-     current_call_data = (plperl_call_data *) palloc0(sizeof(plperl_call_data));
-     current_call_data->fcinfo = &fake_fcinfo;
-     current_call_data->prodesc = &desc;
-
      PG_TRY();
      {
          SV           *perlret;

          if (SPI_connect() != SPI_OK_CONNECT)
              elog(ERROR, "could not connect to SPI manager");

--- 1777,1791 ----
      desc.nargs = 0;
      desc.reference = NULL;

      PG_TRY();
      {
          SV           *perlret;

+         current_call_data = (plperl_call_data *) palloc0(sizeof(plperl_call_data));
+         current_call_data->fcinfo = &fake_fcinfo;
+         current_call_data->prodesc = &desc;
+         /* we do not bother with refcounting the fake prodesc */
+
          if (SPI_connect() != SPI_OK_CONNECT)
              elog(ERROR, "could not connect to SPI manager");

*************** plperl_func_handler(PG_FUNCTION_ARGS)
*** 2154,2159 ****
--- 2179,2185 ----

      prodesc = compile_plperl_function(fcinfo->flinfo->fn_oid, false);
      current_call_data->prodesc = prodesc;
+     increment_prodesc_refcount(prodesc);

      /* Set a callback for error reporting */
      pl_error_context.callback = plperl_exec_callback;
*************** plperl_trigger_handler(PG_FUNCTION_ARGS)
*** 2274,2279 ****
--- 2300,2306 ----
      /* Find or compile the function */
      prodesc = compile_plperl_function(fcinfo->flinfo->fn_oid, true);
      current_call_data->prodesc = prodesc;
+     increment_prodesc_refcount(prodesc);

      /* Set a callback for error reporting */
      pl_error_context.callback = plperl_exec_callback;
*************** validate_plperl_function(plperl_proc_ptr
*** 2383,2405 ****

          /* Otherwise, unlink the obsoleted entry from the hashtable ... */
          proc_ptr->proc_ptr = NULL;
!         /* ... and throw it away */
!         if (prodesc->reference)
!         {
!             plperl_interp_desc *oldinterp = plperl_active_interp;
!
!             activate_interpreter(prodesc->interp);
!             SvREFCNT_dec(prodesc->reference);
!             activate_interpreter(oldinterp);
!         }
!         free(prodesc->proname);
!         free(prodesc);
      }

      return false;
  }


  static plperl_proc_desc *
  compile_plperl_function(Oid fn_oid, bool is_trigger)
  {
--- 2410,2444 ----

          /* Otherwise, unlink the obsoleted entry from the hashtable ... */
          proc_ptr->proc_ptr = NULL;
!         /* ... and release the corresponding refcount, probably deleting it */
!         decrement_prodesc_refcount(prodesc);
      }

      return false;
  }


+ static void
+ free_plperl_function(plperl_proc_desc *prodesc)
+ {
+     Assert(prodesc->refcount <= 0);
+     /* Release CODE reference, if we have one, from the appropriate interp */
+     if (prodesc->reference)
+     {
+         plperl_interp_desc *oldinterp = plperl_active_interp;
+
+         activate_interpreter(prodesc->interp);
+         SvREFCNT_dec(prodesc->reference);
+         activate_interpreter(oldinterp);
+     }
+     /* Get rid of what we conveniently can of our own structs */
+     /* (FmgrInfo subsidiary info will get leaked ...) */
+     if (prodesc->proname)
+         free(prodesc->proname);
+     free(prodesc);
+ }
+
+
  static plperl_proc_desc *
  compile_plperl_function(Oid fn_oid, bool is_trigger)
  {
*************** compile_plperl_function(Oid fn_oid, bool
*** 2470,2481 ****
--- 2509,2525 ----
              ereport(ERROR,
                      (errcode(ERRCODE_OUT_OF_MEMORY),
                       errmsg("out of memory")));
+         /* Initialize all fields to 0 so free_plperl_function is safe */
          MemSet(prodesc, 0, sizeof(plperl_proc_desc));
+
          prodesc->proname = strdup(NameStr(procStruct->proname));
          if (prodesc->proname == NULL)
+         {
+             free_plperl_function(prodesc);
              ereport(ERROR,
                      (errcode(ERRCODE_OUT_OF_MEMORY),
                       errmsg("out of memory")));
+         }
          prodesc->fn_xmin = HeapTupleHeaderGetXmin(procTup->t_data);
          prodesc->fn_tid = procTup->t_self;

*************** compile_plperl_function(Oid fn_oid, bool
*** 2490,2497 ****
                                    ObjectIdGetDatum(procStruct->prolang));
          if (!HeapTupleIsValid(langTup))
          {
!             free(prodesc->proname);
!             free(prodesc);
              elog(ERROR, "cache lookup failed for language %u",
                   procStruct->prolang);
          }
--- 2534,2540 ----
                                    ObjectIdGetDatum(procStruct->prolang));
          if (!HeapTupleIsValid(langTup))
          {
!             free_plperl_function(prodesc);
              elog(ERROR, "cache lookup failed for language %u",
                   procStruct->prolang);
          }
*************** compile_plperl_function(Oid fn_oid, bool
*** 2510,2517 ****
                                  ObjectIdGetDatum(procStruct->prorettype));
              if (!HeapTupleIsValid(typeTup))
              {
!                 free(prodesc->proname);
!                 free(prodesc);
                  elog(ERROR, "cache lookup failed for type %u",
                       procStruct->prorettype);
              }
--- 2553,2559 ----
                                  ObjectIdGetDatum(procStruct->prorettype));
              if (!HeapTupleIsValid(typeTup))
              {
!                 free_plperl_function(prodesc);
                  elog(ERROR, "cache lookup failed for type %u",
                       procStruct->prorettype);
              }
*************** compile_plperl_function(Oid fn_oid, bool
*** 2525,2532 ****
                       /* okay */ ;
                  else if (procStruct->prorettype == TRIGGEROID)
                  {
!                     free(prodesc->proname);
!                     free(prodesc);
                      ereport(ERROR,
                              (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                               errmsg("trigger functions can only be called "
--- 2567,2573 ----
                       /* okay */ ;
                  else if (procStruct->prorettype == TRIGGEROID)
                  {
!                     free_plperl_function(prodesc);
                      ereport(ERROR,
                              (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                               errmsg("trigger functions can only be called "
*************** compile_plperl_function(Oid fn_oid, bool
*** 2534,2541 ****
                  }
                  else
                  {
!                     free(prodesc->proname);
!                     free(prodesc);
                      ereport(ERROR,
                              (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                               errmsg("PL/Perl functions cannot return type %s",
--- 2575,2581 ----
                  }
                  else
                  {
!                     free_plperl_function(prodesc);
                      ereport(ERROR,
                              (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                               errmsg("PL/Perl functions cannot return type %s",
*************** compile_plperl_function(Oid fn_oid, bool
*** 2570,2577 ****
                          ObjectIdGetDatum(procStruct->proargtypes.values[i]));
                  if (!HeapTupleIsValid(typeTup))
                  {
!                     free(prodesc->proname);
!                     free(prodesc);
                      elog(ERROR, "cache lookup failed for type %u",
                           procStruct->proargtypes.values[i]);
                  }
--- 2610,2616 ----
                          ObjectIdGetDatum(procStruct->proargtypes.values[i]));
                  if (!HeapTupleIsValid(typeTup))
                  {
!                     free_plperl_function(prodesc);
                      elog(ERROR, "cache lookup failed for type %u",
                           procStruct->proargtypes.values[i]);
                  }
*************** compile_plperl_function(Oid fn_oid, bool
*** 2581,2588 ****
                  if (typeStruct->typtype == TYPTYPE_PSEUDO &&
                      procStruct->proargtypes.values[i] != RECORDOID)
                  {
!                     free(prodesc->proname);
!                     free(prodesc);
                      ereport(ERROR,
                              (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                               errmsg("PL/Perl functions cannot accept type %s",
--- 2620,2626 ----
                  if (typeStruct->typtype == TYPTYPE_PSEUDO &&
                      procStruct->proargtypes.values[i] != RECORDOID)
                  {
!                     free_plperl_function(prodesc);
                      ereport(ERROR,
                              (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                               errmsg("PL/Perl functions cannot accept type %s",
*************** compile_plperl_function(Oid fn_oid, bool
*** 2635,2642 ****
          pfree(proc_source);
          if (!prodesc->reference)    /* can this happen? */
          {
!             free(prodesc->proname);
!             free(prodesc);
              elog(ERROR, "could not create PL/Perl internal procedure");
          }

--- 2673,2679 ----
          pfree(proc_source);
          if (!prodesc->reference)    /* can this happen? */
          {
!             free_plperl_function(prodesc);
              elog(ERROR, "could not create PL/Perl internal procedure");
          }

*************** compile_plperl_function(Oid fn_oid, bool
*** 2648,2653 ****
--- 2685,2691 ----
          proc_ptr = hash_search(plperl_proc_hash, &proc_key,
                                 HASH_ENTER, NULL);
          proc_ptr->proc_ptr = prodesc;
+         increment_prodesc_refcount(prodesc);
      }

      /* restore previous error callback */
diff --git a/src/pl/plperl/sql/plperl.sql b/src/pl/plperl/sql/plperl.sql
index a5e3840dac23667ff2d599c46b6e6887cc6875a8..ad361614c488fa7f04dbd10a6f4844f8cf506d1b 100644
*** a/src/pl/plperl/sql/plperl.sql
--- b/src/pl/plperl/sql/plperl.sql
*************** CREATE OR REPLACE FUNCTION text_scalarre
*** 462,464 ****
--- 462,474 ----
  $$ LANGUAGE plperl;

  SELECT text_scalarref();
+
+ -- check safe behavior when a function body is replaced during execution
+ CREATE OR REPLACE FUNCTION self_modify(INTEGER) RETURNS INTEGER AS $$
+    spi_exec_query('CREATE OR REPLACE FUNCTION self_modify(INTEGER) RETURNS INTEGER AS \'return $_[0] * 3;\' LANGUAGE
plperl;');
+    spi_exec_query('select self_modify(42) AS a');
+    return $_[0] * 2;
+ $$ LANGUAGE plperl;
+
+ SELECT self_modify(42);
+ SELECT self_modify(42);

Re: BUG #7516: PL/Perl crash

От
Marko Tiikkaja
Дата:
On 10/09/2012 00:37, Tom Lane wrote:
> The attached patch fixes the problem I'm seeing.  I am not sure whether
> it fixes what you saw; the crash you showed is in the right place, but
> unless there was a recursive call to a pl/perl function, I don't see
> how the existing code could have freed the prodesc too soon.

Joel Jacobson managed to narrow it down to this test case, which crashes
consistently on Ubuntu 12.04 both with and without your patch.  I,
however, wasn't able to reproduce the problem on my OS X Mountain Lion.
  I'll try to get some more information about it tomorrow, but in the
mean time if you can reproduce the problem or think of something, I'll
post the test case.


Regards,
Marko Tiikkaja

Вложения

Re: BUG #7516: PL/Perl crash

От
Tom Lane
Дата:
Marko Tiikkaja <pgmail@joh.to> writes:
> Joel Jacobson managed to narrow it down to this test case, which crashes
> consistently on Ubuntu 12.04 both with and without your patch.  I,
> however, wasn't able to reproduce the problem on my OS X Mountain Lion.

Doesn't reproduce for me either ...

            regards, tom lane

Re: BUG #7516: PL/Perl crash

От
Marko Tiikkaja
Дата:
On 9/12/12 1:50 AM, Tom Lane wrote:
> Marko Tiikkaja <pgmail@joh.to> writes:
>> Joel Jacobson managed to narrow it down to this test case, which crashes
>> consistently on Ubuntu 12.04 both with and without your patch.  I,
>> however, wasn't able to reproduce the problem on my OS X Mountain Lion.
>
> Doesn't reproduce for me either ...

Ok, I can reproduce it on my Ubuntu virtual machine.

The problem looks like this:

#0  free_plperl_function (prodesc=0x24195a0) at plperl.c:2428
#1  0x00007ff47babc045 in plperl_call_handler (fcinfo=0x247c160) at
plperl.c:1710
#2  0x00000000005663fa in ExecMakeFunctionResult (fcache=0x247c0f0,
econtext=0x247bf00, isNull=0x247ca78 "", isDone=0x247cb90) at
execQual.c:1917
#3  0x0000000000568942 in ExecTargetList (isDone=0x7fff1402fc0c,
itemIsDone=0x247cb90, isnull=0x247ca78 "", values=0x247ca60,
econtext=0x247bf00, targetlist=0x247cb60) at execQual.c:5210
#4  ExecProject (projInfo=<optimized out>, isDone=0x7fff1402fc0c) at
execQual.c:5425
#5  0x0000000000578aea in ExecResult (node=0x247bdf0) at nodeResult.c:155
#6  0x0000000000561b18 in ExecProcNode (node=0x247bdf0) at
execProcnode.c:367
#7  0x000000000055ef2a in ExecutePlan (dest=0xad4600,
direction=<optimized out>, numberTuples=0, sendTuples=1 '\001',
operation=CMD_SELECT, planstate=0x247bdf0, estate=0x247bce0) at
execMain.c:1440
#8  standard_ExecutorRun (queryDesc=0x244f6d0, direction=<optimized
out>, count=0) at execMain.c:314
#9  0x000000000058192d in _SPI_pquery (tcount=<optimized out>,
fire_triggers=1 '\001', queryDesc=<optimized out>) at spi.c:2110
#10 _SPI_execute_plan (paramLI=0x245e1f0, snapshot=<optimized out>,
crosscheck_snapshot=0x0, read_only=0 '\000', fire_triggers=1 '\001',
tcount=0, plan=<optimized out>) at spi.c:1922
#11 0x0000000000581e57 in SPI_execute_plan_with_paramlist
(plan=0x2476c30, params=0x245e1f0, read_only=0 '\000', tcount=0) at
spi.c:423
#12 0x00007ff47b0c7075 in exec_run_select (estate=0x7fff14030270,
expr=0x24569f0, maxtuples=0, portalP=0x0) at pl_exec.c:4573
#13 0x00007ff47b0ca7b4 in exec_stmt_perform (estate=0x7fff14030270,
stmt=<optimized out>) at pl_exec.c:1413
#14 exec_stmt (stmt=0x2456ab0, estate=0x7fff14030270) at pl_exec.c:1289
#15 exec_stmts (estate=0x7fff14030270, stmts=<optimized out>) at
pl_exec.c:1248
#16 0x00007ff47b0cce59 in exec_stmt_block (estate=0x7fff14030270,
block=0x2457448) at pl_exec.c:1047
#17 0x00007ff47b0ca798 in exec_stmt (stmt=0x2457448,
estate=0x7fff14030270) at pl_exec.c:1281
#18 exec_stmts (estate=0x7fff14030270, stmts=<optimized out>) at
pl_exec.c:1248
#19 0x00007ff47b0ccfcc in exec_stmt_block (estate=0x7fff14030270,
block=0x2457508) at pl_exec.c:1186
#20 0x00007ff47b0cda37 in plpgsql_exec_function (func=0x243a140,
fcinfo=0x7fff14030580) at pl_exec.c:324
#21 0x00007ff47b0c26d3 in plpgsql_call_handler (fcinfo=0x7fff14030580)
at pl_handler.c:122
#22 0x0000000000566d4d in ExecMakeTableFunctionResult
(funcexpr=0x2443368, econtext=0x24428b0, expectedDesc=0x2443110,
randomAccess=0 '\000') at execQual.c:2146
#23 0x0000000000578381 in FunctionNext (node=0x24427a0) at
nodeFunctionscan.c:65
#24 0x0000000000568dde in ExecScanFetch (recheckMtd=0x578300
<FunctionRecheck>, accessMtd=0x578310 <FunctionNext>, node=0x24427a0) at
execScan.c:82
#25 ExecScan (node=0x24427a0, accessMtd=0x578310 <FunctionNext>,
recheckMtd=0x578300 <FunctionRecheck>) at execScan.c:132
#26 0x0000000000561a78 in ExecProcNode (node=0x24427a0) at
execProcnode.c:416
#27 0x000000000055ef2a in ExecutePlan (dest=0xad4600,
direction=<optimized out>, numberTuples=0, sendTuples=1 '\001',
operation=CMD_SELECT, planstate=0x24427a0, estate=0x2442690) at
execMain.c:1440
#28 standard_ExecutorRun (queryDesc=0x243f700, direction=<optimized
out>, count=0) at execMain.c:314
#29 0x000000000058192d in _SPI_pquery (tcount=<optimized out>,
fire_triggers=1 '\001', queryDesc=<optimized out>) at spi.c:2110
#30 _SPI_execute_plan (paramLI=0x243f670, snapshot=<optimized out>,
crosscheck_snapshot=0x0, read_only=0 '\000', fire_triggers=1 '\001',
tcount=0, plan=<optimized out>) at spi.c:1922
#31 0x0000000000581f39 in SPI_execute_plan (plan=0x23ee750,
Values=0x243df38, Nulls=0x24376b0 "  RCHAR", read_only=0 '\000',
tcount=0) at spi.c:391
#32 0x00007ff47babce2d in plperl_spi_exec_prepared (query=0x2437690
"0x22930e0", attr=0x0, argc=2, argv=0x243df18) at plperl.c:3425
#33 0x00007ff47babe810 in XS__spi_exec_prepared (my_perl=<optimized
out>, cv=<optimized out>) at SPI.xs:141
#34 0x00007ff47b7e67ff in Perl_pp_entersub (my_perl=0x237d800) at
pp_hot.c:3046
#35 0x00007ff47b7ddc96 in Perl_runops_standard (my_perl=0x237d800) at
run.c:41
#36 0x00007ff47b77959e in Perl_call_sv (my_perl=0x237d800, sv=0x24017f8,
flags=10) at perl.c:2647
#37 0x00007ff47bab5f62 in plperl_call_perl_func (desc=0x24195a0,
fcinfo=0x242fa90) at plperl.c:2056
#38 0x00007ff47baba45b in plperl_func_handler (fcinfo=0x242fa90) at
plperl.c:2196
#39 plperl_call_handler (fcinfo=0x242fa90) at plperl.c:1705
#40 0x00000000005663fa in ExecMakeFunctionResult (fcache=0x242fa20,
econtext=0x242f830, isNull=0x24303a8 "", isDone=0x24304c0) at
execQual.c:1917
#41 0x0000000000568942 in ExecTargetList (isDone=0x7fff140312fc,
itemIsDone=0x24304c0, isnull=0x24303a8 "", values=0x2430390,
econtext=0x242f830, targetlist=0x2430490) at execQual.c:5210
#42 ExecProject (projInfo=<optimized out>, isDone=0x7fff140312fc) at
execQual.c:5425
#43 0x0000000000578aea in ExecResult (node=0x242f720) at nodeResult.c:155
#44 0x0000000000561b18 in ExecProcNode (node=0x242f720) at
execProcnode.c:367
#45 0x000000000055ef2a in ExecutePlan (dest=0x2429a80,
direction=<optimized out>, numberTuples=0, sendTuples=1 '\001',
operation=CMD_SELECT, planstate=0x242f720, estate=0x242f610) at
execMain.c:1440
#46 standard_ExecutorRun (queryDesc=0x22ae450, direction=<optimized
out>, count=0) at execMain.c:314
#47 0x0000000000629317 in PortalRunSelect (portal=0x22abc30,
forward=<optimized out>, count=0, dest=0x2429a80) at pquery.c:943
#48 0x000000000062a6a0 in PortalRun (portal=0x22abc30,
count=9223372036854775807, isTopLevel=1 '\001', dest=0x2429a80,
altdest=0x2429a80, completionTag=0x7fff14031740 "") at pquery.c:787
#49 0x000000000062698c in exec_simple_query (query_string=0x233b200
"SELECT func0003();") at postgres.c:1020
#50 PostgresMain (argc=<optimized out>, argv=<optimized out>,
username=<optimized out>) at postgres.c:3968
#51 0x00000000005eba29 in BackendRun (port=0x22b5c70) at postmaster.c:3617
#52 BackendStartup (port=0x22b5c70) at postmaster.c:3302
#53 ServerLoop () at postmaster.c:1466
#54 0x00000000005ec34c in PostmasterMain (argc=<optimized out>,
argv=0x228b540) at postmaster.c:1127
#55 0x0000000000453de0 in main (argc=1, argv=0x228b540) at main.c:199

What happens is that free_plperl_function() for some reason gets called
with the prodesc of func0003.  Later on, func0003 wants to get rid of
his prodesc and I get a crash.  What's weird about this is that
current_call_data->prodesc actually points to the correct prodesc (the
one of func0005), but still somehow something different is passed to
free_plperl_function().

Anything I can do to help further, let me know.


Regards,
Marko Tiikkaja

Re: BUG #7516: PL/Perl crash

От
Tom Lane
Дата:
Marko Tiikkaja <pgmail@joh.to> writes:
> On 9/12/12 1:50 AM, Tom Lane wrote:
>> Marko Tiikkaja <pgmail@joh.to> writes:
>>> Joel Jacobson managed to narrow it down to this test case, which crashes
>>> consistently on Ubuntu 12.04 both with and without your patch.  I,
>>> however, wasn't able to reproduce the problem on my OS X Mountain Lion.

>> Doesn't reproduce for me either ...

> Ok, I can reproduce it on my Ubuntu virtual machine.

Hm, I wonder if it's Ubuntu-specific?  What Perl version is that exactly?

> What happens is that free_plperl_function() for some reason gets called
> with the prodesc of func0003.  Later on, func0003 wants to get rid of
> his prodesc and I get a crash.  What's weird about this is that
> current_call_data->prodesc actually points to the correct prodesc (the
> one of func0005), but still somehow something different is passed to
> free_plperl_function().

The only theory that comes to mind is that current_call_data is somehow
getting aliased (free'd and realloc'd).  I don't see how that could
happen, but it occurs to me that it's kinda dumb to be palloc'ing it
in the first place.  Its lifetime is exactly that of the call, so it
would be simpler and more foolproof to make it a local variable.

I've pushed a HEAD patch to do that, and I wonder if you could check
whether it changes anything.

            regards, tom lane

Re: BUG #7516: PL/Perl crash

От
Marko Tiikkaja
Дата:
On 13/09/2012 19:48, Tom Lane wrote:
> Marko Tiikkaja <pgmail@joh.to> writes:
>> On 9/12/12 1:50 AM, Tom Lane wrote:
> Hm, I wonder if it's Ubuntu-specific?  What Perl version is that exactly?

We've reproduced it on both 5.14.2 and 5.16.1.

>> What happens is that free_plperl_function() for some reason gets called
>> with the prodesc of func0003.  Later on, func0003 wants to get rid of
>> his prodesc and I get a crash.  What's weird about this is that
>> current_call_data->prodesc actually points to the correct prodesc (the
>> one of func0005), but still somehow something different is passed to
>> free_plperl_function().
>
> The only theory that comes to mind is that current_call_data is somehow
> getting aliased (free'd and realloc'd).  I don't see how that could
> happen, but it occurs to me that it's kinda dumb to be palloc'ing it
> in the first place.  Its lifetime is exactly that of the call, so it
> would be simpler and more foolproof to make it a local variable.
>
> I've pushed a HEAD patch to do that, and I wonder if you could check
> whether it changes anything.

Will look at that tomorrow, thanks.


Regards,
Marko Tiikkaja

Re: BUG #7516: PL/Perl crash

От
Tom Lane
Дата:
Marko Tiikkaja <pgmail@joh.to> writes:
> On 13/09/2012 19:48, Tom Lane wrote:
>> Hm, I wonder if it's Ubuntu-specific?  What Perl version is that exactly?

> We've reproduced it on both 5.14.2 and 5.16.1.

Meh.  I've managed to reproduce it on the fifth system I tried.  I don't
think it's got anything to do with the Perl version, but with the gcc
version (which is 4.7.0 on this machine).  Apparently, very recent gcc
versions are willing to throw away this line of code:

        PG_CATCH();
        {
----->          current_call_data = save_call_data;
                activate_interpreter(oldinterp);
                PG_RE_THROW();
        }
        PG_END_TRY();

        current_call_data = save_call_data;
        activate_interpreter(oldinterp);
    return retval;

Apparently the reasoning is that current_call_data is a static and
therefore the compiler can see everything that can happen to it and
therefore this store into current_call_data is dead code, since the
store six lines below will replace it.  Sigh.  I shall file a bug,
but I've found that the current crop of gcc maintainers are quite
likely to reject such reports.

We could fix the immediate problem by marking current_call_data volatile
(I've tested that this makes the problem go away on F17), but I think
what we'd better do instead is mark pg_re_throw() as noreturn.
Hopefully that will also prevent this misoptimization, as well as
similar ones that might be happening elsewhere.

            regards, tom lane

Re: BUG #7516: PL/Perl crash

От
Tom Lane
Дата:
I wrote:
> Apparently the reasoning is that current_call_data is a static and
> therefore the compiler can see everything that can happen to it and
> therefore this store into current_call_data is dead code, since the
> store six lines below will replace it.  Sigh.  I shall file a bug,
> but I've found that the current crop of gcc maintainers are quite
> likely to reject such reports.

Filed at https://bugzilla.redhat.com/show_bug.cgi?id=857236

On the good side: developing a minimal test case showed me that the code
is incorrectly compiled only if perl.h has been included.  (WTF?  No,
I don't know why.)  So at least this gcc bug should only be affecting
plperl.c and not the rest of postgres.

> We could fix the immediate problem by marking current_call_data volatile
> (I've tested that this makes the problem go away on F17), but I think
> what we'd better do instead is mark pg_re_throw() as noreturn.
> Hopefully that will also prevent this misoptimization, as well as
> similar ones that might be happening elsewhere.

But, of course, pg_re_throw() already is marked noreturn.

A probably less costly solution than marking current_call_data volatile
is just to make it not-static.

            regards, tom lane

Re: BUG #7516: PL/Perl crash

От
Tom Lane
Дата:
I wrote:
> A probably less costly solution than marking current_call_data volatile
> is just to make it not-static.

And on still further investigation, the patch I just applied to HEAD
seems to make it go away too.  Bizarre as can be.  If that holds up for
you, I think back-patching that change is less ugly than making the
variable non-static.

            regards, tom lane

Re: BUG #7516: PL/Perl crash

От
Marko Tiikkaja
Дата:
On 9/13/12 11:58 PM, Tom Lane wrote:
> I wrote:
>> A probably less costly solution than marking current_call_data volatile
>> is just to make it not-static.
>
> And on still further investigation, the patch I just applied to HEAD
> seems to make it go away too.  Bizarre as can be.  If that holds up for
> you, I think back-patching that change is less ugly than making the
> variable non-static.

It does, indeed.  I can't reproduce the bug on my end with that patch
applied.



Regards,
Marko Tiikkaja

Re: BUG #7516: PL/Perl crash

От
Tom Lane
Дата:
Marko Tiikkaja <pgmail@joh.to> writes:
> On 9/13/12 11:58 PM, Tom Lane wrote:
>> And on still further investigation, the patch I just applied to HEAD
>> seems to make it go away too.  Bizarre as can be.  If that holds up for
>> you, I think back-patching that change is less ugly than making the
>> variable non-static.

> It does, indeed.  I can't reproduce the bug on my end with that patch
> applied.

Here's what Jakub Jelinek wrote in the RH bugzilla:

> That is a glibc bug.  While in setjmp.h the __sigsetjmp prototype is properly
> marked as non-leaf:
> extern int __sigsetjmp (struct __jmp_buf_tag __env[1], int __savemask)
> __THROWNL;
> in pthread.h it is incorrectly marked as leaf:
> extern int __sigsetjmp (struct __jmp_buf_tag *__env, int __savemask) __THROW;
> When pthread.h is fixed, the testcase works properly.

So that explains why including perl.h is relevant.  Would you like to
try modifying your copy of pthread.h to confirm it's the same situation
on Ubuntu?

It's probably pure luck that my no-palloc patch dodges the problem, but
anyway it seems like a good enough work-around until the glibc issue is
fixed.  I'll go ahead and back-patch that.

            regards, tom lane

Re: BUG #7516: PL/Perl crash

От
Marko Tiikkaja
Дата:
On 9/14/12 4:26 PM, Tom Lane wrote:
> So that explains why including perl.h is relevant.  Would you like to
> try modifying your copy of pthread.h to confirm it's the same situation
> on Ubuntu?

Yup, that fixes it.


Regards,
Marko Tiikkaja