Обсуждение: Inconsistency in plpgsql's error context reports

Поиск
Список
Период
Сортировка

Inconsistency in plpgsql's error context reports

От
Tom Lane
Дата:
I noticed that since I put in commit 390d58135, buildfarm members
that use CLOBBER_CACHE_ALWAYS are failing the added test case
with diffs like

*** 5051,5057 ****
  NOTICE:  y = 2
  ERROR:  record "r" is not assigned yet
  DETAIL:  The tuple structure of a not-yet-assigned record is indeterminate.
! CONTEXT:  PL/pgSQL function inline_code_block line 15 at RAISE
  -- Check handling of conflicts between plpgsql vars and table columns.
  set plpgsql.variable_conflict = error;
  create function conflict_test() returns setof int8_tbl as $$
--- 5051,5058 ----
  NOTICE:  y = 2
  ERROR:  record "r" is not assigned yet
  DETAIL:  The tuple structure of a not-yet-assigned record is indeterminate.
! CONTEXT:  SQL statement "SELECT r"
! PL/pgSQL function inline_code_block line 15 at RAISE
  -- Check handling of conflicts between plpgsql vars and table columns.
  set plpgsql.variable_conflict = error;
  create function conflict_test() returns setof int8_tbl as $$

The reason for the difference turns out to be that in a CCA build,
we are forcing recreation of the cached plan for the simple expression
"r", so the error gets detected during parse analysis of "SELECT r"
rather than during execution of the expression.  And because this all
mostly relies on SPI, we've gone through SPI_plan_get_cached_plan,
which inserts _SPI_error_callback into the error context stack.
That's where the first CONTEXT line is coming from.

There seem to be two ways we could look at this.  One is that the
new test case just needs to be rejiggered to avoid unstable output
("\set VERBOSITY terse" would be the easiest way).  But there is
also room to argue that it's bad that plpgsql produces error reports
that vary depending on the phase of the moon, which is pretty much
what this would look like in the field --- cache flushes will occur
unpredictably in most application environments.  In that view, where
exec_eval_simple_expr() bypasses SPI but nonetheless "has to do some of
the things SPI_execute_plan would do", one of the things it ought to be
doing is setting up an error context stack entry to duplicate the one
that SPI_execute_plan would push.

I'm of mixed mind about whether this is a good idea.  Adding more overhead
to exec_eval_simple_expr() isn't great, even though it'd be just a few
instructions.  And in most cases the statement-level context line that
you get anyway ought to be sufficient to localize the problem.  But it's
not too hard to imagine that a CONTEXT line that only shows up some of
the time could break somebody's error handling code.  It's also arguable
that it's weird that you get different CONTEXT reports depending on
whether exec_simple_check_plan thinks the expression is simple or not,
eg

regression=# do $$ declare x int := 1;
begin raise notice '%', x/0;         
end $$;
ERROR:  division by zero
CONTEXT:  PL/pgSQL function inline_code_block line 2 at RAISE

regression=# do $$ declare x int := 1;
begin raise notice '%', (select x/0);
end $$;
ERROR:  division by zero
CONTEXT:  SQL statement "SELECT (select x/0)"
PL/pgSQL function inline_code_block line 2 at RAISE

Thoughts?

            regards, tom lane


Re: Inconsistency in plpgsql's error context reports

От
Robert Haas
Дата:
On Sun, Dec 10, 2017 at 11:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> There seem to be two ways we could look at this.  One is that the
> new test case just needs to be rejiggered to avoid unstable output
> ("\set VERBOSITY terse" would be the easiest way).  But there is
> also room to argue that it's bad that plpgsql produces error reports
> that vary depending on the phase of the moon, which is pretty much
> what this would look like in the field --- cache flushes will occur
> unpredictably in most application environments.

I am inclined toward the latter view.  I think predictability is very
important and that lack of predictability is both frustrating for
users and a sign of poor programming practices under the hood.  Nobody
likes clicking on the same thing twice and getting a different result
each time; as humans, we want to feel like we understand how things
work, and if they work differently at different times, we tend to
either question whether we understand them or question whether they
are actually robust systems.  For example, lately my car has started
to take a variable number of gallons of gasoline to fill the tank from
almost-empty to completely full.  I find that rather unnerving: how do
I know whether I'm going to run out of gas?  Similarly here -- I'm not
sure whether people want that extra CONTEXT line or not, but I think
they're going to want to either get it or not get it consistently.  If
not, they'll tend to feel like something is flaky.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Inconsistency in plpgsql's error context reports

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Dec 10, 2017 at 11:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> There seem to be two ways we could look at this.  One is that the
>> new test case just needs to be rejiggered to avoid unstable output
>> ("\set VERBOSITY terse" would be the easiest way).  But there is
>> also room to argue that it's bad that plpgsql produces error reports
>> that vary depending on the phase of the moon, which is pretty much
>> what this would look like in the field --- cache flushes will occur
>> unpredictably in most application environments.

> I am inclined toward the latter view.

Yeah, me too.  I'll see about patching exec_eval_simple_expr() to
provide a context line that matches SPI's.  Seems like a HEAD-only
change though, as this will result in visible behavior change in
the typical case.

            regards, tom lane


Re: Inconsistency in plpgsql's error context reports

От
Tom Lane
Дата:
I wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sun, Dec 10, 2017 at 11:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> There seem to be two ways we could look at this.  One is that the
>>> new test case just needs to be rejiggered to avoid unstable output
>>> ("\set VERBOSITY terse" would be the easiest way).  But there is
>>> also room to argue that it's bad that plpgsql produces error reports
>>> that vary depending on the phase of the moon, which is pretty much
>>> what this would look like in the field --- cache flushes will occur
>>> unpredictably in most application environments.

>> I am inclined toward the latter view.

> Yeah, me too.  I'll see about patching exec_eval_simple_expr() to
> provide a context line that matches SPI's.  Seems like a HEAD-only
> change though, as this will result in visible behavior change in
> the typical case.

Here's a quick hack at that.  I guess the main question that needs to be
asked is whether we're happy with plpgsql getting so much chattier
(as per all the regression test changes).

If we're not, the alternative that could be considered is to make SPI
expose some way to suppress its use of a context callback, and change
plpgsql to invoke that when dealing with an expression.  That would
be rather more invasive code-wise, but would largely eliminate the
behavioral change as seen by users.

Another angle, if we do keep it like this, is that maybe SPI should
export _SPI_error_callback so that plpgsql can use it directly,
rather than having a copy that needs to be kept in sync.

            regards, tom lane

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index fa4d573..a7c399b 100644
*** a/src/pl/plpgsql/src/pl_exec.c
--- b/src/pl/plpgsql/src/pl_exec.c
*************** static int exec_run_select(PLpgSQL_execs
*** 266,271 ****
--- 266,272 ----
                  PLpgSQL_expr *expr, long maxtuples, Portal *portalP);
  static int exec_for_query(PLpgSQL_execstate *estate, PLpgSQL_stmt_forq *stmt,
                 Portal portal, bool prefetch_ok);
+ static void simple_expr_error_callback(void *arg);
  static ParamListInfo setup_param_list(PLpgSQL_execstate *estate,
                   PLpgSQL_expr *expr);
  static ParamListInfo setup_unshared_param_list(PLpgSQL_execstate *estate,
*************** exec_eval_simple_expr(PLpgSQL_execstate
*** 5504,5509 ****
--- 5505,5511 ----
      ExprContext *econtext = estate->eval_econtext;
      LocalTransactionId curlxid = MyProc->lxid;
      CachedPlan *cplan;
+     ErrorContextCallback simpleerrcontext;
      ParamListInfo paramLI;
      void       *save_setup_arg;
      MemoryContext oldcontext;
*************** exec_eval_simple_expr(PLpgSQL_execstate
*** 5579,5584 ****
--- 5581,5594 ----
      }

      /*
+      * Also setup error traceback support for ereport()
+      */
+     simpleerrcontext.callback = simple_expr_error_callback;
+     simpleerrcontext.arg = expr;
+     simpleerrcontext.previous = error_context_stack;
+     error_context_stack = &simpleerrcontext;
+
+     /*
       * Set up ParamListInfo to pass to executor.  We need an unshared list if
       * it's going to include any R/W expanded-object pointer.  For safety,
       * save and restore estate->paramLI->parserSetupArg around our use of the
*************** exec_eval_simple_expr(PLpgSQL_execstate
*** 5612,5617 ****
--- 5622,5629 ----

      estate->paramLI->parserSetupArg = save_setup_arg;

+     error_context_stack = simpleerrcontext.previous;
+
      if (!estate->readonly_func)
          PopActiveSnapshot();

*************** exec_eval_simple_expr(PLpgSQL_execstate
*** 5628,5633 ****
--- 5640,5674 ----
      return true;
  }

+ /*
+  * simple_expr_error_callback
+  *
+  * Add context information when a simple expression fails.  This should
+  * match the behavior of spi.c's _SPI_error_callback, else users will see
+  * different behavior for errors in simple and not-simple expressions.
+  */
+ static void
+ simple_expr_error_callback(void *arg)
+ {
+     PLpgSQL_expr *expr = (PLpgSQL_expr *) arg;
+     const char *query = expr->query;
+     int            syntaxerrposition;
+
+     /*
+      * If there is a syntax error position, convert to internal syntax error;
+      * otherwise treat the query as an item of context stack
+      */
+     syntaxerrposition = geterrposition();
+     if (syntaxerrposition > 0)
+     {
+         errposition(0);
+         internalerrposition(syntaxerrposition);
+         internalerrquery(query);
+     }
+     else
+         errcontext("SQL statement \"%s\"", query);
+ }
+

  /*
   * Create a ParamListInfo to pass to SPI
diff --git a/src/test/regress/expected/plpgsql.out b/src/test/regress/expected/plpgsql.out
index 26f6e43..4a384b0 100644
*** a/src/test/regress/expected/plpgsql.out
--- b/src/test/regress/expected/plpgsql.out
*************** DETAIL:  Key (name)=(PF1_1) already exis
*** 1519,1536 ****
--- 1519,1540 ----
  update PSlot set backlink = 'WS.not.there' where slotname = 'PS.base.a1';
  ERROR:  WS.not.there         does not exist
  CONTEXT:  PL/pgSQL function tg_backlink_set(character,character) line 30 at RAISE
+ SQL statement "SELECT tg_backlink_set(new.backlink, new.slotname)"
  PL/pgSQL function tg_backlink_a() line 17 at assignment
  update PSlot set backlink = 'XX.illegal' where slotname = 'PS.base.a1';
  ERROR:  illegal backlink beginning with XX
  CONTEXT:  PL/pgSQL function tg_backlink_set(character,character) line 47 at RAISE
+ SQL statement "SELECT tg_backlink_set(new.backlink, new.slotname)"
  PL/pgSQL function tg_backlink_a() line 17 at assignment
  update PSlot set slotlink = 'PS.not.there' where slotname = 'PS.base.a1';
  ERROR:  PS.not.there         does not exist
  CONTEXT:  PL/pgSQL function tg_slotlink_set(character,character) line 30 at RAISE
+ SQL statement "SELECT tg_slotlink_set(new.slotlink, new.slotname)"
  PL/pgSQL function tg_slotlink_a() line 17 at assignment
  update PSlot set slotlink = 'XX.illegal' where slotname = 'PS.base.a1';
  ERROR:  illegal slotlink beginning with XX
  CONTEXT:  PL/pgSQL function tg_slotlink_set(character,character) line 77 at RAISE
+ SQL statement "SELECT tg_slotlink_set(new.slotlink, new.slotname)"
  PL/pgSQL function tg_slotlink_a() line 17 at assignment
  insert into HSlot values ('HS', 'base.hub1', 1, '');
  ERROR:  duplicate key value violates unique constraint "hslot_name"
*************** exception when others then
*** 4273,4279 ****
  end;
  $$ language plpgsql;
  select stacked_diagnostics_test();
! NOTICE:  sqlstate: 22012, message: division by zero, context: [PL/pgSQL function zero_divide() line 4 at RETURN <-
SQLstatement "SELECT zero_divide()" <- PL/pgSQL function stacked_diagnostics_test() line 6 at PERFORM] 
   stacked_diagnostics_test
  --------------------------

--- 4277,4283 ----
  end;
  $$ language plpgsql;
  select stacked_diagnostics_test();
! NOTICE:  sqlstate: 22012, message: division by zero, context: [SQL statement "SELECT 10 / v" <- PL/pgSQL function
zero_divide()line 4 at RETURN <- SQL statement "SELECT zero_divide()" <- PL/pgSQL function stacked_diagnostics_test()
line6 at PERFORM] 
   stacked_diagnostics_test
  --------------------------

*************** SAVEPOINT a;
*** 4772,4777 ****
--- 4776,4782 ----
  select error2('nonexistent.stuffs');
  ERROR:  schema "nonexistent" does not exist
  CONTEXT:  SQL function "error1" statement 1
+ SQL statement "SELECT error1(p_name_table)"
  PL/pgSQL function error2(text) line 3 at RETURN
  ROLLBACK TO a;
  select error2('public.stuffs');
*************** select scope_test();
*** 5026,5032 ****

  drop function scope_test();
  -- Check that variables are reinitialized on block re-entry.
- \set VERBOSITY terse   \\ -- needed for output stability
  do $$
  begin
    for i in 1..3 loop
--- 5031,5036 ----
*************** NOTICE:  r = (1,2)
*** 5051,5057 ****
  NOTICE:  x = <NULL>
  NOTICE:  y = 2
  ERROR:  record "r" is not assigned yet
! \set VERBOSITY default
  -- Check handling of conflicts between plpgsql vars and table columns.
  set plpgsql.variable_conflict = error;
  create function conflict_test() returns setof int8_tbl as $$
--- 5055,5063 ----
  NOTICE:  x = <NULL>
  NOTICE:  y = 2
  ERROR:  record "r" is not assigned yet
! DETAIL:  The tuple structure of a not-yet-assigned record is indeterminate.
! CONTEXT:  SQL statement "SELECT r"
! PL/pgSQL function inline_code_block line 15 at RAISE
  -- Check handling of conflicts between plpgsql vars and table columns.
  set plpgsql.variable_conflict = error;
  create function conflict_test() returns setof int8_tbl as $$
*************** select outer_outer_func(10);
*** 5507,5516 ****
--- 5513,5526 ----
  NOTICE:  calling down into outer_func()
  NOTICE:  calling down into inner_func()
  NOTICE:  ***PL/pgSQL function inner_func(integer) line 4 at GET DIAGNOSTICS
+ SQL statement "SELECT inner_func($1)"
  PL/pgSQL function outer_func(integer) line 6 at assignment
+ SQL statement "SELECT outer_func($1)"
  PL/pgSQL function outer_outer_func(integer) line 6 at assignment***
  NOTICE:  ***PL/pgSQL function inner_func(integer) line 7 at GET DIAGNOSTICS
+ SQL statement "SELECT inner_func($1)"
  PL/pgSQL function outer_func(integer) line 6 at assignment
+ SQL statement "SELECT outer_func($1)"
  PL/pgSQL function outer_outer_func(integer) line 6 at assignment***
  NOTICE:  lets make sure we didnt break anything
  NOTICE:  inner_func() done
*************** select outer_outer_func(20);
*** 5525,5534 ****
--- 5535,5548 ----
  NOTICE:  calling down into outer_func()
  NOTICE:  calling down into inner_func()
  NOTICE:  ***PL/pgSQL function inner_func(integer) line 4 at GET DIAGNOSTICS
+ SQL statement "SELECT inner_func($1)"
  PL/pgSQL function outer_func(integer) line 6 at assignment
+ SQL statement "SELECT outer_func($1)"
  PL/pgSQL function outer_outer_func(integer) line 6 at assignment***
  NOTICE:  ***PL/pgSQL function inner_func(integer) line 7 at GET DIAGNOSTICS
+ SQL statement "SELECT inner_func($1)"
  PL/pgSQL function outer_func(integer) line 6 at assignment
+ SQL statement "SELECT outer_func($1)"
  PL/pgSQL function outer_outer_func(integer) line 6 at assignment***
  NOTICE:  lets make sure we didnt break anything
  NOTICE:  inner_func() done
*************** select outer_outer_func(10);
*** 5589,5598 ****
--- 5603,5616 ----
  NOTICE:  calling down into outer_func()
  NOTICE:  calling down into inner_func()
  NOTICE:  ***PL/pgSQL function inner_func(integer) line 10 at GET DIAGNOSTICS
+ SQL statement "SELECT inner_func($1)"
  PL/pgSQL function outer_func(integer) line 6 at assignment
+ SQL statement "SELECT outer_func($1)"
  PL/pgSQL function outer_outer_func(integer) line 6 at assignment***
  NOTICE:  ***PL/pgSQL function inner_func(integer) line 15 at GET DIAGNOSTICS
+ SQL statement "SELECT inner_func($1)"
  PL/pgSQL function outer_func(integer) line 6 at assignment
+ SQL statement "SELECT outer_func($1)"
  PL/pgSQL function outer_outer_func(integer) line 6 at assignment***
  NOTICE:  lets make sure we didnt break anything
  NOTICE:  inner_func() done
*************** select outer_outer_func(20);
*** 5607,5616 ****
--- 5625,5638 ----
  NOTICE:  calling down into outer_func()
  NOTICE:  calling down into inner_func()
  NOTICE:  ***PL/pgSQL function inner_func(integer) line 10 at GET DIAGNOSTICS
+ SQL statement "SELECT inner_func($1)"
  PL/pgSQL function outer_func(integer) line 6 at assignment
+ SQL statement "SELECT outer_func($1)"
  PL/pgSQL function outer_outer_func(integer) line 6 at assignment***
  NOTICE:  ***PL/pgSQL function inner_func(integer) line 15 at GET DIAGNOSTICS
+ SQL statement "SELECT inner_func($1)"
  PL/pgSQL function outer_func(integer) line 6 at assignment
+ SQL statement "SELECT outer_func($1)"
  PL/pgSQL function outer_outer_func(integer) line 6 at assignment***
  NOTICE:  lets make sure we didnt break anything
  NOTICE:  inner_func() done
diff --git a/src/test/regress/expected/transactions.out b/src/test/regress/expected/transactions.out
index a7fdcf4..9e30ea2 100644
*** a/src/test/regress/expected/transactions.out
--- b/src/test/regress/expected/transactions.out
*************** FETCH ok;  -- should work
*** 644,650 ****

  FETCH ctt; -- error occurs here
  ERROR:  division by zero
! CONTEXT:  PL/pgSQL function invert(double precision) line 1 at RETURN
  SQL statement "INSERT INTO new_table SELECT invert(0.0)"
  PL/pgSQL function create_temp_tab() line 6 at SQL statement
  ROLLBACK TO s1;
--- 644,651 ----

  FETCH ctt; -- error occurs here
  ERROR:  division by zero
! CONTEXT:  SQL statement "SELECT 1/x"
! PL/pgSQL function invert(double precision) line 1 at RETURN
  SQL statement "INSERT INTO new_table SELECT invert(0.0)"
  PL/pgSQL function create_temp_tab() line 6 at SQL statement
  ROLLBACK TO s1;
diff --git a/src/test/regress/sql/plpgsql.sql b/src/test/regress/sql/plpgsql.sql
index bb09b2d..07b6fc8 100644
*** a/src/test/regress/sql/plpgsql.sql
--- b/src/test/regress/sql/plpgsql.sql
*************** drop function scope_test();
*** 4016,4022 ****

  -- Check that variables are reinitialized on block re-entry.

- \set VERBOSITY terse   \\ -- needed for output stability
  do $$
  begin
    for i in 1..3 loop
--- 4016,4021 ----
*************** begin
*** 4035,4041 ****
      end;
    end loop;
  end$$;
- \set VERBOSITY default

  -- Check handling of conflicts between plpgsql vars and table columns.

--- 4034,4039 ----

Re: Inconsistency in plpgsql's error context reports

От
Robert Haas
Дата:
On Mon, Dec 11, 2017 at 3:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Here's a quick hack at that.  I guess the main question that needs to be
> asked is whether we're happy with plpgsql getting so much chattier
> (as per all the regression test changes).
>
> If we're not, the alternative that could be considered is to make SPI
> expose some way to suppress its use of a context callback, and change
> plpgsql to invoke that when dealing with an expression.  That would
> be rather more invasive code-wise, but would largely eliminate the
> behavioral change as seen by users.
>
> Another angle, if we do keep it like this, is that maybe SPI should
> export _SPI_error_callback so that plpgsql can use it directly,
> rather than having a copy that needs to be kept in sync.

I confess to never having really grokked, even in the pre-patch state,
why we sometimes get an "SQL statement" context line and sometimes
not.  However, what strikes me about this is that the SQL statement is
a completely fabricated one that the user never entered.  Consider
this bit from the existing regression test output, unpatched:

create function namedparmcursor_test7() returns void as $$
declare
  c1 cursor (p1 int, p2 int) for
    select * from tenk1 where thousand = p1 and tenthous = p2;
begin
  open c1 (p2 := 77, p1 := 42/0);
end $$ language plpgsql;
select namedparmcursor_test7();
ERROR:  division by zero
CONTEXT:  SQL statement "SELECT 42/0 AS p1, 77 AS p2;"
PL/pgSQL function namedparmcursor_test7() line 6 at OPEN

Quite obviously, nothing like "SELECT 42/0 AS p1, 77 AS p2;" is
present in there anywhere.  When people see an SQL statement in the
context, or at least when I see it, my inclination is to go grep for
where that SQL statement is to be found, and to be unhappy when the
answer is nowhere.  It would really be a lot better if we could say
something like

CONTEXT: SQL expression 42/0

...but I realize that's probably hard to do.  However, the current
situation is that plpgsql.out contains 5 "SQL statement" context
lines, of which only 1 is an SQL statement that actually appears in
the procedure.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Inconsistency in plpgsql's error context reports

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Dec 11, 2017 at 3:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Here's a quick hack at that.  I guess the main question that needs to be
>> asked is whether we're happy with plpgsql getting so much chattier
>> (as per all the regression test changes).

> I confess to never having really grokked, even in the pre-patch state,
> why we sometimes get an "SQL statement" context line and sometimes
> not.  However, what strikes me about this is that the SQL statement is
> a completely fabricated one that the user never entered.

I spent a brief amount of time looking at whether we could hide the
"fabricated statements" more thoroughly, and decided that it would
be a lot more work than seems warranted, and there would likely be
holes in it anyway.  It's not like we don't document this behavior:
the plpgsql introductory material says

    All expressions used in PL/pgSQL statements are processed using
    the server's main SQL executor. For example, when you write a
    PL/pgSQL statement like

        IF expression THEN ...

    PL/pgSQL will evaluate the expression by feeding a query like

        SELECT expression

    to the main SQL engine.

https://www.postgresql.org/docs/devel/static/plpgsql-expressions.html

So basically my conclusion at this point is that I'm not willing
to do any more work on this than the patch I already proposed.
If you want to do something more invasive, have at it.

            regards, tom lane