Обсуждение: Executor's internal ParamExecData Params versus EvalPlanQual

Поиск
Список
Период
Сортировка

Executor's internal ParamExecData Params versus EvalPlanQual

От
Tom Lane
Дата:
I looked into the misbehavior reported in bug #14328,
https://www.postgresql.org/message-id/20160919160136.1348.55251@wrigleys.postgresql.org

What is happening is that during the EvalPlanQual recheck to see whether
the updated row still satisfies the SELECT's quals, we run ExecInitCteScan
and then CteScanNext in the EPQ-created copy of the plan tree.  This
should result in a fresh scan of the underlying CTE ... but it does not,
because ExecInitCteScan sees the communication value
estate->es_param_exec_vals[node->cteParam] already set and decides it is
not a leader node, but rather a follower of the outer-query CTE Scan node
that it's a doppelganger of.  That one's already at EOF so the
node->leader->eof_cte test fails in CteScanNext and it returns nothing.

The reason this happens is that EvalPlanQualBegin copies down all the
es_param_exec_vals values from the outer query into the EPQ execution
state.  That's OK for most uses of es_param_exec_vals values, but not
at all for cteParam Params, which are used as internal communication
variables within a plan tree.

I believe that recursive unions are probably broken in EPQ rechecks in the
same way, since they use a Param for similar intra-plan communication
purposes, but haven't bothered to devise a test case to prove it.

I think the most robust way to fix this is to explicitly mark
es_param_exec_vals entries as to whether they should be copied down to
the EPQ execution state.  There is room in struct ParamExecData to fit
a uint8 or uint16 field into existing pad space, so we could add a flags
field and then use one bit of it for this purpose without causing any
ABI breaks in the back branches.

A simpler fix would be to decide that we never need to copy any
es_param_exec_vals entries into the EPQ execution state.  I think that
that would work, but it would provoke extra evaluations of InitPlan
subplans, and I'm hesitant to make such a significant behavioral change
without a lot more analysis.

Comments?
        regards, tom lane



Re: Executor's internal ParamExecData Params versus EvalPlanQual

От
Tom Lane
Дата:
I wrote:
> The reason this happens is that EvalPlanQualBegin copies down all the
> es_param_exec_vals values from the outer query into the EPQ execution
> state.  That's OK for most uses of es_param_exec_vals values, but not
> at all for cteParam Params, which are used as internal communication
> variables within a plan tree.

After further study and caffeine-consumption, I concluded that that's
wrong.  There's nothing wrong with reusing the result of a CTE that has
already been scanned by the original query invocation (its output can't
be affected by the new candidate-for-update row).  The problem is
that ExecInitCteScan is assuming that when it initializes as a follower,
the tuplestore read pointer it's given by tuplestore_alloc_read_pointer
will be pointing to the start of the tuplestore.  That's wrong; the new
read pointer is defined as being a clone of read pointer 0, which is
already at EOF in this scenario.  So a simple and localized fix is to
forcibly rewind the new read pointer, as in the attached patch.  (This
should have negligible cost as long as the tuplestore hasn't yet spilled
to disk.)

I also considered redefining tuplestore_alloc_read_pointer as creating
a read pointer that points to start-of-tuplestore.  The other existing
callers wouldn't care, because they are working with a just-created,
known-empty tuplestore.  But there is a risk of breaking third-party
code, so this didn't seem like a back-patchable solution.  Also, I think
the reason why tuplestore_alloc_read_pointer was given this behavior is
so that it could succeed even if the tuplestore has already been moved
forward and perhaps had old data truncated off it.  With the other
behavior, it would have to have the same failure cases as
tuplestore_rescan.

BTW, I no longer think the recursive-union case is broken; rather, failure
to copy its communication Param would break it, in scenarios where a
WorkTableScan is underneath a SELECT FOR UPDATE that's underneath the
RecursiveUnion.  So that's another reason not to mess with the Param
propagation logic.

            regards, tom lane

diff --git a/src/backend/executor/nodeCtescan.c b/src/backend/executor/nodeCtescan.c
index 3c2f684..162650a 100644
*** a/src/backend/executor/nodeCtescan.c
--- b/src/backend/executor/nodeCtescan.c
*************** ExecInitCteScan(CteScan *node, EState *e
*** 224,232 ****
--- 224,236 ----
      {
          /* Not the leader */
          Assert(IsA(scanstate->leader, CteScanState));
+         /* Create my own read pointer, and ensure it is at start */
          scanstate->readptr =
              tuplestore_alloc_read_pointer(scanstate->leader->cte_table,
                                            scanstate->eflags);
+         tuplestore_select_read_pointer(scanstate->leader->cte_table,
+                                        scanstate->readptr);
+         tuplestore_rescan(scanstate->leader->cte_table);
      }

      /*
diff --git a/src/test/isolation/expected/eval-plan-qual.out b/src/test/isolation/expected/eval-plan-qual.out
index 5898d94..10c784a 100644
*** a/src/test/isolation/expected/eval-plan-qual.out
--- b/src/test/isolation/expected/eval-plan-qual.out
*************** ta_id          ta_value       tb_row
*** 163,165 ****
--- 163,186 ----

  1              newTableAValue (1,tableBValue)
  step c2: COMMIT;
+
+ starting permutation: wrtwcte readwcte c1 c2
+ step wrtwcte: UPDATE table_a SET value = 'tableAValue2' WHERE id = 1;
+ step readwcte:
+     WITH
+         cte1 AS (
+           SELECT id FROM table_b WHERE value = 'tableBValue'
+         ),
+         cte2 AS (
+           SELECT * FROM table_a
+           WHERE id = (SELECT id FROM cte1)
+           FOR UPDATE
+         )
+     SELECT * FROM cte2;
+  <waiting ...>
+ step c1: COMMIT;
+ step c2: COMMIT;
+ step readwcte: <... completed>
+ id             value
+
+ 1              tableAValue2
diff --git a/src/test/isolation/specs/eval-plan-qual.spec b/src/test/isolation/specs/eval-plan-qual.spec
index de481a3..7ff6f6b 100644
*** a/src/test/isolation/specs/eval-plan-qual.spec
--- b/src/test/isolation/specs/eval-plan-qual.spec
*************** step "readforss"    {
*** 103,113 ****
--- 103,129 ----
      FROM table_a ta
      WHERE ta.id = 1 FOR UPDATE OF ta;
  }
+ step "wrtwcte"    { UPDATE table_a SET value = 'tableAValue2' WHERE id = 1; }
  step "c2"    { COMMIT; }

  session "s3"
  setup        { BEGIN ISOLATION LEVEL READ COMMITTED; }
  step "read"    { SELECT * FROM accounts ORDER BY accountid; }
+
+ # this test exercises EvalPlanQual with a CTE, cf bug #14328
+ step "readwcte"    {
+     WITH
+         cte1 AS (
+           SELECT id FROM table_b WHERE value = 'tableBValue'
+         ),
+         cte2 AS (
+           SELECT * FROM table_a
+           WHERE id = (SELECT id FROM cte1)
+           FOR UPDATE
+         )
+     SELECT * FROM cte2;
+ }
+
  teardown    { COMMIT; }

  permutation "wx1" "wx2" "c1" "c2" "read"
*************** permutation "writep2" "returningp1" "c1"
*** 118,120 ****
--- 134,137 ----
  permutation "wx2" "partiallock" "c2" "c1" "read"
  permutation "wx2" "lockwithvalues" "c2" "c1" "read"
  permutation "updateforss" "readforss" "c1" "c2"
+ permutation "wrtwcte" "readwcte" "c1" "c2"