Обсуждение: Hitting CheckRelationLockedByMe() ASSERT with force_generic_plan

Поиск
Список
Период
Сортировка

Hitting CheckRelationLockedByMe() ASSERT with force_generic_plan

От
Rushabh Lathia
Дата:
Hi,

Commit b04aeb0a053e7cf7faad89f7d47844d8ba0dc839 add assertions that we hold
some relevant lock during relation open as opening a relation with no lock
at all is unsafe;

With above commit below test case is failing and hitting the newly added
Assert().

Test case:
===========

CREATE TABLE foo (x int primary key);
INSERT INTO foo VALUES (1), (2), (3), (4), (5);

CREATE OR REPLACE FUNCTION f1(a int) RETURNS int
AS $$
BEGIN
 DELETE FROM foo where x = a;
 return 0;
END;
$$ LANGUAGE plpgsql;

postgres@100858=#set plan_cache_mode = force_generic_plan;
SET
postgres@100858=#select f1(4);
 f1
----
  0
(1 row)

postgres@100858=#select f1(4);
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.


Call stack:
=============

#0  0x00007f230b2d61d7 in raise () from /lib64/libc.so.6
#1  0x00007f230b2d78c8 in abort () from /lib64/libc.so.6
#2  0x0000000000a26ce5 in ExceptionalCondition (
    conditionName=0xabdca8 "!(lockmode != 0 || (Mode == BootstrapProcessing) || CheckRelationLockedByMe(r, 1, 1))", errorType=0xabc529 "FailedAssertion",
    fileName=0xabc668 "heapam.c", lineNumber=1146) at assert.c:54
#3  0x00000000004dacd8 in relation_open (relationId=16387, lockmode=0) at heapam.c:1144
#4  0x00000000004f94bf in index_open (relationId=16387, lockmode=0) at indexam.c:155
#5  0x0000000000701403 in ExecInitIndexScan (node=0x1cf8fd8, estate=0x1cec418, eflags=0) at nodeIndexscan.c:994
#6  0x00000000006dde31 in ExecInitNode (node=0x1cf8fd8, estate=0x1cec418, eflags=0) at execProcnode.c:217
#7  0x000000000070d7b8 in ExecInitModifyTable (node=0x1cf8da8, estate=0x1cec418, eflags=0) at nodeModifyTable.c:2190
#8  0x00000000006ddd39 in ExecInitNode (node=0x1cf8da8, estate=0x1cec418, eflags=0) at execProcnode.c:174
#9  0x00000000006d48a5 in InitPlan (queryDesc=0x1cdc378, eflags=0) at execMain.c:1024
#10 0x00000000006d36e9 in standard_ExecutorStart (queryDesc=0x1cdc378, eflags=0) at execMain.c:265
#11 0x00000000006d3485 in ExecutorStart (queryDesc=0x1cdc378, eflags=0) at execMain.c:147
#12 0x0000000000725880 in _SPI_pquery (queryDesc=0x1cdc378, fire_triggers=true, tcount=0) at spi.c:2469
#13 0x00000000007252d4 in _SPI_execute_plan (plan=0x1cc4748, paramLI=0x1ce23c8, snapshot=0x0, crosscheck_snapshot=0x0, read_only=false, fire_triggers=true,
    tcount=0) at spi.c:2235
#14 0x00000000007222a1 in SPI_execute_plan_with_paramlist (plan=0x1cc4748, params=0x1ce23c8, read_only=false, tcount=0) at spi.c:516
#15 0x00007f23004a5f9a in exec_stmt_execsql (estate=0x7fff0bcaa4d0, stmt=0x1cc78f0) at pl_exec.c:4114

Here DELETE statement pick the IndexScan plan, and later during execution
when it tries to open index relation, ExecInitIndexScan() assumes that index
relation should already hold require a lock.  But in this case, that's not the
case - which got exposed through the newly added ASSERT() for NoLock.

When query get's execute first time build_simple_rel() -> get_relation_info(),
takes a lock on the indexes of a relation (this do happen from the
BuildCachePlan()).  When we execute the same query second time, a plan is caches
and CheckCachePlan() calls the AquireExecutorLocks() take the require locks
needed for execution of the caches plan.  Here it takes a lock on the rtable
list - but doesn't take any lock on the index relation.

I observed that ExecInitModifyTable() to open the index relation if indices on
the result relation. But here also - it doesn't do anything for CMD_DELETE:

        /*
         * If there are indices on the result relation, open them and save
         * descriptors in the result relation info, so that we can add new
         * index entries for the tuples we add/update.  We need not do this
         * for a DELETE, however, since deletion doesn't affect indexes. Also,
         * inside an EvalPlanQual operation, the indexes might be open
         * already, since we share the resultrel state with the original
         * query.
         */
        if (resultRelInfo->ri_RelationDesc->rd_rel->relhasindex &&
            operation != CMD_DELETE &&
            resultRelInfo->ri_IndexRelationDescs == NULL)
            ExecOpenIndices(resultRelInfo,
                            node->onConflictAction != ONCONFLICT_NONE);

Now, to fix this issue either we need to hold proper lock before reaching
to ExecInitIndexScan() or teach ExecInitIndexScan() to take AccessShareLock
on the scan coming from CMD_DELETE. 

Thoughts/Comments?

Thanks,
Rushabh Lathia

Re: Hitting CheckRelationLockedByMe() ASSERT with force_generic_plan

От
David Rowley
Дата:
On Thu, 22 Nov 2018 at 22:33, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> CREATE TABLE foo (x int primary key);
> INSERT INTO foo VALUES (1), (2), (3), (4), (5);
>
> CREATE OR REPLACE FUNCTION f1(a int) RETURNS int
> AS $$
> BEGIN
>  DELETE FROM foo where x = a;
>  return 0;
> END;
> $$ LANGUAGE plpgsql;
>
> postgres@100858=#set plan_cache_mode = force_generic_plan;
> SET
> postgres@100858=#select f1(4);
>  f1
> ----
>   0
> (1 row)
>
> postgres@100858=#select f1(4);
> server closed the connection unexpectedly


> Now, to fix this issue either we need to hold proper lock before reaching
> to ExecInitIndexScan() or teach ExecInitIndexScan() to take AccessShareLock
> on the scan coming from CMD_DELETE.

I'd say the following comment and code in nodeIndexscan.c is outdated:

/*
* Open the index relation.
*
* If the parent table is one of the target relations of the query, then
* InitPlan already opened and write-locked the index, so we can avoid
* taking another lock here.  Otherwise we need a normal reader's lock.
*/
relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
indexstate->iss_RelationDesc = index_open(node->indexid,
  relistarget ? NoLock : AccessShareLock);

Despite what the above comment claims, these indexes have not been
opened in InitPlan since 389af951552ff2. As you mentioned, they're
opened in nodeModifyTable.c for non-delete statements.

I think we either need to update the above code to align it to what's
now in nodeModifyTable.c, or just obtain an AccessShareLock
regardless. I kinda think we should just take the lock regardless as
determining if the relation is a result relation may be much more
expensive than just taking another lower-level lock on the index.

Anyway. I've attached a small patch to update the above fragment.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: Hitting CheckRelationLockedByMe() ASSERT with force_generic_plan

От
Tom Lane
Дата:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On Thu, 22 Nov 2018 at 22:33, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
>> Now, to fix this issue either we need to hold proper lock before reaching
>> to ExecInitIndexScan() or teach ExecInitIndexScan() to take AccessShareLock
>> on the scan coming from CMD_DELETE.

> I'd say the following comment and code in nodeIndexscan.c is outdated:

I complained about this previously in

https://www.postgresql.org/message-id/19465.1541636036@sss.pgh.pa.us

but got nothing but crickets :-(

            regards, tom lane


Re: Hitting CheckRelationLockedByMe() ASSERT with force_generic_plan

От
Rushabh Lathia
Дата:


On Fri, Nov 23, 2018 at 3:33 AM David Rowley <david.rowley@2ndquadrant.com> wrote:
On Thu, 22 Nov 2018 at 22:33, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> CREATE TABLE foo (x int primary key);
> INSERT INTO foo VALUES (1), (2), (3), (4), (5);
>
> CREATE OR REPLACE FUNCTION f1(a int) RETURNS int
> AS $$
> BEGIN
>  DELETE FROM foo where x = a;
>  return 0;
> END;
> $$ LANGUAGE plpgsql;
>
> postgres@100858=#set plan_cache_mode = force_generic_plan;
> SET
> postgres@100858=#select f1(4);
>  f1
> ----
>   0
> (1 row)
>
> postgres@100858=#select f1(4);
> server closed the connection unexpectedly


> Now, to fix this issue either we need to hold proper lock before reaching
> to ExecInitIndexScan() or teach ExecInitIndexScan() to take AccessShareLock
> on the scan coming from CMD_DELETE.

I'd say the following comment and code in nodeIndexscan.c is outdated:

/*
* Open the index relation.
*
* If the parent table is one of the target relations of the query, then
* InitPlan already opened and write-locked the index, so we can avoid
* taking another lock here.  Otherwise we need a normal reader's lock.
*/
relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
indexstate->iss_RelationDesc = index_open(node->indexid,
  relistarget ? NoLock : AccessShareLock);

Despite what the above comment claims, these indexes have not been
opened in InitPlan since 389af951552ff2. As you mentioned, they're
opened in nodeModifyTable.c for non-delete statements.


+1.

I tried the same approach and with further testing I haven't found
any issues.
 
I think we either need to update the above code to align it to what's
now in nodeModifyTable.c, or just obtain an AccessShareLock
regardless. I kinda think we should just take the lock regardless as
determining if the relation is a result relation may be much more
expensive than just taking another lower-level lock on the index.

Anyway. I've attached a small patch to update the above fragment.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Rushabh Lathia

Re: Hitting CheckRelationLockedByMe() ASSERT with force_generic_plan

От
Rushabh Lathia
Дата:
While looking code further around this, I realized that we need
similar kind of fix for bitmap as well as index only scan as well.

Here is the patch, which does similar fix for bitmap and indexonly
scans.

Thanks,


On Fri, Nov 23, 2018 at 6:47 PM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:


On Fri, Nov 23, 2018 at 3:33 AM David Rowley <david.rowley@2ndquadrant.com> wrote:
On Thu, 22 Nov 2018 at 22:33, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> CREATE TABLE foo (x int primary key);
> INSERT INTO foo VALUES (1), (2), (3), (4), (5);
>
> CREATE OR REPLACE FUNCTION f1(a int) RETURNS int
> AS $$
> BEGIN
>  DELETE FROM foo where x = a;
>  return 0;
> END;
> $$ LANGUAGE plpgsql;
>
> postgres@100858=#set plan_cache_mode = force_generic_plan;
> SET
> postgres@100858=#select f1(4);
>  f1
> ----
>   0
> (1 row)
>
> postgres@100858=#select f1(4);
> server closed the connection unexpectedly


> Now, to fix this issue either we need to hold proper lock before reaching
> to ExecInitIndexScan() or teach ExecInitIndexScan() to take AccessShareLock
> on the scan coming from CMD_DELETE.

I'd say the following comment and code in nodeIndexscan.c is outdated:

/*
* Open the index relation.
*
* If the parent table is one of the target relations of the query, then
* InitPlan already opened and write-locked the index, so we can avoid
* taking another lock here.  Otherwise we need a normal reader's lock.
*/
relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
indexstate->iss_RelationDesc = index_open(node->indexid,
  relistarget ? NoLock : AccessShareLock);

Despite what the above comment claims, these indexes have not been
opened in InitPlan since 389af951552ff2. As you mentioned, they're
opened in nodeModifyTable.c for non-delete statements.


+1.

I tried the same approach and with further testing I haven't found
any issues.
 
I think we either need to update the above code to align it to what's
now in nodeModifyTable.c, or just obtain an AccessShareLock
regardless. I kinda think we should just take the lock regardless as
determining if the relation is a result relation may be much more
expensive than just taking another lower-level lock on the index.

Anyway. I've attached a small patch to update the above fragment.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Rushabh Lathia


--
Rushabh Lathia
Вложения

Re: Hitting CheckRelationLockedByMe() ASSERT with force_generic_plan

От
Amit Kapila
Дата:
On Fri, Dec 14, 2018 at 11:32 AM Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:
>
> While looking code further around this, I realized that we need
> similar kind of fix for bitmap as well as index only scan as well.
>

I think if we want to move forward with this patch, we need to first
investigate the deadlock hazard mentioned by Tom in other related
thread [1].   I and others have also responded on that thread, see if
you can respond to it.

[1] - https://www.postgresql.org/message-id/3046.1542990312%40sss.pgh.pa.us

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com