Re: Declarative partitioning - another take

From: Amit Langote
Subject: Re: Declarative partitioning - another take
Date:
Msg-id: 78ae9888-7372-245d-cf6b-f655e6fe6c12@lab.ntt.co.jp
In reply to: Re: Declarative partitioning - another take  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Declarative partitioning - another take  (Robert Haas <robertmhaas@gmail.com>)
Re: Declarative partitioning - another take  (Erik Rijkers <er@xs4all.nl>)
List: pgsql-hackers
On 2016/12/07 13:38, Robert Haas wrote:
> On Wed, Nov 30, 2016 at 10:56 AM, Amit Langote <amitlangote09@gmail.com> wrote:
>> The latest patch I posted earlier today has this implementation.
>
> I decided to try out these patches today with #define
> CLOBBER_CACHE_ALWAYS 1 in pg_config_manual.h, which found a couple of
> problems:
>
> 1. RelationClearRelation() wasn't preserving the rd_partkey, even
> though there's plenty of code that relies on it not changing while we
> hold a lock on the relation - in particular, transformPartitionBound.

Oh, I thought an AccessExclusiveLock on the relation would prevent having
to worry about that, but I guess I was wrong.  Evidently, holding a lock on
a table does not prevent RelationClearRelation() from resetting the table's
relcache entry.
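For the archives, here is a sketch of the kind of preservation the rebuild
path of RelationClearRelation() needs, assuming it follows the SWAPFIELD
convention relcache.c already uses to carry other fields over a rebuild;
the keep_partkey flag and rd_partkeycxt names are my assumptions:

    /* When rebuilding a relcache entry in place, keep the old
     * partition key (and the memory context it lives in) so that
     * pointers obtained under an existing lock remain valid. */
    if (keep_partkey)
    {
        SWAPFIELD(PartitionKey, rd_partkey);
        SWAPFIELD(MemoryContext, rd_partkeycxt);
    }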

> 2. partition_bounds_equal() was using the comparator and collation for
> partitioning column 0 to compare the datums for all partitioning
> columns.  It's amazing this passed the regression tests.

Oops, it seems that the regression tests that exercise the above code used
only range partition keys whose columns are all of the same type, for
example: create table test (a int, b int) partition by range (a, (a+b));
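To spell out the fix: the comparison must use the support function and
collation of the column actually being compared.  A sketch of the inner
comparison, with i looping over bounds and j over key columns (loop
variable names assumed):

    /* Compare datum j of bound i using the comparator and collation
     * of partition column j, not those of column 0. */
    cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[j],
                                             key->partcollation[j],
                                             b1->datums[i][j],
                                             b2->datums[i][j]));
    if (cmpval != 0)
        return false;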

> The attached incremental patch fixes those things and some cosmetic
> issues I found along the way.

Thanks for fixing these.  The attached patches incorporate those changes.

> 3. RelationGetPartitionDispatchInfo() is badly broken:
>
> 1010         pd[i] = (PartitionDispatch) palloc(sizeof(PartitionDispatchData));
> 1011         pd[i]->relid = RelationGetRelid(partrel);
> 1012         pd[i]->key = RelationGetPartitionKey(partrel);
> 1013         pd[i]->keystate = NIL;
> 1014         pd[i]->partdesc = partdesc;
> 1015         pd[i]->indexes = (int *) palloc(partdesc->nparts * sizeof(int));
> 1016         heap_close(partrel, NoLock);
> 1017
> 1018         m = 0;
> 1019         for (j = 0; j < partdesc->nparts; j++)
> 1020         {
> 1021             Oid     partrelid = partdesc->oids[j];
>
> This code imagines that pointers it extracted from partrel are certain
> to remain valid after heap_close(partrel, NoLock), perhaps on the
> strength of the fact that we still retain a lock on the relation.  But
> this isn't the case.  As soon as nobody has the relation open, a call
> to RelationClearRelation() will destroy the relcache entry and
> everything to which it points; with CLOBBER_CACHE_ALWAYS, I see a
> failure at line 1021:
>
> #0  RelationGetPartitionDispatchInfo (rel=0x1136dddf8, lockmode=3,
> leaf_part_oids=0x7fff5633b938) at partition.c:1021
> 1021                Oid        partrelid = partdesc->oids[j];
> (gdb) bt 5
> #0  RelationGetPartitionDispatchInfo (rel=0x1136dddf8, lockmode=3,
> leaf_part_oids=0x7fff5633b938) at partition.c:1021
> #1  0x0000000109b8d71f in ExecInitModifyTable (node=0x7fd12984d750,
> estate=0x7fd12b885438, eflags=0) at nodeModifyTable.c:1730
> #2  0x0000000109b5e7ac in ExecInitNode (node=0x7fd12984d750,
> estate=0x7fd12b885438, eflags=0) at execProcnode.c:159
> #3  0x0000000109b58548 in InitPlan (queryDesc=0x7fd12b87b638,
> eflags=0) at execMain.c:961
> #4  0x0000000109b57dcd in standard_ExecutorStart
> (queryDesc=0x7fd12b87b638, eflags=0) at execMain.c:239
> (More stack frames follow...)
> Current language:  auto; currently minimal
> (gdb) p debug_query_string
> $1 = 0x7fd12b84c238 "insert into list_parted values (null, 1);"
> (gdb) p partdesc[0]
> $2 = {
>   nparts = 2139062143,
>   oids = 0x7f7f7f7f7f7f7f7f,
>   boundinfo = 0x7f7f7f7f7f7f7f7f
> }
>
> As you can see, the partdesc is no longer valid here.  I'm not
> immediately sure how to fix this; this isn't a simple thinko.  You
> need to keep the relations open for the whole duration of the query,
> not just long enough to build the dispatch info.  I think you should
> try to revise this so that each relation is opened once and kept open;
> maybe the first loop should be making a pointer-list of Relations
> rather than an int-list of relation OIDs.

Thanks for the explanation, I see the problem.  I changed
PartitionDispatchData such that its first field is now a Relation (instead
of an Oid), which holds the relation descriptor of the corresponding
partitioned table.  Currently, the leaf relations are held open in their
respective ResultRelInfo's (ModifyTableState.mt_partitions) and later
closed in ExecEndModifyTable().  Similarly, each member of the
ModifyTableState.mt_partition_dispatch_info array holds its partitioned
table open via the new field; those relations are closed along with the
leaf partitions, as just mentioned.  The same applies to COPY FROM.
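For clarity, the struct now looks roughly like this (a sketch based on the
fields visible in the hunk you quoted; the actual patch may carry
additional members):

    typedef struct PartitionDispatchData
    {
        Relation        reldesc;    /* was: Oid relid; stays open until
                                     * ExecEndModifyTable() or the end
                                     * of CopyFrom() */
        PartitionKey    key;        /* partition key of the table */
        List           *keystate;   /* execution state for key exprs */
        PartitionDesc   partdesc;   /* descriptor of its partitions */
        int            *indexes;    /* maps each partition to a slot in
                                     * the dispatch or leaf arrays */
    } PartitionDispatchData;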

> And it seems to me (though
> I'm getting a little fuzzy here because it's late) that you need all
> of the partitions open, not just the ones that are subpartitioned,
> because how else are you going to know how to remap the tuple if the
> column order is different?  But I don't see this code doing that,
> which makes me wonder if the partitions are being opened yet again in
> some other location.

Actually, the leaf partitions are opened by the respective callers of
RelationGetPartitionDispatchInfo() and held open in the corresponding
ResultRelInfo's (ModifyTableState.mt_partitions).  As mentioned above,
they are closed by ExecEndModifyTable() or CopyFrom() once the INSERT or
COPY finishes, respectively.
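The cleanup then amounts to something like the following sketch in
ExecEndModifyTable() (with an analogue at the end of CopyFrom()); the
mt_num_* counters here are assumptions on my part:

    /* Close the partitioned tables held open by the dispatch info,
     * then the leaf partitions held in their ResultRelInfos.  NoLock:
     * the locks themselves are held until end of transaction. */
    for (i = 0; i < node->mt_num_dispatch; i++)
        heap_close(node->mt_partition_dispatch_info[i]->reldesc, NoLock);
    for (i = 0; i < node->mt_num_partitions; i++)
        heap_close(node->mt_partitions[i].ri_RelationDesc, NoLock);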

> I recommend that once you fix this, you run 'make check' with #define
> CLOBBER_CACHE_ALWAYS 1 and look for other hazards.  Such mistakes are
> easy to make with this kind of patch.

With the latest version of the patches (attached), I could not see any
failures in a CLOBBER_CACHE_ALWAYS build.
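For anyone wanting to repeat the test, the define goes into
src/include/pg_config_manual.h before building, as in your original report:

    /* Force a cache flush at every plausible opportunity; makes
     * "make check" very slow but shakes out stale-pointer bugs. */
    #define CLOBBER_CACHE_ALWAYS 1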

Thanks,
Amit

