Обсуждение: Problem with parallel_workers option (Was Re: [PATCH] get rid of StdRdOptions, use individual binary reloptions representation for each relation kind instead)

Поиск
Список
Период
Сортировка
В письме от четверг, 3 января 2019 г. 17:15:08 MSK пользователь Alvaro Herrera
написал:

> I would have liked to get a StaticAssert in the definition, but I don't
> think it's possible.  A standard Assert() should be possible, though.

Asserts are cool thing. I found some unexpected stuff.

parallel_workers option is claimed to be heap-only option.

But in src/backend/optimizer/util/plancat.c in get_relation_info
RelationGetParallelWorkers is being called for both heap and toast tables (and
not only for them).

Because usually there are no reloptions for toast,  it returns default -1
value. If some reloptions were set for toast table, RelationGetParallelWorkers
will return a value from uninitialized memory.

This will happen because StdRdOptions structure is filled with values in
fillRelOptions function. And fillRelOptions first iterates on options list, and
then on elems. So if option is not in "option list" than it's value will not
be set.

And options list comes from parseRelOptions where only options with proper
relation kind is selected from [type]RelOpts[] arrays. So parallel_workers
will not be added to options in the case of toast table because it is claimed
to be applicable only to RELOPT_KIND_HEAP

Thus if toast has some options set, and rd_options in relation is not NULL,
get_relation_info will use value from uninitialized memory as number of
parallel workers.

To reproduce this Assert you can change RelationGetParallelWorkers macros in
src/include/utils/rel.h

#define IsHeapRelation(relation)                        \
    (relation->rd_rel->relkind == RELKIND_RELATION ||   \
     relation->rd_rel->relkind == RELKIND_MATVIEW  )

#define RelationGetParallelWorkers(relation, defaultpw)                 \
    (AssertMacro(IsHeapRelation(relation)),                             \
     ((relation)->rd_options ?                                          \
      ((StdRdOptions *) (relation)->rd_options)->parallel_workers :   \
            (defaultpw)))

and see how it asserts. It will happen just on the db_initiaisation phase of
make check.

If you add relation->rd_rel->relkind == RELKIND_TOASTEDVALUE to the Assertion,
it will Assert in cases when get_relation_info is called for partitioned
table. This case is not a problem for now, because partitioned table has no
options and you will always have NULL in rd_options and get default value. But
it will become a problem when somebody adds some options, especially it would
be a problem if this options do not use StdRdOptions structure.
Also it is called for foreign tables and sequences.

So my suggestion of a hotfix is to replcace
rel->rel_parallel_workers = RelationGetParallelWorkers(relation,-1);
from get_relation_info with the following code

    switch (relation->rd_rel->relkind)
    {
        case RELKIND_RELATION:
        case RELKIND_MATVIEW:
            rel->rel_parallel_workers =
                                    RelationGetParallelWorkers(relation,-1);
            break;
        case RELKIND_TOASTVALUE:
        case RELKIND_PARTITIONED_TABLE:
        case RELKIND_SEQUENCE:
        case REPLICA_IDENTITY_FULL: /* Foreign table */
            rel->rel_parallel_workers = -1;
            break;
        default:
            /* Other relkinds are not supported */
            Assert(false);
    }

But I am not familiar with get_relation_info and parallel_workers  specific. So
I suspect real fix may be quite different.

Also I would suggest to fix it in all supported stable branches that have
parallel_workers option, because this bug may give something unexpected when
some toast options are set.



On 2019-Jan-07, Nikolay Shaplov wrote:

> Asserts are cool thing. I found some unexpected stuff. 
> 
> parallel_workers option is claimed to be heap-only option.
> 
> But in src/backend/optimizer/util/plancat.c in get_relation_info 
> RelationGetParallelWorkers is being called for both heap and toast tables (and 
> not only for them).

Ugh.

I wonder if it makes sense for a toast table to have parallel_workers.
I suppose it's not useful, since a toast table is not supposed to be
scanned in bulk, only accessed through the tuptoaster interface.  But on
the other hand, you *can* do "select * from pg_toast_NNN", and it almost
all respects a toast table is just like a regular heap table.

> Because usually there are no reloptions for toast,  it returns default -1 
> value. If some reloptions were set for toast table, RelationGetParallelWorkers 
> will return a value from uninitialized memory.

Well, if it returns a negative number or zero, the rest of the server
should behave identically to it returning the -1 that was intended.  And
if it returns a positive number, the worst that will happen is that a
Path structure somewhere will have a positive number of workers, but
since queries on toast tables are not planned in the regular way, most
likely those Paths will never exist anyway.

So while I agree that this is a bug, it seems pretty benign.

Unless I overlook something.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


В письме от понедельник, 7 января 2019 г. 13:56:48 MSK пользователь Alvaro
Herrera написал:

> > Asserts are cool thing. I found some unexpected stuff.
> >
> > parallel_workers option is claimed to be heap-only option.
> >
> > But in src/backend/optimizer/util/plancat.c in get_relation_info
> > RelationGetParallelWorkers is being called for both heap and toast tables
> > (and not only for them).
>
> Ugh.
>
> I wonder if it makes sense for a toast table to have parallel_workers.
> I suppose it's not useful, since a toast table is not supposed to be
> scanned in bulk, only accessed through the tuptoaster interface.  But on
> the other hand, you *can* do "select * from pg_toast_NNN", and it almost
> all respects a toast table is just like a regular heap table.

If parallel_workers is not intended to be used anywhere except heap and
matview, then may be better to make fix more relaxed. Like

if  (relation->rd_rel->relkind == RELKIND_RELATION ||
     relation->rd_rel->relkind == RELKIND_MATVIEW  )
 rel->rel_parallel_workers =
                                    RelationGetParallelWorkers(relation,-1);
else
 rel->rel_parallel_workers = -1;

> > Because usually there are no reloptions for toast,  it returns default -1
> > value. If some reloptions were set for toast table,
> > RelationGetParallelWorkers will return a value from uninitialized memory.
>
> Well, if it returns a negative number or zero, the rest of the server
> should behave identically to it returning the -1 that was intended.  And
> if it returns a positive number, the worst that will happen is that a
> Path structure somewhere will have a positive number of workers, but
> since queries on toast tables are not planned in the regular way, most
> likely those Paths will never exist anyway.
>
> So while I agree that this is a bug, it seems pretty benign.
It is mild until somebody introduce PartitionedlRelOptions. Then
PartitionedlRelOptions * will be converted to StdRdOptions* and we will get
segmentation fault...

So may be there is no need for back fix, but better to fix it now :-)
May be with the patch for StdRdOptions removal.