Thread: select_parallel test fails with nonstandard block size

select_parallel test fails with nonstandard block size

From
Peter Eisentraut
Date:
When building with --with-blocksize=16, the select_parallel test fails
with this difference:

 explain (costs off)
        select  sum(parallel_restricted(unique1)) from tenk1
        group by(parallel_restricted(unique1));
-                     QUERY PLAN
-----------------------------------------------------
+                QUERY PLAN
+-------------------------------------------
  HashAggregate
    Group Key: parallel_restricted(unique1)
-   ->  Index Only Scan using tenk1_unique1 on tenk1
-(3 rows)
+   ->  Gather
+         Workers Planned: 4
+         ->  Parallel Seq Scan on tenk1
+(5 rows)

 set force_parallel_mode=1;
 explain (costs off)

We know that different block sizes cause some test failures, mainly
because of row ordering differences.  But this looked a bit different.

The size of the tenk1 table is very similar under either block size:

16k: tenk1 = 2883584
8k:  tenk1 = 2932736

Is there an explanation for this difference, or is there something wrong
in the cost estimation somewhere?
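
To make the comparison concrete, the reported sizes can be converted to block counts. This is a minimal sketch, assuming the figures above are relation sizes in bytes and that --with-blocksize=16 means 16 kB blocks; the function name is illustrative only.

```python
# Convert the reported tenk1 sizes (assumed to be bytes) into block counts.

def block_count(size_bytes: int, block_size_bytes: int) -> int:
    """Return the number of whole blocks in a relation of the given size."""
    blocks, remainder = divmod(size_bytes, block_size_bytes)
    if remainder:
        raise ValueError("relation size is not a whole number of blocks")
    return blocks

blocks_8k = block_count(2932736, 8 * 1024)
blocks_16k = block_count(2883584, 16 * 1024)
print("8k blocks:", blocks_8k)
print("16k blocks:", blocks_16k)
```

The byte sizes are nearly equal, but the table occupies roughly half as many blocks at 16 kB (176 versus 358).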

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: select_parallel test fails with nonstandard block size

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> When building with --with-blocksize=16, the select_parallel test fails
> with this difference:

>  explain (costs off)
>         select  sum(parallel_restricted(unique1)) from tenk1
>         group by(parallel_restricted(unique1));
> -                     QUERY PLAN
> -----------------------------------------------------
> +                QUERY PLAN
> +-------------------------------------------
>   HashAggregate
>     Group Key: parallel_restricted(unique1)
> -   ->  Index Only Scan using tenk1_unique1 on tenk1
> -(3 rows)
> +   ->  Gather
> +         Workers Planned: 4
> +         ->  Parallel Seq Scan on tenk1
> +(5 rows)

>  set force_parallel_mode=1;
>  explain (costs off)

> We know that different block sizes cause some test failures, mainly
> because of row ordering differences.  But this looked a bit different.

I suspect what is happening is that min_parallel_relation_size is
being interpreted differently (because the default is set at 1024
blocks, regardless of what BLCKSZ is) and that's affecting the
cost estimate for the parallel seqscan.  The direction of change
seems a bit surprising though; if the table is now half as big
blocks-wise, how did that make the parallel scan look cheaper?
Please step through create_plain_partial_paths and see what
is being done differently.

Possibly we ought to change things so that the default value of
min_parallel_relation_size is a fixed number of bytes rather
than a fixed number of blocks.  Not sure though.
        regards, tom lane



Re: select_parallel test fails with nonstandard block size

From
Robert Haas
Date:
On Thu, Sep 15, 2016 at 9:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Possibly we ought to change things so that the default value of
> min_parallel_relation_size is a fixed number of bytes rather
> than a fixed number of blocks.  Not sure though.

The reason why this was originally reckoned in blocks is because the
data is divided between the workers on the basis of a block number.
In the degenerate case where blocks < workers, the extra workers will
get no blocks at all, and thus no rows at all.  It seemed best to
insist that the relation had a reasonable number of blocks so that we
could hope for a reasonably even distribution of work among a pool of
workers.  I'm not altogether sure that's the right way of thinking
about this problem but I'm not sure it's wrong, either; anyway, it's
as far as my thought process had progressed at the time I wrote the
code.
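
The degenerate case can be illustrated with a toy model. The sketch below assumes workers take turns claiming one block at a time, which is a simplification for illustration and not a description of PostgreSQL's actual scheduling; `assign_blocks` is a hypothetical helper.

```python
from collections import Counter

def assign_blocks(num_blocks: int, num_workers: int) -> list[int]:
    """Return how many blocks each worker scans under round-robin claiming."""
    # Block b goes to worker (b mod num_workers); Counter yields 0 for
    # workers that never receive a block.
    counts = Counter(block % num_workers for block in range(num_blocks))
    return [counts[worker] for worker in range(num_workers)]

few = assign_blocks(num_blocks=3, num_workers=4)
many = assign_blocks(num_blocks=358, num_workers=4)
print("3 blocks, 4 workers:", few)
print("358 blocks, 4 workers:", many)
```

With 3 blocks and 4 workers, one worker gets nothing; with a few hundred blocks the split is close to even, which is the property the block-based threshold was meant to protect.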

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: select_parallel test fails with nonstandard block size

From
Alvaro Herrera
Date:
Robert Haas wrote:
> On Thu, Sep 15, 2016 at 9:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Possibly we ought to change things so that the default value of
> > min_parallel_relation_size is a fixed number of bytes rather
> > than a fixed number of blocks.  Not sure though.
> 
> The reason why this was originally reckoned in blocks is because the
> data is divided between the workers on the basis of a block number.

Maybe the solution is to fill the table to a given number of blocks
rather than a number of rows.
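
As a rough sense of scale for that idea, here is a minimal sketch. It assumes tenk1 holds 10000 rows, that it occupies 358 blocks at 8K and 176 blocks at 16K (the byte sizes reported at the top of the thread divided by the block size), and that row density stays constant as the table grows. `rows_for_blocks` is a hypothetical helper, not something the test suite provides.

```python
import math

def rows_for_blocks(target_blocks: int, known_rows: int, known_blocks: int) -> int:
    """Estimate the rows needed to fill target_blocks at the observed density."""
    return math.ceil(target_blocks * known_rows / known_blocks)

# Rows a 16K build would need to reach the 358 blocks seen at 8K.
rows_needed_16k = rows_for_blocks(target_blocks=358, known_rows=10000, known_blocks=176)
print("rows needed at 16K:", rows_needed_16k)
```

Under these assumptions a 16K build would need roughly twice as many rows (about 20341) to present the planner with the same block count.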

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: select_parallel test fails with nonstandard block size

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Sep 15, 2016 at 9:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Possibly we ought to change things so that the default value of
>> min_parallel_relation_size is a fixed number of bytes rather
>> than a fixed number of blocks.  Not sure though.

> The reason why this was originally reckoned in blocks is because the
> data is divided between the workers on the basis of a block number.
> In the degenerate case where blocks < workers, the extra workers will
> get no blocks at all, and thus no rows at all.

Well, sure, but at any reasonable value of min_parallel_relation_size
that won't be a factor.  The question here is whether we want the default
value to be platform-independent.  I notice that both config.sgml and
postgresql.conf.sample claim that the default value is 8MB, which this
discussion reveals to be a lie.  If you want to keep the default expressed
as "1024" and not "(8 * 1024 * 1024) / BLCKSZ", we need to change the
documentation.
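
The discrepancy is simple arithmetic. A minimal sketch, assuming BLCKSZ is given in bytes; the function names and the list of block sizes are illustrative only.

```python
# A default of 1024 blocks means a different byte threshold per block size,
# whereas "(8 * 1024 * 1024) / BLCKSZ" keeps the threshold at 8MB everywhere.

FIXED_BLOCKS = 1024
FIXED_BYTES = 8 * 1024 * 1024

def threshold_bytes(blcksz: int) -> int:
    """Byte threshold when the default is a fixed number of blocks."""
    return FIXED_BLOCKS * blcksz

def threshold_blocks(blcksz: int) -> int:
    """Block threshold when the default is a fixed number of bytes."""
    return FIXED_BYTES // blcksz

for blcksz in (4096, 8192, 16384, 32768):
    megabytes = threshold_bytes(blcksz) // (1024 * 1024)
    print(f"BLCKSZ={blcksz}: 1024 blocks = {megabytes}MB, "
          f"8MB = {threshold_blocks(blcksz)} blocks")
```

Only at the standard 8K block size does "1024 blocks" coincide with the documented 8MB; at 16K the same setting means 16MB.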
        regards, tom lane



Re: select_parallel test fails with nonstandard block size

From
Robert Haas
Date:
On Thu, Sep 15, 2016 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Sep 15, 2016 at 9:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Possibly we ought to change things so that the default value of
>>> min_parallel_relation_size is a fixed number of bytes rather
>>> than a fixed number of blocks.  Not sure though.
>
>> The reason why this was originally reckoned in blocks is because the
>> data is divided between the workers on the basis of a block number.
>> In the degenerate case where blocks < workers, the extra workers will
>> get no blocks at all, and thus no rows at all.
>
> Well, sure, but at any reasonable value of min_parallel_relation_size
> that won't be a factor.  The question here is whether we want the default
> value to be platform-independent.  I notice that both config.sgml and
> postgresql.conf.sample claim that the default value is 8MB, which this
> discussion reveals to be a lie.  If you want to keep the default expressed
> as "1024" and not "(8 * 1024 * 1024) / BLCKSZ", we need to change the
> documentation.

I don't particularly care about that.  Changing it to 8MB always would
be fine with me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: select_parallel test fails with nonstandard block size

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Sep 15, 2016 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Well, sure, but at any reasonable value of min_parallel_relation_size
>> that won't be a factor.  The question here is whether we want the default
>> value to be platform-independent.  I notice that both config.sgml and
>> postgresql.conf.sample claim that the default value is 8MB, which this
>> discussion reveals to be a lie.  If you want to keep the default expressed
>> as "1024" and not "(8 * 1024 * 1024) / BLCKSZ", we need to change the
>> documentation.

> I don't particularly care about that.  Changing it to 8MB always would
> be fine with me.

OK, I'll take care of it (since I now realize that the inconsistency
is my own fault --- I committed that GUC not you).  It's unclear what
this will do for Peter's complaint though.
        regards, tom lane



Re: select_parallel test fails with nonstandard block size

From
Tom Lane
Date:
I wrote:
> OK, I'll take care of it (since I now realize that the inconsistency
> is my own fault --- I committed that GUC not you).  It's unclear what
> this will do for Peter's complaint though.

On closer inspection, the answer is "nothing", because the select_parallel
test overrides the default value of min_parallel_relation_size anyway.
(Without that, I don't think tenk1 is large enough to trigger
consideration of parallel scan at all.)
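
A rough sketch of why the override matters. It assumes a relation is considered for a parallel scan only when its page count reaches the threshold, and that tenk1 occupies 2932736 / 8192 pages at 8K (from the size reported at the top of the thread). The override value of 0 is an assumption for illustration; the thread only says the test overrides the default.

```python
def considers_parallel_scan(rel_pages: int, min_parallel_relation_size: int) -> bool:
    """Simplified size check: is the relation big enough for a parallel scan?"""
    return rel_pages >= min_parallel_relation_size

TENK1_PAGES_8K = 2932736 // 8192  # pages at the standard block size

with_default = considers_parallel_scan(TENK1_PAGES_8K, 1024)
with_override = considers_parallel_scan(TENK1_PAGES_8K, 0)  # assumed override value
print("default threshold:", with_default)
print("overridden threshold:", with_override)
```

Under the default of 1024 blocks the table is too small to qualify, so changing the default's units cannot affect a test that lowers the threshold itself.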

I find that at BLCKSZ 8K, the planner thinks the best plan is

 HashAggregate  (cost=5320.28..7920.28 rows=10000 width=12)
   Group Key: parallel_restricted(unique1)
   ->  Index Only Scan using tenk1_unique1 on tenk1  (cost=0.29..2770.28 rows=10000 width=8)

which is what the regression test script expects.  Forcing the parallel
plan to be chosen, we get this using the cost parameters set up by
select_parallel:

 HashAggregate  (cost=5433.00..8033.00 rows=10000 width=12)
   Group Key: parallel_restricted(unique1)
   ->  Gather  (cost=0.00..2883.00 rows=10000 width=8)
         Workers Planned: 4
         ->  Parallel Seq Scan on tenk1  (cost=0.00..383.00 rows=2500 width=4)

However, at BLCKSZ 16K, we get these numbers instead:

 HashAggregate  (cost=5264.28..7864.28 rows=10000 width=12)
   Group Key: parallel_restricted(unique1)
   ->  Index Only Scan using tenk1_unique1 on tenk1  (cost=0.29..2714.28 rows=10000 width=8)

 HashAggregate  (cost=5251.00..7851.00 rows=10000 width=12)
   Group Key: parallel_restricted(unique1)
   ->  Gather  (cost=0.00..2701.00 rows=10000 width=8)
         Workers Planned: 4
         ->  Parallel Seq Scan on tenk1  (cost=0.00..201.00 rows=2500 width=4)

so the planner goes for the second one.
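
The choice comes down to comparing the top-level total costs quoted above. A minimal sketch of that comparison; the dictionary layout and the `cheapest` helper are illustrative, and only the four cost figures come from the plans in this message.

```python
# Total cost of the HashAggregate node for each candidate plan.
plans = {
    "8K": {"index_only_scan": 7920.28, "parallel_seq_scan": 8033.00},
    "16K": {"index_only_scan": 7864.28, "parallel_seq_scan": 7851.00},
}

def cheapest(costs: dict[str, float]) -> str:
    """Return the name of the plan with the lowest total cost."""
    return min(costs, key=costs.get)

winners = {blcksz: cheapest(costs) for blcksz, costs in plans.items()}
for blcksz, winner in winners.items():
    print(blcksz, "->", winner)
```

At 8K the index-only plan wins by about 113 cost units; at 16K the parallel plan wins by about 13, so the flip rests on a fairly narrow margin.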

I don't think there's anything particularly broken here.  The seqscan
cost estimate is largely dependent on the number of blocks, and there's
half as many blocks at 16K.  The indexscan estimate is also reduced,
but not as much, so it stops looking like the cheaper alternative.
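
The quoted Parallel Seq Scan totals can be reproduced with simple arithmetic. This sketch assumes the stock seq_page_cost of 1.0 and cpu_tuple_cost of 0.01, that the per-tuple CPU cost is divided evenly across the 4 planned workers, and that tenk1 occupies 358 blocks at 8K and 176 at 16K (the reported byte sizes divided by the block size). None of these assumptions is stated in the thread; they are chosen because they match the numbers above.

```python
SEQ_PAGE_COST = 1.0    # assumed default
CPU_TUPLE_COST = 0.01  # assumed default

def parallel_seqscan_cost(pages: int, tuples: int, workers: int) -> float:
    """Simplified model: full disk cost plus CPU cost shared among workers."""
    disk_cost = SEQ_PAGE_COST * pages
    cpu_cost = CPU_TUPLE_COST * tuples / workers
    return disk_cost + cpu_cost

cost_8k = parallel_seqscan_cost(pages=358, tuples=10000, workers=4)
cost_16k = parallel_seqscan_cost(pages=176, tuples=10000, workers=4)
print(f"8K:  {cost_8k:.2f}")
print(f"16K: {cost_16k:.2f}")
```

In this model the disk term drops from 358 to 176 while the CPU term stays at 25, giving the 383.00 and 201.00 shown in the plans. The index scan estimate, by comparison, only moves from 2770.28 to 2714.28.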

We could maybe twiddle the cost parameters select_parallel uses so that
the same plan is chosen at both block sizes, but it seems like it would
be very fragile, and I'm not sure there's much point.
        regards, tom lane