Discussion: Why is Postgres only using 8 cores for partitioned count?


Why is Postgres only using 8 cores for partitioned count?

From: "Seamus Abshere"
Date:
hi,

How can I convince Postgres to use more than 8 cores?

I've got an r6gd.16xlarge with 64 vCPUs and 512 GB RAM running Postgres 13.1 on Ubuntu 20.04.

CREATE TABLE tbl (
  [...]
) PARTITION BY HASH (address_key);

It has 64 partitions - e.g. FOR VALUES WITH (modulus 64, remainder 0) etc.

We're running `SELECT COUNT(*) FROM tbl`.

I've watched top and I never see more than 8 cores going 100%.

Here is my (admittedly ridiculous) postgresql.conf:

checkpoint_completion_target = 0.9
data_directory='/tank/postgresql/13/main'
default_statistics_target = 100
effective_cache_size = 381696MB
effective_io_concurrency = 200
enable_partition_pruning=on
enable_partitionwise_aggregate=on
enable_partitionwise_join=on
listen_addresses='*'
maintenance_work_mem = 2GB
max_connections = 200
max_parallel_maintenance_workers = 4
max_parallel_workers = 512
max_parallel_workers_per_gather = 512
max_wal_size = 4GB
max_worker_processes = 512
min_wal_size = 1GB
random_page_cost = 1.1
shared_buffers = 127232MB
shared_preload_libraries = 'cstore_fdw'
synchronous_commit=off
wal_buffers = 16MB
work_mem = 1628560kB
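
One sanity check worth doing (a hypothetical psql session, not part of the original report) is confirming the parallel-query settings actually took effect in the running server:

```sql
-- Show the live values of the parallel-query settings
SELECT name, setting
FROM pg_settings
WHERE name IN ('max_worker_processes',
               'max_parallel_workers',
               'max_parallel_workers_per_gather');
```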

Best,
Seamus


--
Seamus Abshere, SCEA
https://faraday.ai
https://github.com/seamusabshere
https://linkedin.com/in/seamusabshere



Re: Why is Postgres only using 8 cores for partitioned count? [Parallel Append]

From: "Seamus Abshere"
Date:
hi,

I've traced this back to the formula for Parallel Append workers - log2(partitions).

The comment from Robert says: (src/backend/optimizer/path/allpaths.c)

        /*
         * If the use of parallel append is permitted, always request at least
         * log2(# of children) workers. 

In my case, every partition takes 1 second to scan, I have 64 cores, I have 64 partitions, and the wall time is 8 seconds with 8 workers.

I assume that if it planned significantly more workers (16? 32? even 64?), it would get significantly faster (even accounting for transaction cost). So why doesn't it ask for more? Note that I've set max_parallel_workers=512, etc. (postgresql.conf in my first message).

Here are full plans https://gist.github.com/seamusabshere/1c1d869ee24145689f45e4124d8730d3
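
The observed numbers are consistent with that formula: allpaths.c takes the larger of the best child path's worker count and fls(number of children), capped by max_parallel_workers_per_gather, and fls(64) = floor(log2(64)) + 1 = 7 workers, which plus the participating leader is the 8 busy cores. A sketch of the arithmetic (hypothetical session, values taken from this thread):

```sql
-- fls(n) is the position of the highest set bit, i.e. floor(log2(n)) + 1.
-- With 64 partitions: fls(64) = 7 planned workers, plus the leader = 8 backends.
SELECT floor(log(2, 64))::int + 1     AS planned_workers,
       floor(log(2, 64))::int + 1 + 1 AS busy_processes;
```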

Best,
Seamus

On Fri, Feb 12, 2021, at 4:50 PM, Seamus Abshere wrote:
> [...]



700% faster (was: Re: Why is Postgres only using 8 cores for partitioned count? [Parallel append])

From: "Seamus Abshere"
Date:
hi,

Apologies for the self-reply, but I made it 700% faster by compiling from source and forcing parallel_workers from 7 to 96.

If this weren't a partitioned table, I could just do `alter table test_3pd_cstore_partitioned set (parallel_workers = 96)` - but that isn't currently supported.

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index cd3fdd259c..0057a69d4e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1544,6 +1544,9 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
                                max_parallel_workers_per_gather);
         Assert(parallel_workers > 0);
 
+        /* force a crazy parallelism */
+        parallel_workers = 96;
+
         appendpath = create_append_path(root, rel, pa_nonpartial_subpaths,
                                         pa_partial_subpaths,
                                         NIL, NULL, parallel_workers, true,

BEFORE:

postgres=# explain analyze select count(*) from test_3pd_cstore_partitioned where age > 50;

 Finalize Aggregate  (cost=558860.96..558860.97 rows=1 width=8) (actual time=8528.509..8538.627 rows=1 loops=1)
   ->  Gather  (cost=56918.65..558860.94 rows=7 width=8) (actual time=1863.462..8538.603 rows=64 loops=1)
         Workers Planned: 7
         Workers Launched: 7
         ->  Parallel Append  (cost=55918.65..557860.24 rows=1 width=8) (actual time=1877.875..8417.486 rows=8 loops=8)
[...]
 Execution Time: 8565.734 ms

AFTER:

postgres=# explain analyze select count(*) from test_3pd_cstore_partitioned where age > 50;
 Finalize Aggregate  (cost=57030.20..57030.21 rows=1 width=8) (actual time=1125.828..1129.958 rows=1 loops=1)
   ->  Gather  (cost=56611.92..57029.96 rows=96 width=8) (actual time=994.708..1129.940 rows=64 loops=1)
         Workers Planned: 96
         Workers Launched: 96
         ->  Parallel Append  (cost=55611.92..56020.36 rows=1 width=8) (actual time=656.027..656.051 rows=1 loops=97)
[...]
 Execution Time: 1133.810 ms

Should I try to submit a patch that adds support for "alter table test_3pd_cstore_partitioned set (parallel_workers = 96)" for partitioned tables?

Best,
Seamus

PS. The awesome thing about this is that I'm table scanning a 270 million row, 600 column table in 1.2 seconds, which I never thought was possible.

PPS. I have no idea why 96 worked better than 64 (the number of cores I have), but it did - 700% vs 400%.

On Sat, Feb 13, 2021, at 7:09 PM, Seamus Abshere wrote:
> [...]



Re: Why is Postgres only using 8 cores for partitioned count? [Parallel Append]

From: David Rowley
Date:
On Sun, 14 Feb 2021 at 13:15, Seamus Abshere
<sabshere@alumni.princeton.edu> wrote:
> The comment from Robert says: (src/backend/optimizer/path/allpaths.c)
>
>                 /*
>                  * If the use of parallel append is permitted, always request at least
>                  * log2(# of children) workers.
>
> In my case, every partition takes 1 second to scan, I have 64 cores, I have 64 partitions, and the wall time is 8 seconds with 8 workers.
>
> I assume that if it planned significantly more workers (16? 32? even 64?), it would get significantly faster (even accounting for transaction cost). So why doesn't it ask for more? Note that I've set max_parallel_workers=512, etc. (postgresql.conf in my first message).

There's perhaps an argument for allowing ALTER TABLE <partitioned
table> SET (parallel_workers=N); to be set on partitioned tables, but
we don't currently allow it.

What you might want to try is setting that for any of those 64
partitions.  Shortly above the code comment that you quoted,
there's some code that finds the path for the partition with the
maximum number of parallel workers. If one of those partitions is
using, say, 64 workers because you set that partition's
"parallel_workers" setting to 64, and provided you have
max_parallel_workers_per_gather set high enough, then your Append
should get 64 workers.

You'll need to be careful though, since changing a partition's
parallel_workers may affect things for other queries too. Also, if you
were to change only 1 partition and that partition were to be pruned,
then you'd not get the 64 workers.
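
That workaround could look like this (a hypothetical session; tbl_p0 stands in for one real partition, and the names are assumed, not taken from the thread):

```sql
-- Raise the per-partition setting that the Append worker computation picks up.
ALTER TABLE tbl_p0 SET (parallel_workers = 64);

-- The Append's worker count is still capped by this GUC, so raise it too.
SET max_parallel_workers_per_gather = 64;

EXPLAIN SELECT count(*) FROM tbl;

-- To undo the per-partition override later:
ALTER TABLE tbl_p0 RESET (parallel_workers);
```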

David



Re: Why is Postgres only using 8 cores for partitioned count? [Parallel Append]

From: "Seamus Abshere"
Date:
Hi David,

That is a great suggestion, except my partitions are foreign tables.

I wonder if I could figure out a single empty partition that’s just a normal table... but I’m using hash partitioning, so I think I would lose a modulus.

Best,
Seamus

--
Seamus Abshere, SCEA
+1 (608) 772-0696
https://www.faraday.io
https://github.com/seamusabshere
https://linkedin.com/in/seamusabshere

On Sun, Feb 14, 2021, at 4:47 AM, David Rowley wrote:
> [...]



Re: Why is Postgres only using 8 cores for partitioned count? [Parallel Append]

From: Gavin Flower
Date:
On 14/02/2021 22:47, David Rowley wrote:
> On Sun, 14 Feb 2021 at 13:15, Seamus Abshere
> <sabshere@alumni.princeton.edu> wrote:
>> The comment from Robert says: (src/backend/optimizer/path/allpaths.c)
>>
>>                  /*
>>                   * If the use of parallel append is permitted, always request at least
>>                   * log2(# of children) workers.
>>
>> In my case, every partition takes 1 second to scan, I have 64 cores, I have 64 partitions, and the wall time is 8 seconds with 8 workers.
>>
>> I assume that if it planned significantly more workers (16? 32? even 64?), it would get significantly faster (even accounting for transaction cost). So why doesn't it ask for more? Note that I've set max_parallel_workers=512, etc. (postgresql.conf in my first message).
> There's perhaps an argument for allowing ALTER TABLE <partitioned
> table> SET (parallel_workers=N); to be set on partitioned tables, but
> we don't currently allow it.
[...]
> David

Just wondering why there is a hard coded limit.

While I agree it might be good to be able to specify the number of workers,
surely it would be possible to derive a suitable default based on the
number of effective processors available?


Cheers,
Gavin




Re: Why is Postgres only using 8 cores for partitioned count? [Parallel Append]

From: Laurenz Albe
Date:
On Sun, 2021-02-14 at 22:47 +1300, David Rowley wrote:
> On Sun, 14 Feb 2021 at 13:15, Seamus Abshere
> 
> <sabshere@alumni.princeton.edu> wrote:
> 
> > The comment from Robert says: (src/backend/optimizer/path/allpaths.c)
> >                  /*
> >                   * If the use of parallel append is permitted, always request at least
> >                   * log2(# of children) workers.
> > In my case, every partition takes 1 second to scan, I have 64 cores, I have 64 partitions, and the wall time is 8 seconds with 8 workers.
> > I assume that if it planned significantly more workers (16? 32? even 64?), it would get significantly faster (even accounting for transaction cost). So why doesn't it ask for more? Note that
> > I've set max_parallel_workers=512, etc. (postgresql.conf in my first message).
> 
> There's perhaps an argument for allowing ALTER TABLE <partitioned
> table> SET (parallel_workers=N); to be set on partitioned tables, but
> we don't currently allow it.

That would be great; I have been hit by this before.

> What you might want to try is setting that for any of those 64
> partitions.  Shortly above the code comment that you quoted above,
> there's some code that finds the path for the partition with the
> maximum number of parallel workers. If one of those partitions is
> using, say 64 workers because you set the partitions
> "parallel_workers" setting to 64, and providing you have
> max_parallel_workers_per_gather set highly enough, then your Append
> should get 64 workers.

Hmm - that didn't work when I tried it, but perhaps I should try again.

> You'll need to be careful though since changing the partitions
> parallel_workers may affect things for other queries too. Also, if you
> were to only change 1 partition and that partition were to be pruned,
> then you'd not get the 64 workers.

I think this is an area where parallel query could be improved.

One thing is runtime partition pruning: if the optimizer thinks that
it will have to scan a lot of partitions, it will plan a lot of workers.
But if the executor reduces that number to 1, we end up with way too
many workers.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Why is Postgres only using 8 cores for partitioned count? [Parallel Append]

From: Fabio Pardi
Date:

On 14/02/2021 22:16, Gavin Flower wrote:
> [...]
>
> Just wondering why there is a hard coded limit.
>
> While I agree it might be good to be able to specify the number of workers, surely it would be possible to derive a suitable default based on the number of effective processors available?
>


I had the same problem and my conclusion was that it is not possible to go above 8 cores because of Amdahl's law on parallel computing. More here: https://en.wikipedia.org/wiki/Amdahl%27s_law
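
For reference, Amdahl's law bounds the speedup from n workers as S(n) = 1 / ((1 - p) + p/n), where p is the parallelizable fraction; with p = 0.95 the ceiling is 1/0.05 = 20x no matter how many workers you add. A quick illustration with assumed numbers (not measurements from this thread):

```sql
-- Speedup for p = 0.95 (95% parallelizable) at various worker counts.
SELECT n,
       round(1 / ((1 - 0.95) + 0.95 / n), 1) AS speedup
FROM (VALUES (8), (64), (96)) AS t(n);
```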

regards,

fabio pardi




Re: Why is Postgres only using 8 cores for partitioned count? [Parallel Append]

From: David Rowley
Date:
On Mon, 15 Feb 2021 at 10:16, Gavin Flower
<GavinFlower@archidevsys.co.nz> wrote:
> Just wondering why there is a hard coded limit.

I don't see where the hardcoded limit is.  The total number is limited
to max_parallel_workers_per_gather, but there's nothing hardcoded
about the value of that.

> While I agree it might be good to be able specify the number of workers,
> sure it would be possible to derive a suitable default based on the
> number of effective processors available?

It's a pretty tricky thing to get right.  The problem is that
something has to rationalise the use of parallel worker processes.
Does it seem reasonable to you to use the sum of the Append child
parallel workers?  If so, I imagine someone else would think that
would be pretty insane. We do have to consider the fact that we're
trying to share those parallel worker processes with other backends
which also might want to get some use out of them.

As for whether we need some reloption for partitioned tables, I think
that's also tricky. Sure, we could go and add a "parallel_workers"
relopt to partitioned tables, but then that's going to be applied
regardless of how many partitions survive partition pruning.  There
could be as few as 2 subpaths in an Append, or the number could be
in the thousands. I can't imagine anyone really wants the same number
of parallel workers in each of those two cases.  So I can understand
why ab7271677 wanted to take into account the number of append
children.

Maybe there's some room for some other relopt that just changes the
behaviour of that code.  It does not seem too unreasonable that
someone might like to take the sum of the Append child parallel
workers.  That value would still be capped at
max_parallel_workers_per_gather, so it shouldn't ever go too insane
unless someone set that GUC to something insane, which would be their
choice.  I'm not too sure what such a relopt would be called.

Additionally, for the case being reported here: since all Append
children are foreign tables, there is actually some work going on to
make it so workers don't have to sit by and wait until the foreign
server returns the results.  I don't think anyone would disagree that
it's pretty poor use of a parallel worker to have it sit there doing
nothing for minutes at a time waiting for a single tuple from a
foreign data wrapper.  I'm not sure of the status of that work, but if
you want to learn more about it, please see [1]

David

[1] https://commitfest.postgresql.org/32/2491/



Re: Why is Postgres only using 8 cores for partitioned count? [Parallel Append]

From: David Rowley
Date:
On Tue, 16 Feb 2021 at 02:12, Fabio Pardi <f.pardi@portavita.eu> wrote:
>
> On 14/02/2021 22:16, Gavin Flower wrote:
> > While I agree it might be good to be able to specify the number of workers, surely it would be possible to derive a suitable default based on the number of effective processors available?
>
> I had the same problem and my conclusion was that it is not possible to go above 8 cores because of Amdahl's law on parallel computing. More here: https://en.wikipedia.org/wiki/Amdahl%27s_law

That would really depend on what the non-parallel part of the equation
was.  There are some plan shapes such as GROUP BY or aggregate queries
with very few or just 1 group where the serial portion of the
execution is very small indeed.

David