Обсуждение: Parallel hints in PostgreSQL with consistent perfromance

Поиск
Список
Период
Сортировка

Parallel hints in PostgreSQL with consistent perfromance

От
mohini mane
Дата:
Hello Team,
I observed that increasing the degree of parallel hint in the SELECT query did not show performance improvements.
Below are the details of sample execution with EXPLAIN ANALYZE
PostgreSQL Version: v15.5

Operating System details: RHL 7.x
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    1
Core(s) per socket:    16
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Stepping:              1
CPU MHz:               2294.684
BogoMIPS:              4589.36
Hypervisor vendor:     Microsoft
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              51200K
NUMA node0 CPU(s):     0-15

PostgreSQL Query: 
Sample sql executed through psql command prompt
force_parallel_mode=on; max_parallel_workers_per_gather=200;
max_parallel_workers=6

explain analyze select /*+ PARALLEL(A 6) */ ctid::varchar,  md5("col1"||'~'||"col7"||'~'||"col9"::varchar) , md5("id"||'~'||"gender"||'~'||"firstname"||'~'||"lastname"||'~'||"address"||'~'||"city"||'~'||"salary"||'~'||"pincode"||'~'||"sales"||'~'||"phone"||'~'||"amount"||'~'||"dob"||'~'||"starttime"||'~'||"timezone"||'~'||"status"||'~'||"timenow"||'~'||"timelater"||'~'||"col2"||'~'||"col3"||'~'||"col4"||'~'||"col5"||'~'||"col6"||'~'||"col8"||'~'||"col10"||'~'||"col11"||'~'||"col12"||'~'||"col13"||'~'||"col14"||'~'||"col15"||'~'||"col16"||'~'||"col17"::varchar) ,  md5('@'||"col1"||'~'||"col7"||'~'||"col9"::varchar)  from "sp_qarun"."basic2" A  order by 2,4,3;

Output:
PSQL query execution with hints 6 for 1st time => 203505.402 ms
PSQL query execution with hints 6 for 2nd time => 27920.272 ms
PSQL query execution with hints 6 for 3rd time => 27666.770 ms
Only 6 workers launched, and there is no reduction in execution time even after increasing the degree of parallel hints in select query.

Table Structure:
create table basic2(id int,gender char,firstname varchar(3000),
lastname varchar(3000),address varchar(3000),city varchar(900),salary smallint,
pincode bigint,sales numeric,phone real,amount double precision,
dob date,starttime timestamp,timezone TIMESTAMP WITH TIME ZONE,
status boolean,timenow time,timelater TIME WITH TIME ZONE,col1 int,
col2 char,col3 varchar(3000),col4 varchar(3000),col5 varchar(3000),
col6 varchar(900),col7 smallint,col8 bigint,col9 numeric,col10 real,
col11 double precision,col12 date,col13 timestamp,col14 TIMESTAMP WITH TIME ZONE,
col15 boolean,col16 time,col17 TIME WITH TIME ZONE,primary key(col1,col7,col9)); 

Table Data: 1000000 rows with each Row has a size of 20000.

Thanks,
Mohini




Re: Parallel hints in PostgreSQL with consistent perfromance

От
Tomas Vondra
Дата:
On 12/27/23 14:15, mohini mane wrote:
> Hello Team,
> I observed that increasing the degree of parallel hint* *in the SELECT
> query did not show performance improvements.
> Below are the details of sample execution with EXPLAIN ANALYZE
> *PostgreSQL Version:* v15.5
> 
> *Operating System details:* RHL 7.x
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                16
> On-line CPU(s) list:   0-15
> Thread(s) per core:    1
> Core(s) per socket:    16
> Socket(s):             1
> NUMA node(s):          1
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 79
> Model name:            Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
> Stepping:              1
> CPU MHz:               2294.684
> BogoMIPS:              4589.36
> Hypervisor vendor:     Microsoft
> Virtualization type:   full
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              51200K
> NUMA node0 CPU(s):     0-15
> 
> *PostgreSQL Query:* 
> Sample sql executed through psql command prompt
> force_parallel_mode=on; max_parallel_workers_per_gather=200;
> max_parallel_workers=6
> 
> explain analyze select /*+ PARALLEL(A 6) */ ctid::varchar,
>  md5("col1"||'~'||"col7"||'~'||"col9"::varchar) ,
>
md5("id"||'~'||"gender"||'~'||"firstname"||'~'||"lastname"||'~'||"address"||'~'||"city"||'~'||"salary"||'~'||"pincode"||'~'||"sales"||'~'||"phone"||'~'||"amount"||'~'||"dob"||'~'||"starttime"||'~'||"timezone"||'~'||"status"||'~'||"timenow"||'~'||"timelater"||'~'||"col2"||'~'||"col3"||'~'||"col4"||'~'||"col5"||'~'||"col6"||'~'||"col8"||'~'||"col10"||'~'||"col11"||'~'||"col12"||'~'||"col13"||'~'||"col14"||'~'||"col15"||'~'||"col16"||'~'||"col17"::varchar)
, md5('@'||"col1"||'~'||"col7"||'~'||"col9"::varchar)  from "sp_qarun"."basic2" A  order by 2,4,3;
 
> 

Postgres doesn't support hints, so the /* ... */ part of the query is
just a comment and doesn't effect the parallelism at all. The main thing
influencing that are the GUC values you set before.


> *Output:*
> PSQL query execution with hints 6 for 1st time => 203505.402 ms
> PSQL query execution with hints 6 for 2nd time => 27920.272 ms
> PSQL query execution with hints 6 for 3rd time => 27666.770 ms
> Only 6 workers launched, and there is no reduction in execution time
> even after increasing the degree of parallel hints in select query.
> 

It's unclear if what exactly you changed, and what case you're comparing
the timing to. As I explained earlier, the hint comment has no effect.
So if that's what you increased, it's not surprising the timing does not
change.

Also, max_parallel_workers is the maximum total number of parallel
workers, i.e. it's upper bound of max_parallel_workers_per_gather. So if
you set it to 6, there will never be more than 6 workers, no matter what
value max_parallel_workers_per_gather is set to.

FWIW force_parallel_more is really meant for testing (in the context of
developing the database itself), it's hardly the thing you should do in
any other case, like for example testing performance.


> *Table Structure:*
> create table basic2(id int,gender char,firstname varchar(3000),
> lastname varchar(3000),address varchar(3000),city varchar(900),salary
> smallint,
> pincode bigint,sales numeric,phone real,amount double precision,
> dob date,starttime timestamp,timezone TIMESTAMP WITH TIME ZONE,
> status boolean,timenow time,timelater TIME WITH TIME ZONE,col1 int,
> col2 char,col3 varchar(3000),col4 varchar(3000),col5 varchar(3000),
> col6 varchar(900),col7 smallint,col8 bigint,col9 numeric,col10 real,
> col11 double precision,col12 date,col13 timestamp,col14 TIMESTAMP WITH
> TIME ZONE,
> col15 boolean,col16 time,col17 TIME WITH TIME ZONE,primary
> key(col1,col7,col9)); 
> 
> *Table Data:* 1000000 rows with each Row has a size of 20000.
> 

Without the data we can't actually try running the query.

In general it's a good idea to show the "explain analyze" output for the
cases you're comparing. Not only that shows what the database is doing,
it also shows timings for different parts of the query, how many workers
were planned / actually started etc.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel hints in PostgreSQL with consistent perfromance

От
Jeff Janes
Дата:
On Wed, Dec 27, 2023 at 8:15 AM mohini mane <mohini.android@gmail.com> wrote:
Hello Team,
I observed that increasing the degree of parallel hint in the SELECT query did not show performance improvements.
Below are the details of sample execution with EXPLAIN ANALYZE

PostgreSQL doesn't have hints, unless you are using pg_hint_plan. Which you should say if you are.

Output:
PSQL query execution with hints 6 for 1st time => 203505.402 ms
PSQL query execution with hints 6 for 2nd time => 27920.272 ms
PSQL query execution with hints 6 for 3rd time => 27666.770 ms
Only 6 workers launched, and there is no reduction in execution time even after increasing the degree of parallel hints in select query.

All you are showing here is the effect of caching the data in memory.  You allude to changing the degree, but didn't show any results, or even describe what the change was.  Is 6 the base from which you increased, or is it the result of having done the increase?

Cheers,

Jeff

Re: Parallel hints in PostgreSQL with consistent perfromance

От
mohini mane
Дата:
Thank you for your response !!
I am experimenting with SQL query performance for SELECT queries on large tables and I observed that changing/increasing the degree of parallel hint doesn't give the expected performance improvement.

I have executed the SELECT query with 2,4 & 6 parallel degree however every time only 4 workers launched & there was a slight increase in Execution time as well, why there is an increase in execution time with parallel degree 6 as compared to 2 or 4?
Please refer to the test results 

I am sharing the latest test results here :
Session variables set in psql prompt:
# show max_parallel_workers;
 max_parallel_workers
----------------------
 8
(1 row)

# show max_parallel_workers_per_gather;
 max_parallel_workers_per_gather
---------------------------------
 6
(1 row)

1st time query executed with PARALLEL DEGREE 2 
explain analyze select /*+ PARALLEL(A 2) */ * from test_compare_all_col_src1 A;
                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=10.00..45524.73 rows=949636 width=97) (actual time=0.673..173.017 rows=955000 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Seq Scan on test_compare_all_col_src1 a  (cost=0.00..44565.09 rows=237409 width=97) (actual time=0.039..51.941 rows=191000 loops=5)
 Planning Time: 0.093 ms
 Execution Time: 209.745 ms
(6 rows)

2nd time query executed with PARALLEL DEGREE 4
explain analyze select /*+ PARALLEL(A 4) */ * from aparopka.test_compare_all_col_src1 A;
                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=10.00..45524.73 rows=949636 width=97) (actual time=0.459..174.771 rows=955000 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Seq Scan on test_compare_all_col_src1 a  (cost=0.00..44565.09 rows=237409 width=97) (actual time=0.038..54.320 rows=191000 loops=5)
 Planning Time: 0.073 ms
 Execution Time: 210.170 ms
(6 rows)

3rd time query executed with PARALLEL DEGREE 6

explain analyze select /*+ PARALLEL(A 6) */ * from aparopka.test_compare_all_col_src1 A;
                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=10.00..45524.73 rows=949636 width=97) (actual time=0.560..196.586 rows=955000 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Seq Scan on test_compare_all_col_src1 a  (cost=0.00..44565.09 rows=237409 width=97) (actual time=0.049..58.741 rows=191000 loops=5)
 Planning Time: 0.095 ms
 Execution Time: 235.365 ms
(6 rows)

Table Schema :

                                       Table "test_compare_all_col_src1"
     Column      |            Type             | Collation | Nullable | Default | Storage  | Stats target | Description
-----------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
 col_smallint    | integer                     |           |          |         | plain    |              |
 col_int         | integer                     |           |          |         | plain    |              |
 col_bigint      | bigint                      |           | not null |         | plain    |              |
 col_numeric     | numeric                     |           |          |         | main     |              |
 col_real        | real                        |           |          |         | plain    |              |
 col_double      | double precision            |           |          |         | plain    |              |
 col_bool        | boolean                     |           |          |         | plain    |              |
 col_char        | character(1)                |           |          |         | extended |              |
 col_varchar     | character varying(2000)     |           |          |         | extended |              |
 col_date        | date                        |           |          |         | plain    |              |
 col_time        | time without time zone      |           |          |         | plain    |              |
 col_timetz      | time with time zone         |           |          |         | plain    |              |
 col_timestamp   | timestamp without time zone |           |          |         | plain    |              |
 col_timestamptz | timestamp with time zone    |           |          |         | plain    |              |
Indexes:
    "test_compare_all_col_src1_pkey" PRIMARY KEY, btree (col_bigint)
Replica Identity: FULL
Access method: heap


# select count(*) from test_compare_all_col_src1;
 count
--------
 955000
(1 row)

Thanks,
--Mohini


On Wed, 27 Dec 2023, 20:11 Jeff Janes, <jeff.janes@gmail.com> wrote:
On Wed, Dec 27, 2023 at 8:15 AM mohini mane <mohini.android@gmail.com> wrote:
Hello Team,
I observed that increasing the degree of parallel hint in the SELECT query did not show performance improvements.
Below are the details of sample execution with EXPLAIN ANALYZE

PostgreSQL doesn't have hints, unless you are using pg_hint_plan. Which you should say if you are.

Output:
PSQL query execution with hints 6 for 1st time => 203505.402 ms
PSQL query execution with hints 6 for 2nd time => 27920.272 ms
PSQL query execution with hints 6 for 3rd time => 27666.770 ms
Only 6 workers launched, and there is no reduction in execution time even after increasing the degree of parallel hints in select query.

All you are showing here is the effect of caching the data in memory.  You allude to changing the degree, but didn't show any results, or even describe what the change was.  Is 6 the base from which you increased, or is it the result of having done the increase?

Cheers,

Jeff

Re: Parallel hints in PostgreSQL with consistent perfromance

От
Matheus de Oliveira
Дата:


On Thu, Dec 28, 2023 at 9:47 AM mohini mane <mohini.android@gmail.com> wrote:
Thank you for your response !!
I am experimenting with SQL query performance for SELECT queries on large tables and I observed that changing/increasing the degree of parallel hint doesn't give the expected performance improvement.

Why do you believe you are changing the degree of parallelism? PostgreSQL does not have parallel hints (or any hint in comments), so you are just changing a comment in the queries, which changes nothing at all in the execution plan.

Unless you are not using vanilla PostgreSQL or you have some extension in place, in which case you didn't provide enough information.

Best regards,

--
Matheus de Oliveira


Re: Parallel hints in PostgreSQL with consistent perfromance

От
"David G. Johnston"
Дата:
On Thursday, December 28, 2023, mohini mane <mohini.android@gmail.com> wrote:
Thank you for your response !!
I am experimenting with SQL query performance for SELECT queries on large tables and I observed that changing/increasing the degree of parallel hint doesn't give the expected performance improvement.

I have executed the SELECT query with 2,4 & 6 parallel degree however every time only 4 workers launched & there was a slight increase in Execution time as well, why there is an increase in execution time with parallel degree 6 as compared to 2 or 4?

Random environmental effects.

Also, analyzing a performance test without understanding how “buffers” are used is largely pointless.

Whatever told you about that comment syntax is hallucinating.

Please don’t reply by top-posting. Inline reply to the comments others make directly and trim as needed.  Simply restating your first email isn’t productive.

You cannot enforce the number of workers used, only the the maximum.  That is you knob.

David J.

Re: Parallel hints in PostgreSQL with consistent perfromance

От
Jeff Janes
Дата:


On Thu, Dec 28, 2023 at 7:47 AM mohini mane <mohini.android@gmail.com> wrote:
Thank you for your response !!
I am experimenting with SQL query performance for SELECT queries on large tables and I observed that changing/increasing the degree of parallel hint doesn't give the expected performance improvement.

But you still have addressed the fact that PostgreSQL does not have planner hints.

Are you using some nonstandard extension, or nonstandard fork?
 
I have executed the SELECT query with 2,4 & 6 parallel degree however every time only 4 workers launched & there was a slight increase in Execution time as well,

Adding an ignored comment to your SQL would not be expected to do anything.  So it is not surprising that it does not do anything about the number of workers launched.  It is just a comment.  A note to the human who is reading the code.
 
why there is an increase in execution time with parallel degree 6 as compared to 2 or 4?

Those small changes seem to be perfectly compatible with random noise.  You would need to repeat them dozens of times in random order, and then do a statistical test to convince me otherwise.
 

Re: Parallel hints in PostgreSQL with consistent perfromance

От
mohini mane
Дата:


On Fri, Dec 29, 2023 at 10:25 AM Jeff Janes <jeff.janes@gmail.com> wrote:


On Thu, Dec 28, 2023 at 7:47 AM mohini mane <mohini.android@gmail.com> wrote:
Thank you for your response !!
I am experimenting with SQL query performance for SELECT queries on large tables and I observed that changing/increasing the degree of parallel hint doesn't give the expected performance improvement.

But you still have addressed the fact that PostgreSQL does not have planner hints.

Are you using some nonstandard extension, or nonstandard fork?
 >> I am using pg_hint_plan extension to enforce the parallel execution of specific table .  
         postgres=# load 'pg_hint_plan';
         LOAD
 
I have executed the SELECT query with 2,4 & 6 parallel degree however every time only 4 workers launched & there was a slight increase in Execution time as well,

Adding an ignored comment to your SQL would not be expected to do anything.  So it is not surprising that it does not do anything about the number of workers launched.  It is just a comment.  A note to the human who is reading the code.
 >> As I am using ph_hint_plan extension so as expected hints should not get ignored by the optimizer .
why there is an increase in execution time with parallel degree 6 as compared to 2 or 4?

Those small changes seem to be perfectly compatible with random noise.  You would need to repeat them dozens of times in random order, and then do a statistical test to convince me otherwise.
 >> I am expecting desired number of parallel workers should get allocated as VM has sufficient vCores [16] and with needed session parameters [parallel_tuple_cost=0.1,max_parallel_workers_per_gather=6,max_parallel_workers=8 and I am using parallel hints like this :  /*+ PARALLEL(A 5 hard) */  so 5 worker processes should launched this is not happening]

Re: Parallel hints in PostgreSQL with consistent perfromance

От
"David G. Johnston"
Дата:
On Tue, Jan 2, 2024 at 8:12 AM mohini mane <mohini.android@gmail.com> wrote:

I have executed the SELECT query with 2,4 & 6 parallel degree however every time only 4 workers launched & there was a slight increase in Execution time as well,

Adding an ignored comment to your SQL would not be expected to do anything.  So it is not surprising that it does not do anything about the number of workers launched.  It is just a comment.  A note to the human who is reading the code.
 >> As I am using ph_hint_plan extension so as expected hints should not get ignored by the optimizer .

Sounds like a bug you should go tell the pg_hint_plan authors about then.

David J.

Re: Parallel hints in PostgreSQL with consistent perfromance

От
mohini mane
Дата:


On Tue, 2 Jan 2024, 21:45 David G. Johnston, <david.g.johnston@gmail.com> wrote:
On Tue, Jan 2, 2024 at 8:12 AM mohini mane <mohini.android@gmail.com> wrote:

I have executed the SELECT query with 2,4 & 6 parallel degree however every time only 4 workers launched & there was a slight increase in Execution time as well,

Adding an ignored comment to your SQL would not be expected to do anything.  So it is not surprising that it does not do anything about the number of workers launched.  It is just a comment.  A note to the human who is reading the code.
 >> As I am using pg_hint_plan extension so as expected hints should not get ignored by the optimizer .

Sounds like a bug you should go tell the pg_hint_plan authors about then.
    >> I am getting same results with or without extension [in my case it's pg_hint_plan] still I will check with the respective team, Thanks .

David J.

Re: Parallel hints in PostgreSQL with consistent perfromance

От
Jeff Janes
Дата:
 
1st time query executed with PARALLEL DEGREE 2 
explain analyze select /*+ PARALLEL(A 2) */ * from test_compare_all_col_src1 A;
                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=10.00..45524.73 rows=949636 width=97) (actual time=0.673..173.017 rows=955000 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Seq Scan on test_compare_all_col_src1 a  (cost=0.00..44565.09 rows=237409 width=97) (actual time=0.039..51.941 rows=191000 loops=5)
 Planning Time: 0.093 ms
 Execution Time: 209.745 ms
(6 rows)

Your alias is not enclosed in double quotes, so it is downcased to "a" (as can be seen from the alias printed in the plan).  But pg_hint_plan hints don't follow the downcasing convention, so the hint on "A" does not match the alias "a", and so is ignored.
 
Cheers,

Jeff

Re: Parallel hints in PostgreSQL with consistent perfromance

От
mohini mane
Дата:


On Thu, Jan 4, 2024 at 7:13 AM Jeff Janes <jeff.janes@gmail.com> wrote:
 
1st time query executed with PARALLEL DEGREE 2 
explain analyze select /*+ PARALLEL(A 2) */ * from test_compare_all_col_src1 A;
                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=10.00..45524.73 rows=949636 width=97) (actual time=0.673..173.017 rows=955000 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Seq Scan on test_compare_all_col_src1 a  (cost=0.00..44565.09 rows=237409 width=97) (actual time=0.039..51.941 rows=191000 loops=5)
 Planning Time: 0.093 ms
 Execution Time: 209.745 ms
(6 rows)

Your alias is not enclosed in double quotes, so it is downcased to "a" (as can be seen from the alias printed in the plan).  But pg_hint_plan hints don't follow the downcasing convention, so the hint on "A" does not match the alias "a", and so is ignored.
  >> Thanks Jeff for the response !   It worked with "A" alias 
Cheers,

Jeff