Обсуждение: [HACKERS] Parallel Index-only scan

Поиск
Список
Период
Сортировка

[HACKERS] Parallel Index-only scan

От
Rafia Sabih
Дата:
Hello all,
This is to propose a patch for enabling parallel index-only scans. With the patch of parallel index scan around [1], adding the mechanism for parallelising index-only scans makes sense. Without this mechanism for the queries preferring index-only scans, it is likely that at higher scale parallel index or parallel seq scan serve as cheaper alternative to index-only scans and then query performance might suffer because of the added processing of index or seq scans. 

Performance
-----------------
Consider the performance of a simple query on TPC-H schema, 

explain analyse select count(*) from lineitem where l_shipdate < date '1995-01-03';

Without parallel index-only scan, parallel seq scan got picked and it took around 23 seconds for the query to execute,

Finalize Aggregate  (cost=651586.63..651586.64 rows=1 width=8) (actual time=22853.872..22853.873 rows=1 loops=1)
   ->  Gather  (cost=651586.21..651586.62 rows=4 width=8) (actual time=22853.684..22853.864 rows=5 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         ->  Partial Aggregate  (cost=650586.21..650586.22 rows=1 width=8) (actual time=22850.489..22850.489 rows=1 loops=5)
               ->  Parallel Seq Scan on lineitem  (cost=0.00..618021.73 rows=13025795 width=0) (actual time=0.035..20553.495 rows=10342437 loops=5)
                     Filter: (l_shipdate < '1995-01-03'::date)
                     Rows Removed by Filter: 13656485
 Planning time: 0.225 ms
 Execution time: 22855.196 ms

However, with parallel index-only scan, it took only 8.5 seconds,

Finalize Aggregate  (cost=568883.69..568883.70 rows=1 width=8) (actual time=8548.993..8548.993 rows=1 loops=1)
   ->  Gather  (cost=568883.27..568883.68 rows=4 width=8) (actual time=8548.789..8548.976 rows=5 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         ->  Partial Aggregate  (cost=567883.27..567883.28 rows=1 width=8) (actual time=8541.929..8541.929 rows=1 loops=5)
               ->  Parallel Index Only Scan using idx_l_shipdate on lineitem  (cost=0.57..535318.78 rows=13025795 width=0) (actual time=0.113..5866.729 rows=10342437 loops=5)
                     Index Cond: (l_shipdate < '1995-01-03'::date)
                     Heap Fetches: 0
 Planning time: 0.266 ms
 Execution time: 8569.735 ms

The effect of parallel index-only scan can be seen more in some more complex queries where parallelism is enabled till top of the tree, e.g, following query takes 118 ms on head,

explain analyse select sum(l_extendedprice * l_discount) as revenue, avg(o_orderkey) from orders, lineitem where o_orderkey < 10000 and o_orderkey = l_orderkey group by l_shipmode order by revenue

 Sort  (cost=24396.44..24396.45 rows=7 width=75) (actual time=118.823..118.825 rows=7 loops=1)
   Sort Key: (sum((lineitem.l_extendedprice * lineitem.l_discount)))
   Sort Method: quicksort  Memory: 25kB
   ->  HashAggregate  (cost=24396.23..24396.34 rows=7 width=75) (actual time=118.749..118.786 rows=7 loops=1)
         Group Key: lineitem.l_shipmode
         ->  Nested Loop  (cost=1.13..24293.11 rows=10312 width=27) (actual time=0.096..73.198 rows=9965 loops=1)
               ->  Index Only Scan using orders_pkey on orders  (cost=0.56..46.48 rows=2578 width=4) (actual time=0.072..1.663 rows=2503 loops=1)
                     Index Cond: (o_orderkey < 10000)
                     Heap Fetches: 0
               ->  Index Scan using idx_lineitem_orderkey on lineitem  (cost=0.57..6.45 rows=296 width=31) (actual time=0.018..0.023 rows=4 loops=2503)
                     Index Cond: (l_orderkey = orders.o_orderkey)
 Planning time: 1.062 ms
 Execution time: 118.977 ms

With parallel index-only scan, the performance improves to 40 ms,

Sort  (cost=7191.33..7191.35 rows=7 width=75) (actual time=40.475..40.476 rows=7 loops=1)
   Sort Key: (sum((lineitem.l_extendedprice * lineitem.l_discount)))
   Sort Method: quicksort  Memory: 25kB
   ->  Finalize GroupAggregate  (cost=7190.78..7191.23 rows=7 width=75) (actual time=40.168..40.451 rows=7 loops=1)
         Group Key: lineitem.l_shipmode
         ->  Sort  (cost=7190.78..7190.85 rows=28 width=75) (actual time=40.105..40.127 rows=35 loops=1)
               Sort Key: lineitem.l_shipmode
               Sort Method: quicksort  Memory: 29kB
               ->  Gather  (cost=7187.22..7190.11 rows=28 width=75) (actual time=39.344..39.983 rows=35 loops=1)
                     Workers Planned: 4
                     Workers Launched: 4
                     ->  Partial HashAggregate  (cost=6187.22..6187.31 rows=7 width=75) (actual time=25.981..26.011 rows=7 loops=5)
                           Group Key: lineitem.l_shipmode
                           ->  Nested Loop  (cost=1.13..6084.10 rows=10312 width=27) (actual time=0.139..16.352 rows=1993 loops=5)
                                 ->  Parallel Index Only Scan using orders_pkey on orders  (cost=0.56..27.14 rows=644 width=4) (actual time=0.082..0.366 rows=501 loops=5)
                                       Index Cond: (o_orderkey < 10000)
                                       Heap Fetches: 0
                                 ->  Index Scan using idx_lineitem_orderkey on lineitem  (cost=0.57..6.45 rows=296 width=31) (actual time=0.020..0.025 rows=4 loops=2503)
                                       Index Cond: (l_orderkey = orders.o_orderkey)
 Planning time: 1.170 ms
 Execution time: 40.898 ms 

Test configuration and machine detail:
--------------------------------------------------
TPC-H: scale facto                         : 20
work_mem                                      : 64MB
shared buffer                                   : 1GB
max_parallel_workers_per_gather  : 4
Machine                                           : IBM POWER, 4 socket machine with 512 GB RAM.
random_page_cost = seq_page_cost = 0.1
We kept random and seq page cost same since warm cache environment was ensured, thus, keeping a value of random_page_cost that is higher than seq_page_cost doesn't makes much sense in this setup.
Also, some additional indexes were build, for lineitem table l_shipdate, l_shipmode, and l_returnflag, on orders table index is added for o_comment and o_orderdate individually.

The point to note here is that with parallel index-only scan, not only the heap fetches are saved but also the aggregates/sort can be pushed down to workers and thus the total time of query can be improved. Clearly, the performance of queries improved significantly with this new operator and considering the changes required after parallel index scan patches is less if evaluated against the improvement in performance it offers.

Attached file:
------------------
1. parallel_index_only_v1.patch

This patch is to be applied over parallel index scan [1].

* Thanks to my colleagues Dilip Kumar and Amit Kapila for their support and feedback.

--
Regards,
Rafia Sabih
Вложения

Re: [HACKERS] Parallel Index-only scan

От
"rafia.sabih"
Дата:
Please find the attached file for the latest version of patch.

parallel_index_only_v2.patch
<http://postgresql.nabble.com/file/n5936010/parallel_index_only_v2.patch>  



--
View this message in context: http://postgresql.nabble.com/Parallel-Index-only-scan-tp5934352p5936010.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: [HACKERS] Parallel Index-only scan

От
Robert Haas
Дата:
On Fri, Dec 23, 2016 at 12:04 AM, rafia.sabih
<rafia.sabih@enterprisedb.com> wrote:
> Please find the attached file for the latest version of patch.
>
> parallel_index_only_v2.patch
> <http://postgresql.nabble.com/file/n5936010/parallel_index_only_v2.patch>

We want to have the patch actually attached, not just stored on nabble.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Index-only scan

От
Alvaro Herrera
Дата:
Robert Haas wrote:
> On Fri, Dec 23, 2016 at 12:04 AM, rafia.sabih
> <rafia.sabih@enterprisedb.com> wrote:
> > Please find the attached file for the latest version of patch.
> >
> > parallel_index_only_v2.patch
> > <http://postgresql.nabble.com/file/n5936010/parallel_index_only_v2.patch>
> 
> We want to have the patch actually attached, not just stored on nabble.

I think the problem is that Nabble removes the attachment before
resending the email to the list.  The OP should be posting directly
instead of via Nabble.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Parallel Index-only scan

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Dec 23, 2016 at 12:04 AM, rafia.sabih
> <rafia.sabih@enterprisedb.com> wrote:
>> Please find the attached file for the latest version of patch.
>> parallel_index_only_v2.patch
>> <http://postgresql.nabble.com/file/n5936010/parallel_index_only_v2.patch>

> We want to have the patch actually attached, not just stored on nabble.

Or in words of one syllable: please do not use nabble to post to the
community mailing lists.  Their habit of stripping attachments is not
the only evil thing about them (although it's the only one I remember
details of ATM).
        regards, tom lane



Re: [HACKERS] Parallel Index-only scan

От
Robert Haas
Дата:
On Fri, Dec 23, 2016 at 3:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Or in words of one syllable: please do not use nabble to post to the
> community mailing lists.

Many of those words have two syllables, and one has four.

Anyhow, I think three castigating emails from committers in a span of
62 minutes is probably enough for the OP to get the point, unless
someone else REALLY feels the need to pile on.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Index-only scan

От
Rafia Sabih
Дата:
Extremely sorry for the inconvenience caused, please find the attached file for the latest version of the patch.

On Sat, Dec 24, 2016 at 1:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Dec 23, 2016 at 3:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Or in words of one syllable: please do not use nabble to post to the
> community mailing lists.

Many of those words have two syllables, and one has four.

Anyhow, I think three castigating emails from committers in a span of
62 minutes is probably enough for the OP to get the point, unless
someone else REALLY feels the need to pile on.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



--
Regards,
Rafia Sabih
Вложения

Re: [HACKERS] Parallel Index-only scan

От
Rafia Sabih
Дата:
Rebased patch of parallel-index only scan based on the latest version of parallel index scan [1] is attached.


On Sat, Dec 24, 2016 at 7:55 PM, Rafia Sabih <rafia.sabih@enterprisedb.com> wrote:
Extremely sorry for the inconvenience caused, please find the attached file for the latest version of the patch.

On Sat, Dec 24, 2016 at 1:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Dec 23, 2016 at 3:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Or in words of one syllable: please do not use nabble to post to the
> community mailing lists.

Many of those words have two syllables, and one has four.

Anyhow, I think three castigating emails from committers in a span of
62 minutes is probably enough for the OP to get the point, unless
someone else REALLY feels the need to pile on.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



--
Regards,
Rafia Sabih



--
Regards,
Rafia Sabih
Вложения

Re: [HACKERS] Parallel Index-only scan

От
Amit Kapila
Дата:
On Wed, Dec 28, 2016 at 12:18 PM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:
> Rebased patch of parallel-index only scan based on the latest version of
> parallel index scan [1] is attached.
>
> [1]
> https://www.postgresql.org/message-id/CAA4eK1LiNi7_Z1%2BPCV4y06o_v%3DZdZ1UThE%2BW9JhthX4B8uifnA%40mail.gmail.com
>

The link for the latest version is wrong, the correct link is:
https://www.postgresql.org/message-id/CAA4eK1KthrAvNjmB2cWuUHz%2Bp3ZTTtbD7o2KUw49PC8HK4r1Wg%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Index-only scan

От
Rahila Syed
Дата:
Hello,

On applying the patch on latest master branch and running regression tests following failure occurs.
I applied it on latest parallel index scan patches as given in the link above.

*** /home/rahila/postgres/postgres/src/test/regress/expected/select_parallel.out    2017-01-03 14:06:29.122022780 +0530
--- /home/rahila/postgres/postgres/src/test/regress/results/select_parallel.out 2017-01-12 14:35:56.652712622 +0530
***************
*** 92,103 ****
  explain (costs off)
    select  sum(parallel_restricted(unique1)) from tenk1
    group by(parallel_restricted(unique1));
!                      QUERY PLAN                    
! ----------------------------------------------------
   HashAggregate
     Group Key: parallel_restricted(unique1)
!    ->  Index Only Scan using tenk1_unique1 on tenk1
! (3 rows)

  set force_parallel_mode=1;
  explain (costs off)
--- 92,105 ----
  explain (costs off)
    select  sum(parallel_restricted(unique1)) from tenk1
    group by(parallel_restricted(unique1));
!                             QUERY PLAN                            
! -------------------------------------------------------------------
   HashAggregate
     Group Key: parallel_restricted(unique1)
!    ->  Gather
!          Workers Planned: 4
!          ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
! (5 rows)

IIUC, parallel operation being performed here is fine as parallel restricted function occurs above Gather node
and just the expected output needs to be changed.

Thank you,
Rahila Syed

Вложения

Re: [HACKERS] Parallel Index-only scan

От
Rafia Sabih
Дата:
On Thu, Jan 12, 2017 at 5:39 PM, Rahila Syed <rahilasyed90@gmail.com> wrote:
> Hello,
>
> On applying the patch on latest master branch and running regression tests
> following failure occurs.
> I applied it on latest parallel index scan patches as given in the link
> above.
>
> ***
> /home/rahila/postgres/postgres/src/test/regress/expected/select_parallel.out
> 2017-01-03 14:06:29.122022780 +0530
> ---
> /home/rahila/postgres/postgres/src/test/regress/results/select_parallel.out
> 2017-01-12 14:35:56.652712622 +0530
> ***************
> *** 92,103 ****
>   explain (costs off)
>     select  sum(parallel_restricted(unique1)) from tenk1
>     group by(parallel_restricted(unique1));
> !                      QUERY PLAN
> ! ----------------------------------------------------
>    HashAggregate
>      Group Key: parallel_restricted(unique1)
> !    ->  Index Only Scan using tenk1_unique1 on tenk1
> ! (3 rows)
>
>   set force_parallel_mode=1;
>   explain (costs off)
> --- 92,105 ----
>   explain (costs off)
>     select  sum(parallel_restricted(unique1)) from tenk1
>     group by(parallel_restricted(unique1));
> !                             QUERY PLAN
> ! -------------------------------------------------------------------
>    HashAggregate
>      Group Key: parallel_restricted(unique1)
> !    ->  Gather
> !          Workers Planned: 4
> !          ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
> ! (5 rows)
>
> IIUC, parallel operation being performed here is fine as parallel restricted
> function occurs above Gather node
> and just the expected output needs to be changed.
>

True, fixed it, please find the attached file for the latest version.

> Thank you,
> Rahila Syed
>
Thanks a lot for the review Rahila.
-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] Parallel Index-only scan

От
Rafia Sabih
Дата:
On Fri, Jan 13, 2017 at 2:19 PM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:
> On Thu, Jan 12, 2017 at 5:39 PM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>> Hello,
>>
>> On applying the patch on latest master branch and running regression tests
>> following failure occurs.
>> I applied it on latest parallel index scan patches as given in the link
>> above.
>>
>> ***
>> /home/rahila/postgres/postgres/src/test/regress/expected/select_parallel.out
>> 2017-01-03 14:06:29.122022780 +0530
>> ---
>> /home/rahila/postgres/postgres/src/test/regress/results/select_parallel.out
>> 2017-01-12 14:35:56.652712622 +0530
>> ***************
>> *** 92,103 ****
>>   explain (costs off)
>>     select  sum(parallel_restricted(unique1)) from tenk1
>>     group by(parallel_restricted(unique1));
>> !                      QUERY PLAN
>> ! ----------------------------------------------------
>>    HashAggregate
>>      Group Key: parallel_restricted(unique1)
>> !    ->  Index Only Scan using tenk1_unique1 on tenk1
>> ! (3 rows)
>>
>>   set force_parallel_mode=1;
>>   explain (costs off)
>> --- 92,105 ----
>>   explain (costs off)
>>     select  sum(parallel_restricted(unique1)) from tenk1
>>     group by(parallel_restricted(unique1));
>> !                             QUERY PLAN
>> ! -------------------------------------------------------------------
>>    HashAggregate
>>      Group Key: parallel_restricted(unique1)
>> !    ->  Gather
>> !          Workers Planned: 4
>> !          ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
>> ! (5 rows)
>>
>> IIUC, parallel operation being performed here is fine as parallel restricted
>> function occurs above Gather node
>> and just the expected output needs to be changed.
>>
>
> True, fixed it, please find the attached file for the latest version.
>

Please find the attached file rebased patch of parallel index-only
scan on the latest Parallel index-scan patch [1].

[1] https://www.postgresql.org/message-id/CAA4eK1L-gb0Fum3mQN4c5PWJXNE7xs7pzwMDWsrDYLucKqvJ2A%40mail.gmail.com
-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] Parallel Index-only scan

От
tushar
Дата:
On 01/19/2017 05:37 PM, Rafia Sabih wrote:
> Please find the attached file rebased patch of parallel index-only
> scan on the latest Parallel index-scan patch [1].
We did some testing of  this feature and written few testcases.  PFA the 
sql script(along with the expected .out files)

In addition we have generated the LCOV (code coverage) report and 
compared the files which are changed for this.
You can see the numbers for  "with_patch" V/s "with_patch+TestCases"  
(.pdf file is attached)

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] Parallel Index-only scan

От
Michael Paquier
Дата:
On Thu, Jan 19, 2017 at 9:07 PM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:
> Please find the attached file rebased patch of parallel index-only
> scan on the latest Parallel index-scan patch [1].

Moved to CF 2017-03.
-- 
Michael



Re: [HACKERS] Parallel Index-only scan

От
Robert Haas
Дата:
On Thu, Jan 19, 2017 at 7:07 AM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:
> Please find the attached file rebased patch of parallel index-only
> scan on the latest Parallel index-scan patch [1].

This again needs minor rebasing but basically looks fine.  It's a
pretty straightforward extension of the parallel index scan work.

Please make sure that this is pgindent-clean - i.e. that when you
pgindent the files that it touches, pgindent doesn't change anything
of the same parts of the file that you've changed in the patch.  Also,
I believe Amit may have made some adjustments to the logic in
nodeIndexScan.c; if so, it would be good to make sure that the
nodeIndexOnlyScan.c changes match what was done there.  In particular,
he's got this:
               if (reset_parallel_scan && node->iss_ScanDesc->parallel_scan)
index_parallelrescan(node->iss_ScanDesc);

And you've got this:

+               if (reset_parallel_scan)
+                       index_parallelrescan(node->ioss_ScanDesc);

There might be some other inconsistencies as well that I didn't notice
on a quick look.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Index-only scan

От
Rahila Syed
Дата:
I  reviewed the patch. Overall it looks fine to me.

One comment,

>-       if (index->amcanparallel &&
>-           !index_only_scan &&
>+       if ((index->amcanparallel ||
>+           index_only_scan) &&

Why do we need to check for index_only_scan in the above condition. IIUC, index->amcanparallel is necessary for
parallel index scan to be chosen. We cannot chose parallel index only scan if index_only_scan is happening
without worrying about index->amcanparallel value. So OR-ing index->amcanparallel with index_only_scan is probably not
correct.

Thank you,
Rahila Syed




On Thu, Feb 16, 2017 at 1:06 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jan 19, 2017 at 7:07 AM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:
> Please find the attached file rebased patch of parallel index-only
> scan on the latest Parallel index-scan patch [1].

This again needs minor rebasing but basically looks fine.  It's a
pretty straightforward extension of the parallel index scan work.

Please make sure that this is pgindent-clean - i.e. that when you
pgindent the files that it touches, pgindent doesn't change anything
of the same parts of the file that you've changed in the patch.  Also,
I believe Amit may have made some adjustments to the logic in
nodeIndexScan.c; if so, it would be good to make sure that the
nodeIndexOnlyScan.c changes match what was done there.  In particular,
he's got this:

                if (reset_parallel_scan && node->iss_ScanDesc->parallel_scan)
                        index_parallelrescan(node->iss_ScanDesc);

And you've got this:

+               if (reset_parallel_scan)
+                       index_parallelrescan(node->ioss_ScanDesc);

There might be some other inconsistencies as well that I didn't notice
on a quick look.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Parallel Index-only scan

От
Rafia Sabih
Дата:

On Thu, Feb 16, 2017 at 1:26 PM, Rahila Syed <rahilasyed90@gmail.com> wrote:
I  reviewed the patch. Overall it looks fine to me.

One comment,

>-       if (index->amcanparallel &&
>-           !index_only_scan &&
>+       if ((index->amcanparallel ||
>+           index_only_scan) &&

Why do we need to check for index_only_scan in the above condition. IIUC, index->amcanparallel is necessary for
parallel index scan to be chosen. We cannot chose parallel index only scan if index_only_scan is happening
without worrying about index->amcanparallel value. So OR-ing index->amcanparallel with index_only_scan is probably not
correct.

True, we do not need this, only removing !index_only_scan should work. Fixed


On Thu, Feb 16, 2017 at 1:06 AM, Robert Haas <robertmhaas@gmail.com> wrote:

This again needs minor rebasing but basically looks fine.  It's a
pretty straightforward extension of the parallel index scan work.

Please make sure that this is pgindent-clean - i.e. that when you
pgindent the files that it touches, pgindent doesn't change anything
of the same parts of the file that you've changed in the patch.  Also,
I believe Amit may have made some adjustments to the logic in
nodeIndexScan.c; if so, it would be good to make sure that the
nodeIndexOnlyScan.c changes match what was done there.  In particular,
he's got this:

                if (reset_parallel_scan && node->iss_ScanDesc->parallel_scan)
                        index_parallelrescan(node->iss_ScanDesc);

And you've got this:

+               if (reset_parallel_scan)
+                       index_parallelrescan(node->ioss_ScanDesc);
 
Fixed.
Please find the attached patch for rebased and cleaner version.

--
Regards,
Rafia Sabih
Вложения

Re: [HACKERS] Parallel Index-only scan

От
Rafia Sabih
Дата:


On Thu, Feb 16, 2017 at 3:40 PM, Rafia Sabih <rafia.sabih@enterprisedb.com> wrote:

On Thu, Feb 16, 2017 at 1:26 PM, Rahila Syed <rahilasyed90@gmail.com> wrote:
I  reviewed the patch. Overall it looks fine to me.

One comment,

>-       if (index->amcanparallel &&
>-           !index_only_scan &&
>+       if ((index->amcanparallel ||
>+           index_only_scan) &&

Why do we need to check for index_only_scan in the above condition. IIUC, index->amcanparallel is necessary for
parallel index scan to be chosen. We cannot chose parallel index only scan if index_only_scan is happening
without worrying about index->amcanparallel value. So OR-ing index->amcanparallel with index_only_scan is probably not
correct.

True, we do not need this, only removing !index_only_scan should work. Fixed


On Thu, Feb 16, 2017 at 1:06 AM, Robert Haas <robertmhaas@gmail.com> wrote:

This again needs minor rebasing but basically looks fine.  It's a
pretty straightforward extension of the parallel index scan work.

Please make sure that this is pgindent-clean - i.e. that when you
pgindent the files that it touches, pgindent doesn't change anything
of the same parts of the file that you've changed in the patch.  Also,
I believe Amit may have made some adjustments to the logic in
nodeIndexScan.c; if so, it would be good to make sure that the
nodeIndexOnlyScan.c changes match what was done there.  In particular,
he's got this:

                if (reset_parallel_scan && node->iss_ScanDesc->parallel_scan)
                        index_parallelrescan(node->iss_ScanDesc);

And you've got this:

+               if (reset_parallel_scan)
+                       index_parallelrescan(node->ioss_ScanDesc);
 
Fixed.
Please find the attached patch for rebased and cleaner version.

Please find the attached patch with a minor comment update.

--
Regards,
Rafia Sabih
Вложения

Re: [HACKERS] Parallel Index-only scan

От
Amit Kapila
Дата:
On Thu, Feb 16, 2017 at 3:57 PM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:
>
>
> On Thu, Feb 16, 2017 at 3:40 PM, Rafia Sabih <rafia.sabih@enterprisedb.com>
> wrote:
>>
>>
>> Please find the attached patch for rebased and cleaner version.
>>
> Please find the attached patch with a minor comment update.
>

Few comments:

1.
+ * ioss_PscanLen   This is needed for parallel index scan * ---------------- */typedef struct IndexOnlyScanState
@@ -1427,6 +1428,7 @@ typedef struct IndexOnlyScanState IndexScanDesc ioss_ScanDesc; Buffer ioss_VMBuffer; long
ioss_HeapFetches;
+ Size ioss_PscanLen; /* This is needed for parallel index scan */

No need to mention same comment at multiple places.  I think you keep
it on top of structure.  The explanation could be some thing like
"size of parallel index only scan descriptor"

2.
+ node->ioss_ScanDesc->xs_want_itup = true;

I think wherever you are initializing xs_want_itup, you should
initialize ioss_VMBuffer as well.  Is there a reason for not doing so?


3.
explain (costs off) select  sum(parallel_restricted(unique1)) from tenk1 group by(parallel_restricted(unique1));
-                     QUERY PLAN
-----------------------------------------------------
+                            QUERY PLAN
+------------------------------------------------------------------- HashAggregate   Group Key:
parallel_restricted(unique1)
-   ->  Index Only Scan using tenk1_unique1 on tenk1
-(3 rows)
+   ->  Gather
+         Workers Planned: 4
+         ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
+(5 rows)

It doesn't look good that you want to test parallel index only scan
for a test case that wants to test restricted function.  The comments
atop test looks odd. I suggest add a separate test (both explain and
actual execution of query) for parallel index only scan as we have for
parallel plans for other queries and see if we can keep the output of
existing test same.

4.
ExecReScanIndexOnlyScan(IndexOnlyScanState *node)
{
..
+ /*
+ * if we are here to just update the scan keys, then don't reset parallel
+ * scan
+ */
+ if (node->ioss_NumRuntimeKeys != 0 && !node->ioss_RuntimeKeysReady)
+ reset_parallel_scan = false;
..
}

I think here you can update the comment to indicate that for detailed
reason refer ExecReScanIndexScan.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Index-only scan

От
Rafia Sabih
Дата:


On Thu, Feb 16, 2017 at 9:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Few comments:

1.
+ * ioss_PscanLen   This is needed for parallel index scan
  * ----------------
  */
 typedef struct IndexOnlyScanState
@@ -1427,6 +1428,7 @@ typedef struct IndexOnlyScanState
  IndexScanDesc ioss_ScanDesc;
  Buffer ioss_VMBuffer;
  long ioss_HeapFetches;
+ Size ioss_PscanLen; /* This is needed for parallel index scan */

No need to mention same comment at multiple places.  I think you keep
it on top of structure.  The explanation could be some thing like
"size of parallel index only scan descriptor"

Fixed. 

2.
+ node->ioss_ScanDesc->xs_want_itup = true;

I think wherever you are initializing xs_want_itup, you should
initialize ioss_VMBuffer as well.  Is there a reason for not doing so?

Done. 


3.
explain (costs off)
  select  sum(parallel_restricted(unique1)) from tenk1
  group by(parallel_restricted(unique1));
-                     QUERY PLAN
-----------------------------------------------------
+                            QUERY PLAN
+-------------------------------------------------------------------
  HashAggregate
    Group Key: parallel_restricted(unique1)
-   ->  Index Only Scan using tenk1_unique1 on tenk1
-(3 rows)
+   ->  Gather
+         Workers Planned: 4
+         ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
+(5 rows)

It doesn't look good that you want to test parallel index only scan
for a test case that wants to test restricted function.  The comments
atop test looks odd. I suggest add a separate test (both explain and
actual execution of query) for parallel index only scan as we have for
parallel plans for other queries and see if we can keep the output of
existing test same.
 
Agree, but actually the objective of this test-case is met even with this plan. To restrict parallel index-only scan here, modification in query or other parameters would be required. However, for the proper code-coverage and otherwise I have added test-case for parallel index-only scan.

4.
ExecReScanIndexOnlyScan(IndexOnlyScanState *node)
{
..
+ /*
+ * if we are here to just update the scan keys, then don't reset parallel
+ * scan
+ */
+ if (node->ioss_NumRuntimeKeys != 0 && !node->ioss_RuntimeKeysReady)
+ reset_parallel_scan = false;
..
}

I think here you can update the comment to indicate that for detailed
reason refer ExecReScanIndexScan.

Done. 
Please find the attached patch for the revised version.
Just an FYI, in my recent tests on TPC-H 300 scale factor, Q16 showed improved execution time from 830 seconds to 730 seconds with this patch when used with parallel merge-join patch [1].

Regards,
Rafia Sabih
Вложения

Re: [HACKERS] Parallel Index-only scan

От
Amit Kapila
Дата:
On Fri, Feb 17, 2017 at 10:35 AM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:
>
>
> On Thu, Feb 16, 2017 at 9:25 PM, Amit Kapila <amit.kapila16@gmail.com>
> wrote:
>>
>>
>> 4.
>> ExecReScanIndexOnlyScan(IndexOnlyScanState *node)
>> {
>> ..
>> + /*
>> + * if we are here to just update the scan keys, then don't reset parallel
>> + * scan
>> + */
>> + if (node->ioss_NumRuntimeKeys != 0 && !node->ioss_RuntimeKeysReady)
>> + reset_parallel_scan = false;
>> ..
>> }
>>
>> I think here you can update the comment to indicate that for detailed
>> reason refer ExecReScanIndexScan.
>
>
> Done.

+       /*
+        * If we are here to just update the scan keys, then don't
reset parallel
+        * scan. For detailed reason behind this look in the comments for
+        * ExecReScanIndexScan.
+        */

You can phrase the second line as "See ExecReScanIndexScan for
details.".  Apart from that this patch looks good to me.  I have
marked this patch as "Ready For Committer".

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Index-only scan

От
Robert Haas
Дата:
On Sat, Feb 18, 2017 at 12:02 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> +       /*
> +        * If we are here to just update the scan keys, then don't
> reset parallel
> +        * scan. For detailed reason behind this look in the comments for
> +        * ExecReScanIndexScan.
> +        */
>
> You can phrase the second line as "See ExecReScanIndexScan for
> details.".  Apart from that this patch looks good to me.  I have
> marked this patch as "Ready For Committer".

Committed, although I neglected to incorporate this change.  Not sure
if I should go back and do that; it doesn't read too badly as-is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Index-only scan

От
Rafia Sabih
Дата:
On Sun, Feb 19, 2017 at 4:02 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> Committed, although I neglected to incorporate this change.  Not sure
> if I should go back and do that; it doesn't read too badly as-is.
>
Thanks Robert for committing, and thanks Rahila, Amit, and Tushar for
reviewing and testing the patch.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/