Discussion: BUG #15577: Query returns different results when executed multiple times


BUG #15577: Query returns different results when executed multiple times

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15577
Logged by:          Bartosz Polnik
Email address:      bartoszpolnik@gmail.com
PostgreSQL version: 11.1
Operating system:   Windows 10
Description:

Hello,

A couple of days ago I stumbled upon a query that doesn't return all the rows it
should. When we execute it multiple times, we get only a subset of the data. The
query is:

SELECT ta.id AS table_a_id,
       tc.id as table_c_id,
       tba.id AS table_b_id
FROM test.table_b_active tba
       INNER JOIN test.table_c tc ON tba.target_id = tc.id
       INNER JOIN test.table_d td ON tc.table_d_id = td.id
       LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =
'2018-08-31' :: DATE
WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
  AND td.group = 'A'
  AND tc.table_e_id = 4
  AND (
    (tba.target_id = tc.id AND tba.group_type = 'A')
    OR tba.source_id = tc.id
  );

We always expect 31 rows in return:

table_a_id | table_c_id | table_b_id
------------+------------+------------
   16116185 |     328860 |    2936924
   16115788 |     348539 |    2913754
   16115788 |     348539 |    3039173
   16115788 |     348539 |    2913874
   16116256 |     293541 |    2997160
   16116256 |     293541 |    2901938
   16116256 |     293541 |    2901933
   16114789 |     292051 |    3038539
   16114813 |     342353 |    3052371
   16116069 |     351585 |    3025941
   16114814 |     331329 |    2946332
   16115861 |     350487 |    2933633
   16115106 |     350047 |    2902075
   16116066 |     351434 |    3010909
   16114811 |     298605 |    2998909
   16114811 |     298605 |    2893809
   16114811 |     298605 |    2987038
   16114811 |     298605 |    3038877
   16114811 |     298605 |    3010694
   16114811 |     298605 |    2893188
   16114811 |     298605 |    2893391
   16114811 |     298605 |    2983360
   16114811 |     298605 |    3038221
   16114811 |     298605 |    3026078
   16114811 |     298605 |    2998966
   16116260 |     290583 |    2955483
   16116260 |     290583 |    2921135
   16116260 |     290583 |    2947914
   16116260 |     290583 |    2901669
   16116241 |     295971 |    3038921
   16116249 |     296708 |    3038888

But when we run it using psql, we get different results (not all rows are
returned). Here's the query I used to get the execution plan, and example
output from two consecutive executions:

explain (costs false, analyze true)
  SELECT ta.id AS table_a_id,
         tc.id as table_c_id,
         tba.id AS table_b_id
  FROM test.table_b_active tba
         INNER JOIN test.table_c tc ON tba.target_id = tc.id
         INNER JOIN test.table_d td ON tc.table_d_id = td.id
         LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =
'2018-08-31' :: DATE
  WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
    AND td.group = 'A'
    AND tc.table_e_id = 4
    AND (
      (tba.target_id = tc.id AND tba.group_type = 'A')
      OR tba.source_id = tc.id
    );


Nested Loop Left Join  (actual time=157.436..329.538 rows=21 loops=1)
  ->  Gather  (actual time=157.407..349.743 rows=21 loops=1)
        Workers Planned: 1
        Workers Launched: 1
        ->  Nested Loop (actual time=167.978..281.540 rows=11 loops=2)
              ->  Hash Join  (actual time=0.300..18.206 rows=945 loops=2)
                    Hash Cond: (tc.table_d_id = td.id)
                    ->  Parallel Seq Scan on table_c tc (actual
time=0.107..17.326 rows=1751 loops=2)
                          Filter: (table_e_id = 4)
                          Rows Removed by Filter: 49117
                    ->  Hash (actual time=0.065..0.065 rows=8 loops=2)
                          Buckets: 1024  Batches: 1  Memory Usage: 9kB
                          ->  Seq Scan on table_d td  (actual
time=0.043..0.049 rows=8 loops=2)
                                Filter: (group = 'A')
                                Rows Removed by Filter: 55
              ->  Index Scan using table_b_idx_target_id on table_b (actual
time=0.273..0.278 rows=0 loops=1890)
                    Index Cond: (target_id = tc.id)
                    Filter: ((date >= '2018-08-10'::date) AND (date <=
'2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A')) OR
(source_id = tc.id)))
                    Rows Removed by Filter: 26
  ->  Index Scan using table_a_uq_001 on table_a ta (actual
time=0.019..0.019 rows=1 loops=21)
        Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
Planning Time: 1.218 ms
Execution Time: 350.283 ms

Above we got only 21 rows instead of 31.

Nested Loop Left Join (actual time=164.189..358.068 rows=28 loops=1)
  ->  Gather  (actual time=164.140..370.034 rows=28 loops=1)
        Workers Planned: 1
        Workers Launched: 1
        ->  Nested Loop  (actual time=188.718..306.289 rows=14 loops=2)
              ->  Hash Join (actual time=0.321..21.860 rows=945 loops=2)
                    Hash Cond: (tc.table_d_id = td.id)
                    ->  Parallel Seq Scan on table_c tc (actual
time=0.113..20.784 rows=1751 loops=2)
                          Filter: (table_e_id = 4)
                          Rows Removed by Filter: 49117
                    ->  Hash (actual time=0.063..0.063 rows=8 loops=2)
                          Buckets: 1024  Batches: 1  Memory Usage: 9kB
                          ->  Seq Scan on table_d td (actual
time=0.040..0.049 rows=8 loops=2)
                                Filter: (group = 'A')
                                Rows Removed by Filter: 55
              ->  Index Scan using table_b_idx_target_id on table_b (actual
time=0.294..0.300 rows=0 loops=1890)
                    Index Cond: (target_id = tc.id)
                    Filter: ((date >= '2018-08-10'::date) AND (date <=
'2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A')) OR
(source_id = tc.id)))
                    Rows Removed by Filter: 26
  ->  Index Scan using table_a_uq_001 on table_a ta (actual
time=0.031..0.031 rows=1 loops=28)
        Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
Planning Time: 0.971 ms
Execution Time: 371.202 ms

Here we got only 28 rows instead of 31.

Used indices:
create index table_b_idx_target_id
   on test.table_b (target_id)
   where (t_active IS TRUE);

create unique index table_a_uq_001
  on table_a (table_c_id, date);

And views:
create view test.table_b_active as
SELECT *
FROM test.table_b
WHERE (table_b.t_active IS TRUE);

I ran a couple of tests and couldn't reproduce this problem with
max_parallel_workers=0. With this parameter set, we get this execution
plan:

Nested Loop Left Join (actual time=212.010..509.274 rows=31 loops=1)
  ->  Gather (actual time=211.976..508.855 rows=31 loops=1)
        Workers Planned: 1
        Workers Launched: 0
        ->  Nested Loop (actual time=211.593..508.425 rows=31 loops=1)
              ->  Hash Join (actual time=0.063..31.628 rows=1890 loops=1)
                    Hash Cond: (tc.table_d_id = td.id)
                    ->  Parallel Seq Scan on table_c tc (actual
time=0.042..30.000 rows=3501 loops=1)
                          Filter: (table_e_id = 4)
                          Rows Removed by Filter: 98233
                    ->  Hash (actual time=0.016..0.016 rows=8 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 9kB
                          ->  Seq Scan on table_d_id td (actual
time=0.008..0.012 rows=8 loops=1)
                                Filter: (group = 'A')
                                Rows Removed by Filter: 55
              ->  Index Scan using table_b_idx_target_id on table_b (actual
time=0.248..0.251 rows=0 loops=1890)
                    Index Cond: (target_id = tc.id)
                    Filter: ((date >= '2018-08-10'::date) AND (date <=
'2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A')) OR
(source_id = tc.id)))
                    Rows Removed by Filter: 26
  ->  Index Scan using table_a_uq_001 on table_a ta (actual
time=0.014..0.014 rows=1 loops=31)
        Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
Planning Time: 1.135 ms
Execution Time: 509.431 ms

The number of rows returned is correct: 31.
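(A side note for anyone reproducing this: the Gather node still appears in the plan above because max_parallel_workers only caps the number of workers launched. To keep the planner from generating a parallel plan at all, one could instead use the following session setting — a sketch, not what was used above:)

```sql
-- Prevents parallel plans for this session entirely: no Gather node is
-- planned, unlike max_parallel_workers = 0, which still plans a Gather
-- but launches zero workers (as in the plan above).
SET max_parallel_workers_per_gather = 0;
```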

I tried to create a clean schema with test data, but couldn't get the same
execution plan, so I can't include that.
The data above comes from a db snapshot with table, index, and view names
changed by hand, so there might be some typos. Sorry about them.

I'm using PostgreSQL 11.1, compiled by Visual C++ build 1914, 64-bit.

Best regards,
Bartosz Polnik


Re: BUG #15577: Query returns different results when executed multiple times

From
David Rowley
Date:
On Mon, 7 Jan 2019 at 05:46, PG Bug reporting form
<noreply@postgresql.org> wrote:
> A couple of days ago I stumbled upon a query that doesn't return all rows it
> should.

>         ->  Nested Loop (actual time=167.978..281.540 rows=11 loops=2)

>         ->  Nested Loop  (actual time=188.718..306.289 rows=14 loops=2)

Above is the first place where the actual row counts diverge. This
makes it appear that it's the join to table_b that's causing the
variation. The parallel worker is meant to be running with the same
snapshot as the main process so that it properly sees the same tuples
as visible. If for some reason that was not working correctly then
that might explain why you get differing results.

Can you confirm if there are concurrent changes being made to table_b
perhaps while the query is running?

Do you still get the variation with an otherwise idle database with no
open transactions that's just received a complete VACUUM?
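(A sketch of how the "idle database with no open transactions" precondition might be checked before re-running, using standard catalog views — the choice of tables to vacuum is an assumption based on the query above:)

```sql
-- Any other backend with an open transaction can hold back the
-- snapshot horizon and affect which tuples are visible.
SELECT pid, state, xact_start
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND pid <> pg_backend_pid();

-- Then vacuum the tables involved in the query.
VACUUM (ANALYZE) test.table_b;
VACUUM (ANALYZE) test.table_c;
```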

I'm able to produce the same plan as you're getting with the attached
script, but unable to see any sort of row variation. My row counts
don't match yours exactly. If I adjust the rows in table_b too much I
get a completely different plan.

I tested with PostgreSQL 11.1, compiled by Visual C++ build 1916, 64-bit.

> I tried to create a clean schema with test data, but couldn't get the same
> execution plan, so I can't include that.

If you're removing columns to simplify the test script then you may
need to add some columns back to pad the tuples out a bit in order to
keep the tables around the same size. Variations in table size
will have an effect on the join costs and could change the join order.
Also, a smaller table_c may not receive a parallel seq scan. In my
mockup, I made table_b and table_a fairly large so as to keep the
joins as parameterized nested loops. I also left out indexes on the
"id" columns to reduce the chances of a Merge Join.
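For example, a hypothetical padding column (the column name and width here are illustrative, not taken from the real schema) could look like:

```sql
-- Pad each tuple so the trimmed-down table stays close to its
-- original on-disk size; re-analyze so the planner sees the new width.
ALTER TABLE table_b ADD COLUMN pad text DEFAULT repeat('x', 100);
ANALYZE table_b;
```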

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: BUG #15577: Query returns different results when executed multiple times

From
Bartosz Polnik
Date:
> Can you confirm if there are concurrent changes being made to table_b
> perhaps while the query is running?  

I can confirm: there are no concurrent changes being made, and I'm the only client connecting to that db.

> Do you still get the variation with an otherwise idle database with no
> open transactions that's just received a complete VACUUM?  

I executed VACUUM (FULL, ANALYZE) on all tables from the query, but I'm still getting different results.

With your script, I always get the same output, so it must be something else.

I thought about sharing db dump, but I can't do that due to the company policy. What I could share though, is anonymized db dump. Do you know if there is any good tool for performing anonymization?



Re: BUG #15577: Query returns different results when executed multiple times

From
Thomas Munro
Date:
On Mon, Jan 7, 2019 at 10:29 PM Bartosz Polnik <bartoszpolnik@gmail.com> wrote:
>
> > Can you confirm if there are concurrent changes being made to table_b
> > perhaps while the query is running?
>
> I confirm. There are no concurrent changes being made and I'm the only client connecting to that db.
>
> > Do you still get the variation with an otherwise idle database with no
> > open transactions that's just received a complete VACUUM?
>
> I executed VACUUM (FULL, ANALYZE) on all tables from the query, but I'm still getting different results.
>
> With your script, I'm always getting the same output, so it must be something else.

What do you get if you run it with SET force_parallel_mode = on?

-- 
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15577: Query returns different results when executed multiple times

From
Bartosz Polnik
Date:
I've modified the script:
1. I added SET force_parallel_mode = on;:

SET force_parallel_mode = on;
explain (costs off, analyze)
  SELECT ta.id AS table_a_id,
         tc.id as table_c_id,
         tba.id AS table_b_id

2. I changed group and group_type to enums (we have them as enums in the db):
drop table if exists table_c;
drop table if exists table_d;
drop type if exists "group";
drop type if exists "group_type";

create type group_type AS ENUM ('A', 'B');
create type "group" AS ENUM ('A', 'B');

Updated version of the script is in attachment.

When I execute it, I get:

c:\Program Files\PostgreSQL\11\bin>psql -U postgres -f "C:\mockup_bug15577.sql" test
DROP VIEW
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TYPE
DROP TYPE
CREATE TABLE
INSERT 0 20000
CREATE TYPE
CREATE TYPE
CREATE TABLE
INSERT 0 200000
CREATE TABLE
INSERT 0 101736
CREATE TABLE
INSERT 0 8
INSERT 0 55
CREATE INDEX
CREATE INDEX
CREATE VIEW
ANALYZE
SET
SET
ALTER TABLE
DROP VIEW
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TYPE
DROP TYPE
CREATE TYPE
CREATE TYPE
CREATE TABLE
INSERT 0 20000
CREATE TABLE
INSERT 0 200000
CREATE TABLE
INSERT 0 101736
CREATE TABLE
INSERT 0 8
INSERT 0 55
CREATE INDEX
CREATE INDEX
CREATE VIEW
ANALYZE
SET
                                                                                    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=1.185..81.170 rows=112 loops=1)
   ->  Gather (actual time=1.179..95.573 rows=112 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Nested Loop (actual time=0.076..7.182 rows=56 loops=2)
               ->  Hash Join (actual time=0.050..4.592 rows=936 loops=2)
                     Hash Cond: (tc.table_d_id = td.id)
                     ->  Parallel Seq Scan on table_c tc (actual time=0.004..4.358 rows=1755 loops=2)
                           Filter: (table_e_id = 4)
                           Rows Removed by Filter: 49114
                     ->  Hash (actual time=0.019..0.020 rows=8 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on table_d td (actual time=0.011..0.014 rows=8 loops=1)
                                 Filter: ("group" = 'A'::"group")
                                 Rows Removed by Filter: 55
               ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.003..0.003 rows=0 loops=1871)
                     Index Cond: (target_id = tc.id)
                     Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::group_type)) OR (source_id = tc.id)))
                     Rows Removed by Filter: 1
   ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.003..0.003 rows=0 loops=112)
         Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
 Planning Time: 0.773 ms
 Execution Time: 96.037 ms
(23 rows)

The number of rows doesn't change from execution to execution.



Attachments

Re: BUG #15577: Query returns different results when executed multiple times

From
Thomas Munro
Date:
On Tue, Jan 8, 2019 at 1:56 AM Bartosz Polnik <bartoszpolnik@gmail.com> wrote:
> I've modified the script:
> 1. I Added SET force_parallel_mode = on;:

What if you try that with your original query in the database that
shows the problem?  Or alternatively SET parallel_leader_participation
= on.  What I'm wondering is whether the parallel worker is
consistently getting the same result, but it's a different result from
the leader, and the variation comes from the race between leader and
worker, or whether the instability comes from somewhere else.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15577: Query returns different results when executed multiple times

From
Bartosz Polnik
Date:
For the following sql:

SET force_parallel_mode = on;
explain (costs false, analyze true)
  SELECT ta.id AS table_a_id,
         tc.id as table_c_id,
         tba.id AS table_b_id
  FROM test.table_b_active tba
         INNER JOIN test.table_c tc ON tba.target_id = tc.id
         INNER JOIN test.table_d td ON tc.table_d_id = td.id
         LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =
'2018-08-31' :: DATE
  WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
    AND td.group = 'A'
    AND tc.table_e_id = 4
    AND (
      (tba.target_id = tc.id AND tba.group_type = 'A')
      OR tba.source_id = tc.id
    );


I got:

                                                                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=145.151..284.612 rows=26 loops=1)
   ->  Gather (actual time=145.092..304.166 rows=26 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Nested Loop (actual time=153.289..247.768 rows=13 loops=2)
               ->  Hash Join (actual time=0.221..15.028 rows=945 loops=2)
                     Hash Cond: (tc.table_d_id = td.id)
                     ->  Parallel Seq Scan on table_c tc (actual time=0.085..14.442 rows=1751 loops=2)
                           Filter: (table_e_id = 4)
                           Rows Removed by Filter: 49117
                     ->  Hash (actual time=0.044..0.045 rows=8 loops=2)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on table_d td (actual time=0.028..0.033 rows=8 loops=2)
                                 Filter: (group = 'A'::test.group)
                                 Rows Removed by Filter: 55
               ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.242..0.246 rows=0 loops=1890)
                     Index Cond: (target_id = tc.id)
                     Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
                     Rows Removed by Filter: 26
   ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.016..0.016 rows=1 loops=26)
         Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
 Planning Time: 5.024 ms
 Execution Time: 304.858 ms


On the second execution:
                                                                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=228.405..279.383 rows=19 loops=1)
   ->  Gather (actual time=228.376..288.807 rows=19 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Nested Loop (actual time=150.785..242.440 rows=10 loops=2)
               ->  Hash Join (actual time=0.212..14.353 rows=945 loops=2)
                     Hash Cond: (tc.table_d_id = td.id)
                     ->  Parallel Seq Scan on table_c tc (actual time=0.075..13.794 rows=1751 loops=2)
                           Filter: (table_e_id = 4)
                           Rows Removed by Filter: 49117
                     ->  Hash (actual time=0.050..0.050 rows=8 loops=2)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on table_d_id td (actual time=0.031..0.037 rows=8 loops=2)
                                 Filter: (group = 'A'::test.group)
                                 Rows Removed by Filter: 55
               ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.237..0.241 rows=0 loops=1890)
                     Index Cond: (target_id = tc.id)
                     Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
                     Rows Removed by Filter: 26
   ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.021..0.021 rows=1 loops=19)
         Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
 Planning Time: 4.816 ms
 Execution Time: 289.503 ms


I executed it a couple of times and saw 26 rows (as above), 21 rows (also seen above), 28 rows, 28 rows, 29 rows, etc.
The parts that changed (the row counts) were the Nested Loop Left Join and Nested Loop nodes.


For the following sql:

SET parallel_leader_participation = on;
explain (costs false, analyze true)
  SELECT ta.id AS table_a_id,
         tc.id as table_c_id,
         tba.id AS table_b_id
  FROM test.table_b_active tba
         INNER JOIN test.table_c tc ON tba.target_id = tc.id
         INNER JOIN test.table_d td ON tc.table_d_id = td.id
         LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =
'2018-08-31' :: DATE
  WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
    AND td.group = 'A'
    AND tc.table_e_id = 4
    AND (
      (tba.target_id = tc.id AND tba.group_type = 'A')
      OR tba.source_id = tc.id
    );

I received:

                                                                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=143.805..279.354 rows=29 loops=1)
   ->  Gather (actual time=143.735..299.066 rows=29 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Nested Loop (actual time=149.334..242.108 rows=15 loops=2)
               ->  Hash Join (actual time=0.203..14.536 rows=945 loops=2)
                     Hash Cond: (tc.table_d_id = td.id)
                     ->  Parallel Seq Scan on table_c tc (actual time=0.072..13.971 rows=1751 loops=2)
                           Filter: (table_e_id = 4)
                           Rows Removed by Filter: 49117
                     ->  Hash (actual time=0.043..0.043 rows=8 loops=2)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on table_d td (actual time=0.028..0.034 rows=8 loops=2)
                                 Filter: (group = 'A'::test.group)
                                 Rows Removed by Filter: 55
               ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.237..0.240 rows=0 loops=1890)
                     Index Cond: (target_id = tc.id)
                     Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
                     Rows Removed by Filter: 26
   ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.018..0.018 rows=1 loops=29)
         Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
 Planning Time: 5.070 ms
 Execution Time: 299.849 ms



                                                                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=231.714..276.770 rows=26 loops=1)
   ->  Gather (actual time=231.654..295.971 rows=26 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Nested Loop (actual time=151.209..239.965 rows=13 loops=2)
               ->  Hash Join (actual time=0.214..14.467 rows=945 loops=2)
                     Hash Cond: (tc.table_d_id = td.id)
                     ->  Parallel Seq Scan on table_c tc (actual time=0.070..13.923 rows=1751 loops=2)
                           Filter: (table_e_id = 4)
                           Rows Removed by Filter: 49117
                     ->  Hash (actual time=0.045..0.045 rows=8 loops=2)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on table_d td (actual time=0.028..0.033 rows=8 loops=2)
                                 Filter: (group = 'A'::test.group)
                                 Rows Removed by Filter: 55
               ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.235..0.238 rows=0 loops=1890)
                     Index Cond: (target_id = tc.id)
                     Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
                     Rows Removed by Filter: 26
   ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.017..0.017 rows=1 loops=26)
         Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
 Planning Time: 4.976 ms
 Execution Time: 296.729 ms


I also made a test with:

SET parallel_leader_participation = off;
SET force_parallel_mode = on;
explain (costs false, analyze true)
  SELECT ta.id AS table_a_id,
         tc.id as table_c_id,
         tba.id AS table_b_id
  FROM test.table_b_active tba
         INNER JOIN test.table_c tc ON tba.target_id = tc.id
         INNER JOIN test.table_d td ON tc.table_d_id = td.id
         LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =
'2018-08-31' :: DATE
  WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
    AND td.group = 'A'
    AND tc.table_e_id = 4
    AND (
      (tba.target_id = tc.id AND tba.group_type = 'A')
      OR tba.source_id = tc.id
    );

The output is correct (31 rows), but the plan is different:
                                                                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather (actual time=263.880..507.668 rows=31 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Nested Loop Left Join (actual time=193.246..415.020 rows=31 loops=1)
         ->  Nested Loop (actual time=192.997..414.336 rows=31 loops=1)
               ->  Hash Join (actual time=0.287..23.958 rows=1890 loops=1)
                     Hash Cond: (tc.table_d_id = td.id)
                     ->  Seq Scan on table_c tc (actual time=0.128..23.044 rows=3501 loops=1)
                           Filter: (table_e_id = 4)
                           Rows Removed by Filter: 98233
                     ->  Hash (actual time=0.064..0.069 rows=8 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on table_d td (actual time=0.045..0.052 rows=8 loops=1)
                                 Filter: (group = 'A'::test.group)
                                 Rows Removed by Filter: 55
               ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.203..0.206 rows=0 loops=1890)
                     Index Cond: (target_id = tc.id)
                     Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
                     Rows Removed by Filter: 26
         ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.021..0.021 rows=1 loops=31)
               Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
 Planning Time: 5.051 ms
 Execution Time: 507.858 ms



Re: BUG #15577: Query returns different results when executed multiple times

От
Thomas Munro
Дата:
On Tue, Jan 8, 2019 at 10:39 AM Bartosz Polnik <bartoszpolnik@gmail.com> wrote:
>  Gather (actual time=263.880..507.668 rows=31 loops=1)
>    Workers Planned: 1
>    Workers Launched: 1
>    Single Copy: true
>    ->  Nested Loop Left Join (actual time=193.246..415.020 rows=31 loops=1)
>          ->  Nested Loop (actual time=192.997..414.336 rows=31 loops=1)
>                ->  Hash Join (actual time=0.287..23.958 rows=1890 loops=1)
>                      Hash Cond: (tc.table_d_id = td.id)
>                      ->  Seq Scan on table_c tc (actual time=0.128..23.044 rows=3501 loops=1)
>                            Filter: (table_e_id = 4)
>                            Rows Removed by Filter: 98233
>                      ->  Hash (actual time=0.064..0.069 rows=8 loops=1)
>                            Buckets: 1024  Batches: 1  Memory Usage: 9kB
>                            ->  Seq Scan on table_d td (actual time=0.045..0.052 rows=8 loops=1)
>                                  Filter: (group = 'A'::test.group)
>                                  Rows Removed by Filter: 55
>                ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.203..0.206 rows=0 loops=1890)
>                      Index Cond: (target_id = tc.id)
>                      Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
>                      Rows Removed by Filter: 26
>          ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.021..0.021 rows=1 loops=31)
>                Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))

Hmm.  So if the plan runs entirely in a worker, we get 31 rows.  If
the plan runs entirely in the leader, we get 31 rows.  But if it runs
in both, we get fewer.  In all cases the Hash Join produces 1890 rows,
and in all cases the Nested Loop probes table_b_idx_target_id for all
of them: "loops=1890".  But somehow the output of the Nested Loop is
not 31 rows (or 31 / 2) when divided over two processes.

As a sanity check, can you please see if the run-only-in-leader case
(max_parallel_workers = 0, so that we see "Workers Launched: 0")
produces the *same* 31 rows as the run-only-in-worker case
(force_parallel_mode = on, so that we see "Single Copy: true")?  That
is, the actual values of those 31 rows, in particular the columns
coming from table_b.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15577: Query returns different results when executed multiple times

От
Bartosz Polnik
Дата:
For:

SET max_parallel_workers = 0;
explain (costs false, analyze true)
SELECT ta.id AS table_a_id,
       tc.id as table_c_id,
       tba.id AS table_b_id
FROM test.table_b_active tba
       INNER JOIN test.table_c tc ON tba.target_id = tc.id
       INNER JOIN test.table_d td ON tc.table_d_id = td.id
       LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =
'2018-08-31' :: DATE
WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
  AND td.group = 'A'
  AND tc.table_e_id = 4
  AND (
    (tba.target_id = tc.id AND tba.group_type = 'A')
    OR tba.source_id = tc.id
  );

SELECT ta.id AS table_a_id,
       tc.id as table_c_id,
       tba.id AS table_b_id
FROM test.table_b_active tba
       INNER JOIN test.table_c tc ON tba.target_id = tc.id
       INNER JOIN test.table_d td ON tc.table_d_id = td.id
       LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =
'2018-08-31' :: DATE
WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
  AND td.group = 'A'
  AND tc.table_e_id = 4
  AND (
    (tba.target_id = tc.id AND tba.group_type = 'A')
    OR tba.source_id = tc.id
  );

I got:

                                                                                                              QUERY PLAN                                                                                                
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=196.969..433.340 rows=31 loops=1)
   ->  Gather (actual time=196.897..432.757 rows=31 loops=1)
         Workers Planned: 1
         Workers Launched: 0
         ->  Nested Loop (actual time=196.384..432.197 rows=31 loops=1)
               ->  Hash Join (actual time=0.104..24.477 rows=1890 loops=1)
                     Hash Cond: (tc.table_d_id = td.id)
                     ->  Parallel Seq Scan on table_c tc (actual time=0.060..23.639 rows=3501 loops=1)
                           Filter: (table_e_id = 4)
                           Rows Removed by Filter: 98233
                     ->  Hash (actual time=0.022..0.023 rows=8 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on table_d td (actual time=0.009..0.013 rows=8 loops=1)
                                 Filter: (group = 'A'::test.group)
                                 Rows Removed by Filter: 55
               ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.212..0.215 rows=0 loops=1890)
                     Index Cond: (target_id = tc.id)
                     Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
                     Rows Removed by Filter: 26
   ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.019..0.019 rows=1 loops=31)
         Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
 Planning Time: 5.084 ms
 Execution Time: 433.630 ms


 table_a_id | table_c_id | table_b_id
------------+------------+------------
   16116185 |     328860 |    2936924
   16115788 |     348539 |    3039173
   16115788 |     348539 |    2913874
   16115788 |     348539 |    2913754
   16116256 |     293541 |    2901938
   16116256 |     293541 |    2901933
   16116256 |     293541 |    2997160
   16114789 |     292051 |    3038539
   16114813 |     342353 |    3052371
   16116069 |     351585 |    3025941
   16114814 |     331329 |    2946332
   16115861 |     350487 |    2933633
   16115106 |     350047 |    2902075
   16116066 |     351434 |    3010909
   16114811 |     298605 |    2893809
   16114811 |     298605 |    2987038
   16114811 |     298605 |    3038877
   16114811 |     298605 |    3010694
   16114811 |     298605 |    2893188
   16114811 |     298605 |    2893391
   16114811 |     298605 |    2983360
   16114811 |     298605 |    3038221
   16114811 |     298605 |    3026078
   16114811 |     298605 |    2998966
   16114811 |     298605 |    2998909
   16116260 |     290583 |    2955483
   16116260 |     290583 |    2921135
   16116260 |     290583 |    2947914
   16116260 |     290583 |    2901669
   16116241 |     295971 |    3038921
   16116249 |     296708 |    3038888
(31 rows)

When I used force_parallel_mode = on and parallel_leader_participation = off, and removed the max_parallel_workers line from the script (so that the default applies), I got:

                                                                                                              QUERY PLAN                                                                                                
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather (actual time=271.255..530.713 rows=31 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Nested Loop Left Join (actual time=200.434..437.321 rows=31 loops=1)
         ->  Nested Loop (actual time=200.126..436.547 rows=31 loops=1)
               ->  Hash Join (actual time=0.322..25.510 rows=1890 loops=1)
                     Hash Cond: (tc.table_d_id = td.id)
                     ->  Seq Scan on table_c tc (actual time=0.141..24.501 rows=3501 loops=1)
                           Filter: (table_e_id = 4)
                           Rows Removed by Filter: 98233
                     ->  Hash (actual time=0.082..0.084 rows=8 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on table_d td (actual time=0.049..0.066 rows=8 loops=1)
                                 Filter: (group = 'A'::test.group)
                                 Rows Removed by Filter: 55
               ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.214..0.217 rows=0 loops=1890)
                     Index Cond: (target_id = tc.id)
                     Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
                     Rows Removed by Filter: 26
         ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.024..0.024 rows=1 loops=31)
               Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
 Planning Time: 5.641 ms
 Execution Time: 530.926 ms

 table_a_id | table_c_id | table_b_id
------------+------------+------------
   16116185 |     328860 |    2936924
   16115788 |     348539 |    3039173
   16115788 |     348539 |    2913874
   16115788 |     348539 |    2913754
   16116256 |     293541 |    2901938
   16116256 |     293541 |    2901933
   16116256 |     293541 |    2997160
   16114789 |     292051 |    3038539
   16114813 |     342353 |    3052371
   16116069 |     351585 |    3025941
   16114814 |     331329 |    2946332
   16115861 |     350487 |    2933633
   16115106 |     350047 |    2902075
   16116066 |     351434 |    3010909
   16114811 |     298605 |    2893809
   16114811 |     298605 |    2987038
   16114811 |     298605 |    3038877
   16114811 |     298605 |    3010694
   16114811 |     298605 |    2893188
   16114811 |     298605 |    2893391
   16114811 |     298605 |    2983360
   16114811 |     298605 |    3038221
   16114811 |     298605 |    3026078
   16114811 |     298605 |    2998966
   16114811 |     298605 |    2998909
   16116260 |     290583 |    2955483
   16116260 |     290583 |    2921135
   16116260 |     290583 |    2947914
   16116260 |     290583 |    2901669
   16116241 |     295971 |    3038921
   16116249 |     296708 |    3038888
(31 rows)

The rows returned by the two queries are the same.

On Mon, Jan 7, 2019 at 11:23 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Tue, Jan 8, 2019 at 10:39 AM Bartosz Polnik <bartoszpolnik@gmail.com> wrote:
>  Gather (actual time=263.880..507.668 rows=31 loops=1)
>    Workers Planned: 1
>    Workers Launched: 1
>    Single Copy: true
>    ->  Nested Loop Left Join (actual time=193.246..415.020 rows=31 loops=1)
>          ->  Nested Loop (actual time=192.997..414.336 rows=31 loops=1)
>                ->  Hash Join (actual time=0.287..23.958 rows=1890 loops=1)
>                      Hash Cond: (tc.table_d_id = td.id)
>                      ->  Seq Scan on table_c tc (actual time=0.128..23.044 rows=3501 loops=1)
>                            Filter: (table_e_id = 4)
>                            Rows Removed by Filter: 98233
>                      ->  Hash (actual time=0.064..0.069 rows=8 loops=1)
>                            Buckets: 1024  Batches: 1  Memory Usage: 9kB
>                            ->  Seq Scan on table_d td (actual time=0.045..0.052 rows=8 loops=1)
>                                  Filter: (group = 'A'::test.group)
>                                  Rows Removed by Filter: 55
>                ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.203..0.206 rows=0 loops=1890)
>                      Index Cond: (target_id = tc.id)
>                      Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
>                      Rows Removed by Filter: 26
>          ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.021..0.021 rows=1 loops=31)
>                Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))

Hmm.  So if the plan runs entirely in a worker, we get 31 rows.  If
the plan runs entirely in the leader, we get 31 rows.  But if it runs
in both, we get fewer.  In all cases the Hash Join produces 1890 rows,
and in all cases the Nested Loop probes table_b_idx_target_id for all
of them: "loops=1890".  But somehow the output of the Nested Loop is
not 31 rows (or 31 / 2) when divided over two processes.

As a sanity check, can you please see if the run-only-in-leader case
(max_parallel_workers = 0, so that we see "Workers Launched: 0")
produces the *same* 31 rows as the run-only-in-worker case
(force_parallel_mode = on, so that we see "Single Copy: true")?  That
is, the actual values of those 31 rows, in particular the columns
coming from table_b.

--
Thomas Munro
http://www.enterprisedb.com

Re: BUG #15577: Query returns different results when executed multiple times

От
Thomas Munro
Дата:
On Tue, Jan 8, 2019 at 12:04 PM Bartosz Polnik <bartoszpolnik@gmail.com> wrote:
> On Mon, Jan 7, 2019 at 11:23 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>> As a sanity check, can you please see if the run-only-in-leader case
>> (max_parallel_workers = 0, so that we see "Workers Launched: 0")
>> produces the *same* 31 rows as the run-only-in-worker case
>> (force_parallel_mode = on, so that we see "Single Copy: true")?  That
>> is, the actual values of those 31 rows, in particular the columns
>> coming from table_b.
>
> Rows returned by two queries are the same.

Can we please also see the actual output in the broken case, where it
runs in two processes and produces fewer than 31 rows?

-- 
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15577: Query returns different results when executed multiple times

От
Bartosz Polnik
Дата:
Query:

explain (costs false, analyze true)
SELECT ta.id AS table_a_id,
     tc.id as table_c_id,
     tba.id AS table_b_id
FROM test.table_b_active tba
     INNER JOIN test.table_c tc ON tba.target_id = tc.id
     INNER JOIN test.table_d td ON tc.table_d_id = td.id
     LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date = '2018-08-31' :: DATE
WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
AND td.group = 'A'
AND tc.table_e_id = 4
AND (
  (tba.target_id = tc.id AND tba.group_type = 'A')
  OR tba.source_id = tc.id
);

SELECT ta.id AS table_a_id,
     tc.id as table_c_id,
     tba.id AS table_b_id
FROM test.table_b_active tba
     INNER JOIN test.table_c tc ON tba.target_id = tc.id
     INNER JOIN test.table_d td ON tc.table_d_id = td.id
     LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date = '2018-08-31' :: DATE
WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
AND td.group = 'A'
AND tc.table_e_id = 4
AND (
  (tba.target_id = tc.id AND tba.group_type = 'A')
  OR tba.source_id = tc.id
);


Output:
                                                                                                             QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=236.640..281.711 rows=26 loops=1)
   ->  Gather (actual time=236.578..304.194 rows=26 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Nested Loop (actual time=154.947..244.097 rows=13 loops=2)
               ->  Hash Join (actual time=0.223..14.649 rows=945 loops=2)
                     Hash Cond: (tc.table_d_id = td.id)
                     ->  Parallel Seq Scan on table_c tc (actual time=0.083..14.102 rows=1751 loops=2)
                           Filter: (table_e_id = 4)
                           Rows Removed by Filter: 49117
                     ->  Hash (actual time=0.044..0.044 rows=8 loops=2)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on table_d td (actual time=0.028..0.033 rows=8 loops=2)
                                 Filter: (group = 'A'::test.group)
                                 Rows Removed by Filter: 55
               ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.239..0.242 rows=0 loops=1890)
                     Index Cond: (target_id = tc.id)
                     Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
                     Rows Removed by Filter: 26
   ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.020..0.020 rows=1 loops=26)
         Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
 Planning Time: 5.047 ms
 Execution Time: 304.960 ms


 table_a_id | table_c_id | table_b_id
------------+------------+------------
   16116185 |     328860 |    2936924
   16116256 |     293541 |    2901938
   16115788 |     348539 |    3039173
   16115788 |     348539 |    2913874
   16115788 |     348539 |    2913754
   16114813 |     342353 |    3052371
   16114789 |     292051 |    3038539
   16116069 |     351585 |    3025941
   16115861 |     350487 |    2933633
   16114814 |     331329 |    2946332
   16115106 |     350047 |    2902075
   16116260 |     290583 |    2955483
   16116066 |     351434 |    3010909
   16114811 |     298605 |    2893809
   16114811 |     298605 |    2987038
   16114811 |     298605 |    3038877
   16114811 |     298605 |    3010694
   16114811 |     298605 |    2893188
   16114811 |     298605 |    2893391
   16114811 |     298605 |    2983360
   16114811 |     298605 |    3038221
   16114811 |     298605 |    3026078
   16114811 |     298605 |    2998966
   16116249 |     296708 |    3038888
   16114811 |     298605 |    2998909
   16116241 |     295971 |    3038921
(26 rows)
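[A note on reading the plan above: EXPLAIN ANALYZE divides a parallel node's actual row count by the number of loops, so the total tuple count a node produced across all processes is rows × loops. A quick sanity check of the figures, as a rough sketch with the numbers taken from the plan:]

```python
# EXPLAIN ANALYZE averages per-node counters over "loops", so the total
# number of tuples a node emitted across all participating processes is
# rows * loops.  Checking the inner Nested Loop from the broken plan:
rows_per_loop = 13   # "rows=13 loops=2" (leader plus one worker)
loops = 2
total = rows_per_loop * loops
print(total)  # 26 tuples reach the Gather node, instead of the expected 31
```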

Here's an example with only 21 rows:

 table_a_id | table_c_id | table_b_id
------------+------------+------------
   16116185 |     328860 |    2936924
   16115788 |     348539 |    3039173
   16115788 |     348539 |    2913874
   16115788 |     348539 |    2913754
   16116256 |     293541 |    2901938
   16116256 |     293541 |    2901933
   16116256 |     293541 |    2997160
   16116069 |     351585 |    3025941
   16114789 |     292051 |    3038539
   16114813 |     342353 |    3052371
   16114814 |     331329 |    2946332
   16115861 |     350487 |    2933633
   16116066 |     351434 |    3010909
   16114811 |     298605 |    2893809
   16115106 |     350047 |    2902075
   16116241 |     295971 |    3038921
   16116260 |     290583 |    2955483
   16116249 |     296708 |    3038888
   16116260 |     290583 |    2921135
   16116260 |     290583 |    2947914
   16116260 |     290583 |    2901669
(21 rows)

On Tue, Jan 8, 2019 at 12:39 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Tue, Jan 8, 2019 at 12:04 PM Bartosz Polnik <bartoszpolnik@gmail.com> wrote:
> On Mon, Jan 7, 2019 at 11:23 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>> As a sanity check, can you please see if the run-only-in-leader case
>> (max_parallel_workers = 0, so that we see "Workers Launched: 0")
>> produces the *same* 31 rows as the run-only-in-worker case
>> (force_parallel_mode = on, so that we see "Single Copy: true")?  That
>> is, the actual values of those 31 rows, in particular the columns
>> coming from table_b.
>
> Rows returned by two queries are the same.

Can we please also see the actual output in the broken case, where it
runs in two processes and produces fewer than 31 rows?

--
Thomas Munro
http://www.enterprisedb.com

Re: BUG #15577: Query returns different results when executed multiple times

От
David Rowley
Дата:
On Tue, 8 Jan 2019 at 13:09, Bartosz Polnik <bartoszpolnik@gmail.com> wrote:
> (26 rows)
>
> Here's an example with only 21 rows:

Interestingly between those two results, all your 31 distinct rows
exist. I threw them into a table and did:

# select * from t1 full join t2 on t1.a=t2.a and t1.b=t2.b and t1.c=t2.c;
    a     |   b    |    c    |    a     |   b    |    c
----------+--------+---------+----------+--------+---------
 16114789 | 292051 | 3038539 | 16114789 | 292051 | 3038539
 16114811 | 298605 | 2893188 |          |        |
 16114811 | 298605 | 2893391 |          |        |
 16114811 | 298605 | 2893809 | 16114811 | 298605 | 2893809
 16114811 | 298605 | 2983360 |          |        |
 16114811 | 298605 | 2987038 |          |        |
 16114811 | 298605 | 2998909 |          |        |
 16114811 | 298605 | 2998966 |          |        |
 16114811 | 298605 | 3010694 |          |        |
 16114811 | 298605 | 3026078 |          |        |
 16114811 | 298605 | 3038221 |          |        |
 16114811 | 298605 | 3038877 |          |        |
 16114813 | 342353 | 3052371 | 16114813 | 342353 | 3052371
 16114814 | 331329 | 2946332 | 16114814 | 331329 | 2946332
 16115106 | 350047 | 2902075 | 16115106 | 350047 | 2902075
 16115788 | 348539 | 2913754 | 16115788 | 348539 | 2913754
 16115788 | 348539 | 2913874 | 16115788 | 348539 | 2913874
 16115788 | 348539 | 3039173 | 16115788 | 348539 | 3039173
 16115861 | 350487 | 2933633 | 16115861 | 350487 | 2933633
 16116066 | 351434 | 3010909 | 16116066 | 351434 | 3010909
 16116069 | 351585 | 3025941 | 16116069 | 351585 | 3025941
 16116185 | 328860 | 2936924 | 16116185 | 328860 | 2936924
 16116241 | 295971 | 3038921 | 16116241 | 295971 | 3038921
 16116249 | 296708 | 3038888 | 16116249 | 296708 | 3038888
          |        |         | 16116256 | 293541 | 2901933
 16116256 | 293541 | 2901938 | 16116256 | 293541 | 2901938
          |        |         | 16116256 | 293541 | 2997160
          |        |         | 16116260 | 290583 | 2901669
          |        |         | 16116260 | 290583 | 2921135
          |        |         | 16116260 | 290583 | 2947914
 16116260 | 290583 | 2955483 | 16116260 | 290583 | 2955483
(31 rows)

IOW, it does not seem like there's a particular pattern to which rows are missing.
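[The same diff can also be done quickly outside the database with Python sets; a minimal sketch using a few of the actual rows from the two runs:]

```python
# Rows as (table_a_id, table_c_id, table_b_id) tuples; only a handful of
# the 26-row and 21-row results are shown here for brevity.
run_a = {
    (16114789, 292051, 3038539),
    (16114811, 298605, 2893188),
    (16114811, 298605, 2893809),
}
run_b = {
    (16114789, 292051, 3038539),
    (16114811, 298605, 2893809),
    (16116260, 290583, 2921135),
}

# Symmetric difference: rows present in one run but not the other --
# the same information the FULL JOIN exposes via its NULL columns.
print(sorted(run_a ^ run_b))
```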

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: BUG #15577: Query returns different results when executed multiple times

От
Thomas Munro
Дата:
On Tue, Jan 8, 2019 at 1:31 PM David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On Tue, 8 Jan 2019 at 13:09, Bartosz Polnik <bartoszpolnik@gmail.com> wrote:
> > (26 rows)
> >
> > Here's an example with only 21 rows:
>
> Interestingly between those two results, all your 31 distinct rows
> exist. I threw them into a table and did:
>
> # select * from t1 full join t2 on t1.a=t2.a and t1.b=t2.b and t1.c=t2.c;
>     a     |   b    |    c    |    a     |   b    |    c
> ----------+--------+---------+----------+--------+---------
>  16114789 | 292051 | 3038539 | 16114789 | 292051 | 3038539
>  16114811 | 298605 | 2893188 |          |        |
>  16114811 | 298605 | 2893391 |          |        |
>  16114811 | 298605 | 2893809 | 16114811 | 298605 | 2893809
>  16114811 | 298605 | 2983360 |          |        |
>  16114811 | 298605 | 2987038 |          |        |
>  16114811 | 298605 | 2998909 |          |        |
>  16114811 | 298605 | 2998966 |          |        |
>  16114811 | 298605 | 3010694 |          |        |
>  16114811 | 298605 | 3026078 |          |        |
>  16114811 | 298605 | 3038221 |          |        |
>  16114811 | 298605 | 3038877 |          |        |
>  16114813 | 342353 | 3052371 | 16114813 | 342353 | 3052371
>  16114814 | 331329 | 2946332 | 16114814 | 331329 | 2946332
>  16115106 | 350047 | 2902075 | 16115106 | 350047 | 2902075
>  16115788 | 348539 | 2913754 | 16115788 | 348539 | 2913754
>  16115788 | 348539 | 2913874 | 16115788 | 348539 | 2913874
>  16115788 | 348539 | 3039173 | 16115788 | 348539 | 3039173
>  16115861 | 350487 | 2933633 | 16115861 | 350487 | 2933633
>  16116066 | 351434 | 3010909 | 16116066 | 351434 | 3010909
>  16116069 | 351585 | 3025941 | 16116069 | 351585 | 3025941
>  16116185 | 328860 | 2936924 | 16116185 | 328860 | 2936924
>  16116241 | 295971 | 3038921 | 16116241 | 295971 | 3038921
>  16116249 | 296708 | 3038888 | 16116249 | 296708 | 3038888
>           |        |         | 16116256 | 293541 | 2901933
>  16116256 | 293541 | 2901938 | 16116256 | 293541 | 2901938
>           |        |         | 16116256 | 293541 | 2997160
>           |        |         | 16116260 | 290583 | 2901669
>           |        |         | 16116260 | 290583 | 2921135
>           |        |         | 16116260 | 290583 | 2947914
>  16116260 | 290583 | 2955483 | 16116260 | 290583 | 2955483
> (31 rows)
>
> IOW, it does not seem like there's a particular pattern to which rows are missing.

Is the pattern something like this?  When the index probe of b should
find multiple matches, it's returning only one and then we move onto
the next outer row?  For example we see this for table_c_id = 298605,
of which we see 11 examples in the 31-row output (with different
table_b_id values), but only one in the 21-row output, with the first
table_b_id value.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15577: Query returns different results when executed multiple times

От
Thomas Munro
Дата:
On Tue, Jan 8, 2019 at 1:40 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Tue, Jan 8, 2019 at 1:31 PM David Rowley
> <david.rowley@2ndquadrant.com> wrote:
> > On Tue, 8 Jan 2019 at 13:09, Bartosz Polnik <bartoszpolnik@gmail.com> wrote:
> > > (26 rows)
> > >
> > > Here's an example with only 21 rows:
> >
> > Interestingly between those two results, all your 31 distinct rows
> > exist. I threw them into a table and did:
> >
> > # select * from t1 full join t2 on t1.a=t2.a and t1.b=t2.b and t1.c=t2.c;
> >     a     |   b    |    c    |    a     |   b    |    c
> > ----------+--------+---------+----------+--------+---------
> >  16114789 | 292051 | 3038539 | 16114789 | 292051 | 3038539
> >  16114811 | 298605 | 2893188 |          |        |
> >  16114811 | 298605 | 2893391 |          |        |
> >  16114811 | 298605 | 2893809 | 16114811 | 298605 | 2893809
> >  16114811 | 298605 | 2983360 |          |        |
> >  16114811 | 298605 | 2987038 |          |        |
> >  16114811 | 298605 | 2998909 |          |        |
> >  16114811 | 298605 | 2998966 |          |        |
> >  16114811 | 298605 | 3010694 |          |        |
> >  16114811 | 298605 | 3026078 |          |        |
> >  16114811 | 298605 | 3038221 |          |        |
> >  16114811 | 298605 | 3038877 |          |        |
> >  16114813 | 342353 | 3052371 | 16114813 | 342353 | 3052371
> >  16114814 | 331329 | 2946332 | 16114814 | 331329 | 2946332
> >  16115106 | 350047 | 2902075 | 16115106 | 350047 | 2902075
> >  16115788 | 348539 | 2913754 | 16115788 | 348539 | 2913754
> >  16115788 | 348539 | 2913874 | 16115788 | 348539 | 2913874
> >  16115788 | 348539 | 3039173 | 16115788 | 348539 | 3039173
> >  16115861 | 350487 | 2933633 | 16115861 | 350487 | 2933633
> >  16116066 | 351434 | 3010909 | 16116066 | 351434 | 3010909
> >  16116069 | 351585 | 3025941 | 16116069 | 351585 | 3025941
> >  16116185 | 328860 | 2936924 | 16116185 | 328860 | 2936924
> >  16116241 | 295971 | 3038921 | 16116241 | 295971 | 3038921
> >  16116249 | 296708 | 3038888 | 16116249 | 296708 | 3038888
> >           |        |         | 16116256 | 293541 | 2901933
> >  16116256 | 293541 | 2901938 | 16116256 | 293541 | 2901938
> >           |        |         | 16116256 | 293541 | 2997160
> >           |        |         | 16116260 | 290583 | 2901669
> >           |        |         | 16116260 | 290583 | 2921135
> >           |        |         | 16116260 | 290583 | 2947914
> >  16116260 | 290583 | 2955483 | 16116260 | 290583 | 2955483
> > (31 rows)
> >
> > IOW, it does not seem like there's a particular pattern to which rows are missing.
>
> Is the pattern something like this?  When the index probe of b should
> find multiple matches, it's returning only one and then we move onto
> the next outer row?  For example we see this for table_c_id = 298605,
> of which we see 11 examples in the 31-row output (with different
> table_b_id values), but only one in the 21-row output, with the first
> table_b_id value.

It's as if it thought it was executing a semi-join.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15577: Query returns different results when executed multiple times

От
David Rowley
Дата:
On Tue, 8 Jan 2019 at 13:43, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> It's as if it thought it was executing a semi-join.

EXPLAIN VERBOSE would show the inner_unique flag. If that's set it
would act as a semi-join.
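
[To illustrate, a toy Python sketch of what the executor's single_match shortcut does to a nested loop -- made-up data, and only the shape of the behaviour, not PostgreSQL's actual code:]

```python
def nested_loop(outer, inner, single_match=False):
    """Toy nested-loop join on equality.  When single_match is set
    (which happens when the planner marks the join inner_unique), the
    inner scan stops after the first match for each outer row, so any
    duplicate inner matches are silently dropped."""
    result = []
    for o in outer:
        for key, payload in inner:
            if o == key:
                result.append((o, payload))
                if single_match:
                    break  # behaves like a semi-join
    return result

outer = [298605]
inner = [(298605, 2893809), (298605, 2987038), (298605, 3038877)]

print(len(nested_loop(outer, inner)))                     # 3 rows
print(len(nested_loop(outer, inner, single_match=True)))  # 1 row
```

[In the broken output above, table_c_id = 298605 keeps only its first table_b match, which is exactly this shape.]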


-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: BUG #15577: Query returns different results when executed multiple times

От
Thomas Munro
Дата:
On Tue, Jan 8, 2019 at 1:48 PM David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On Tue, 8 Jan 2019 at 13:43, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> > It's as if it thought it was executing a semi-join.
>
> EXPLAIN VERBOSE would show the inner_unique flag. If that's set it
> would act as a semi-join.

Hmm, yes, node->js.single_match == true in one (but not both!) process
would fit these results.  But I don't see a mechanism for that.  I
*guess* it's the worker that is skipping duplicates, because the
leader usually has time to emit a few tuples while the worker's
warming up and we see some duplicates (348539) in first few results
(at least it usually does on my machines for eager plans, though maybe
on Windows it's different?)

Bartosz, can we please try with force_parallel_mode = off, but also
parallel_leader_participation = off?  (I meant to write that in an
earlier email but accidentally wrote "on".  Gah.)  Then we should get
the same plan (the forced parallel plan is a bit different as you
noted; maybe whatever is broken isn't triggered that way).  You might
need to set parallel_workers to 2 on the table for it to pick a
parallel plan without leader participation.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15577: Query returns different results when executed multiple times

От
Bartosz Polnik
Дата:
Sure. 

SET force_parallel_mode = off;
SET parallel_leader_participation = off;

alter table test.table_c set (parallel_workers=2);
explain (costs false, analyze true)
  SELECT ta.id AS table_a_id,
         tc.id as table_c_id,
         tba.id AS table_b_id
  FROM test.table_b_active tba
         INNER JOIN test.table_c tc ON tba.target_id = tc.id
         INNER JOIN test.table_d td ON tc.table_d_id = td.id
         LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date = '2018-08-31' :: DATE
  WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
    AND td.group = 'A'
    AND tc.table_e_id = 4
    AND (
      (tba.target_id = tc.id AND tba.group_type = 'A')
      OR tba.source_id = tc.id
    );

Output:
                                                                                                              QUERY PLAN                                                                    
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=203.910..346.758 rows=31 loops=1)
   ->  Gather (actual time=203.802..373.228 rows=31 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Nested Loop (actual time=157.358..253.888 rows=16 loops=2)
               ->  Hash Join (actual time=0.365..15.749 rows=945 loops=2)
                     Hash Cond: (tc.table_d_id = td.id)
                     ->  Parallel Seq Scan on table_c tc (actual time=0.113..15.043 rows=1751 loops=2)
                           Filter: (table_e_id = 4)
                           Rows Removed by Filter: 49117
                     ->  Hash (actual time=0.078..0.078 rows=8 loops=2)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on table_d td (actual time=0.058..0.065 rows=8 loops=2)
                                 Filter: (group = 'A'::test.group)
                                 Rows Removed by Filter: 55
               ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.248..0.252 rows=0 loops=1890)
                     Index Cond: (target_id = tc.id)
                     Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
                     Rows Removed by Filter: 26
   ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.030..0.030 rows=1 loops=31)
         Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
 Planning Time: 4.923 ms
 Execution Time: 374.408 ms

31 rows - correct result.


On Tue, Jan 8, 2019 at 2:30 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Tue, Jan 8, 2019 at 1:48 PM David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On Tue, 8 Jan 2019 at 13:43, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> > It's as if it thought it was executing a semi-join.
>
> EXPLAIN VERBOSE would show the inner_unique flag. If that's set it
> would act as a semi-join.

Hmm, yes, node->js.single_match == true in one (but not both!) process
would fit these results.  But I don't see a mechanism for that.  I
*guess* it's the worker that is skipping duplicates, because the
leader usually has time to emit a few tuples while the worker's
warming up, and we see some duplicates (348539) in the first few results
(at least it usually does on my machines for eager plans, though maybe
on Windows it's different?)

Bartosz, can we please try with force_parallel_mode = off, but also
parallel_leader_participation = off?  (I meant to write that in an
earlier email but accidentally wrote "on".  Gah.)  Then we should get
the same plan (the forced parallel plan is a bit different as you
noted; maybe whatever is broken isn't triggered that way).  You might
need to set parallel_workers to 2 on the table for it to pick a
parallel plan without leader participation.

--
Thomas Munro
http://www.enterprisedb.com

Re: BUG #15577: Query returns different results when executed multiple times

From
Thomas Munro
Date:
On Tue, Jan 8, 2019 at 10:45 PM Bartosz Polnik <bartoszpolnik@gmail.com> wrote:
> SET force_parallel_mode = off;
> SET parallel_leader_participation = off;
>
> alter table table_c set (parallel_workers=2);
> ...
>
> 31 rows - correct result.

Huh.  So now we have:

1.  Leader only (max_parallel_workers = 0): OK
2.  Worker only (parallel_leader_participation = off, parallel_workers = 2): OK
3.  Leader + worker: broken, short results
4.  Worker only with slightly different plan (force_parallel_mode = on): OK

Ahh, I just reproduced this.  Take mockup_bug15577.sql (Bartosz's
version) and then do this:

insert into table_c select * from table_c;

insert into table_b
select generate_series(1000000, 1000010), 100112, 'A', 99888,
'2018-09-01', true,
'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';

Now the number of results from the query in the script is not stable,
varying between 235 and 246 results for me.  Not sure what's going on
yet.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15577: Query returns different results when executed multiple times

From
Thomas Munro
Date:
On Wed, Jan 9, 2019 at 2:52 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Now the number of results from the query in the script is not stable,
> varying between 235 and 246 results for me.  Not sure what's going on
> yet.

When there are multiple matches found by the Index Scan, sometimes
only the first one makes it through to the Nested Loop, in the leader.
It's not because of single_match being inappropriately set as I
speculated earlier (though that'd behave similarly I guess) but
because the quals are somehow broken in between invocations so that
ExecScan() filters the rest of the matches out, and then fixed again
on the next rescan.  This happens whenever the leader takes a break to
read a tuple from parallel workers in between executing the Nested Loop
node, which it often does after reading the first of several tuples
resulting from a single rescan.

What exactly is being corrupted and how, I don't yet know, and I need
to leave this here for today, but that's what I've got so far.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15577: Query returns different results when executed multiple times

From
Bartosz Polnik
Date:
Thomas, I tried to reproduce this case with the changes you listed, but failed to do that - the number of results was the same on each run. Could you post an updated script in an attachment?

On Wed, Jan 9, 2019 at 5:01 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Wed, Jan 9, 2019 at 2:52 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Now the number of results from the query in the script is not stable,
> varying between 235 and 246 results for me.  Not sure what's going on
> yet.

When there are multiple matches found by the Index Scan, sometimes
only the first one makes it through to the Nested Loop, in the leader.
It's not because of single_match being inappropriately set as I
speculated earlier (though that'd behave similarly I guess) but
because the quals are somehow broken in between invocations so that
ExecScan() filters the rest of the matches out, and then fixed again
on the next rescan.  This happens whenever the leader takes a break to
read a tuple from parallel workers in between executing the Nested Loop
node, which it often does after reading the first of several tuples
resulting from a single rescan.

What exactly is being corrupted and how, I don't yet know, and I need
to leave this here for today, but that's what I've got so far.

--
Thomas Munro
http://www.enterprisedb.com

Re: BUG #15577: Query returns different results when executed multiple times

From
David Rowley
Date:
On Wed, 9 Jan 2019 at 17:01, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>
> On Wed, Jan 9, 2019 at 2:52 PM Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
> > Now the number of results from the query in the script is not stable,
> > varying between 235 and 246 results for me.  Not sure what's going on
> > yet.
>
> When there are multiple matches found by the Index Scan, sometimes
> only the first one makes it through to the Nested Loop, in the leader.
> It's not because of single_match being inappropriately set as I
> speculated earlier (though that'd behave similarly I guess) but
> because the quals are somehow broken in between invocations so that
> ExecScan() filters the rest of the matches out, and then fixed again
> on the next rescan.  This happens whenever the leader takes a break to
> read a tuple from parallel workers in between executing the Nested Loop
> node, which it often does after reading the first of several tuples
> resulting from a single rescan.
>
> What exactly is being corrupted and how, I don't yet know, and I need
> to leave this here for today, but that's what I've got so far.

From what I can see it seems to be the parameter for the index scan
that gets the wrong value.

I changed int4eq to become:

Datum
int4eq(PG_FUNCTION_ARGS)
{
    int32 arg1 = PG_GETARG_INT32(0);
    int32 arg2 = PG_GETARG_INT32(1);

    if (arg1 == 100112 || arg2 == 100112)
        elog(NOTICE, "%d %d", arg1, arg2);
    PG_RETURN_BOOL(arg1 == arg2);
}

When the query returns the correct number of rows I only see NOTICES with:

NOTICE:  100112 100112

but when the wrong results are returned I see a few such as:

NOTICE:  100112 57772

If I put a conditional breakpoint on the elog with the condition "arg2
!= 100112" I see that the arg2 value is coming from the nested loop's
parameter.

If I modify the query to remove the redundant tba.target_id = tc.id
from the WHERE clause, then I can't trigger the issue.  i.e.

    AND (
      (tba.target_id = tc.id AND tba.group_type = 'A')
      OR tba.source_id = tc.id
    );

becomes:

    AND (
      tba.group_type = 'A'
      OR tba.source_id = tc.id
    );

So the parameter does not need to have its value checked again during
the qual check. (If Bartosz is looking for a quick fix then he should
be able to alter the query to that)

The index scan should have already ensured that tba.target_id = tc.id.
So the nested loop parameter must have had the correct value at the
start of the index scan, but somehow it must have changed sometime
after the first tuple was returned from the index scan and before we
asked the index for the next tuple (hence why it looked like a
semi-join's result).  I've just not managed to find out how the
parameter value gets changed.  The value does not seem to have been
stomped on as each time I see the incorrect parameter value, it's set
to something else that occurs in the table.

At the moment I'm a bit suspicious of the following code in ParallelQueryMain().

paramexec_space = dsa_get_address(area, fpes->param_exec);
RestoreParamExecParams(paramexec_space, queryDesc->estate);

On first look, it seems that the exec params are in shared memory? ...
but I need more time to figure out how this all fits together.

I've attached the updated script per Thomas' alterations to make it break.

The breakage is very old.  I found it already existed when parallel joins
were added, though I'm not necessarily blaming that commit.

45be99f8cd Support parallel joins, and make related improvements. -- BAD
a7de3dc5c3 Support multi-stage aggregation. -- GOOD

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: BUG #15577: Query returns different results when executed multiple times

From
Thomas Munro
Date:
On Thu, Jan 10, 2019 at 5:35 AM David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On Wed, 9 Jan 2019 at 17:01, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> > What exactly is being corrupted and how, I don't yet know, and I need
> > to leave this here for today, but that's what I've got so far.
>
> From what I can see it seems to be the parameter for the index scan
> that gets the wrong value.
>
> [explanation about redundant qual tba.source_id = tc.id and corrupted parameter]

Oh, yeah, that'd do it.  Here are some excerpts from my printf
debugging, all from the leader process:

[51461] nestloop plan id 2 setting paramno 0 to value 100112
[51461] IndexScan returning a tuple (att1 = 1000010, att2 = 100112)
[51461] --- NestLoop got inner att2 = 100112 <--- FIRST TUPLE MADE IT OUT
[51461] nestloop plan id 0 setting paramno 0 to value 100112
[51461] gather will read tuple from queue
[51461] nestloop plan id 0 setting paramno 0 to value 98662 <--- OOPS
[51461] IndexScan returning a tuple (att1 = 1000009, att2 = 100112)
[51461] dropping a tuple (att1 = 1000009, att2 = 100112) <--- NEXT ONE DIDN'T
[51461] IndexScan returning a tuple (att1 = 1000008, att2 = 100112)
[51461] dropping a tuple (att1 = 1000008, att2 = 100112) <--- DITTO
...

So it's not the reading of the tuple queue per se, but the resulting
visit to a higher part of the plan when the Gather node emits a tuple.
It can't be right that plan id 2 and plan id 0 are both using paramno
0, can it?  I'm not too familiar with the planner code that allocates
those but will go and poke at it.



--
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15577: Query returns different results when executed multiple times

From
Thomas Munro
Date:
On Wed, Jan 9, 2019 at 8:48 PM Bartosz Polnik <bartoszpolnik@gmail.com> wrote:
> Thomas, I tried to reproduce this case with the changes you listed, but failed to do that - the number of results was the
> same on each run. Could you post an updated script in an attachment?
 

Probably due to timing of parallel worker startup.  Different
computers, operating systems, alignment of the stars etc.  Debugging
parallel query problems is fun like that.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15577: Query returns different results when executed multiple times

From
Andrew Gierth
Date:
>>>>> "Thomas" == Thomas Munro <thomas.munro@enterprisedb.com> writes:

 Thomas> So it's not the reading of the tuple queue per se, but the
 Thomas> resulting visit to a higher part of the plan when the Gather
 Thomas> node emits a tuple. It can't be right that plan id 2 and plan
 Thomas> id 0 are both using paramno 0, can it? I'm not too familiar
 Thomas> with the planner code that allocates those but will go and poke
 Thomas> at it.

Looks like this isn't a new bug, either; I can reproduce it on 10.x.

So I think you're right to finger the planner's allocation of params as
the culprit. Looking at replace_nestloop_params_mutator, it's keeping a
list of nestloop params in root->curOuterParams (thus, globally to the
whole subquery being planned), and it assumes that if equal() is true
between the value of an existing nestloop param (which must be a Var)
and the Var for the one it's trying to create, then it's ok to use the
existing one.

But clearly this can't work if one param is referenced both inside and
outside a Gather, because while they will compare equal for Vars, they
won't actually have the same value thanks to rows coming in from
workers.

-- 
Andrew (irc:RhodiumToad)


Re: BUG #15577: Query returns different results when executed multiple times

From
Thomas Munro
Date:
On Thu, Jan 10, 2019 at 10:04 AM Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
> So I think you're right to finger the planner's allocation of params as
> the culprit. Looking at replace_nestloop_params_mutator, it's keeping a
> list of nestloop params in root->curOuterParams (thus, globally to the
> whole subquery being planned), and it assumes that if equal() is true
> between the value of an existing nestloop param (which must be a Var)
> and the Var for the one it's trying to create, then it's ok to use the
> existing one.
>
> But clearly this can't work if one param is referenced both inside and
> outside a Gather, because while they will compare equal for Vars, they
> won't actually have the same value thanks to rows coming in from
> workers.

But if they used different params, there could be different problems,
no?  It's logically the same var.  This makes me wonder if we need
some kind of scheme for saving and restoring affected params whenever
Gather switches between executing the plan directly and emitting
tuples from workers, or something like that...

-- 
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15577: Query returns different results when executed multiple times

From
Tom Lane
Date:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> So it's not the reading of the tuple queue per se, but the resulting
> visit to a higher part of the plan when the Gather node emits a tuple.
> It can't be right that plan id 2 and plan id 0 are both using paramno
> 0, can it?

So the plan shape, shorn of uninteresting detail, is

 Nested Loop Left Join
   ->  Gather
         ->  Nested Loop
               ->  Hash Join
                     ->  Parallel Seq Scan on table_c tc
                     ->  Hash
                           ->  Seq Scan on table_d td
               ->  Index Scan ... on table_b tba
                     Index Cond: (target_id = tc.id)
   ->  Index Scan ... on table_a ta
         Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))

The lower Nested Loop has a NestLoopParam node telling it to pass the
value of tc.id down to the tba indexscan.  The upper one also has a
NestLoopParam node, telling it to pass the value of tc.id down to the ta
indexscan.  Since these are for the same Var, the planner believes it's
safe to use the same PARAM_EXEC slot for both of them.  And in a normal
plan tree, it'd be right.  I think what is happening is

(1) Leader runs the lower nestloop for a bit and gets back a successful
tc/tba join row.  At this point its PARAM_EXEC slot is filled with the
tc.id value from that tc row.

(2) It hands this up to the upper nestloop, which assigns that same
tc.id to the PARAM_EXEC slot ... no trouble so far ... and runs the
ta indexscan with that.

(3) Now we go back to the Gather, which I think must go off and look for
rows coming from child workers before it resumes the leader's nestloop.
It gets some row, with a different tc.id value, and hands that back up.

(4) Upper nestloop assigns the tc.id value from the child's row to the
PARAM_EXEC slot, and runs the ta indexscan with that.

(5) Return to the Gather, which now allows the lower nestloop to proceed
in the leader.

(6) Lower nestloop thinks it's not done with the current outer (tc) row,
so it invokes the tba indexscan to see if that can return any more
rows.  But since it hasn't advanced to a new outer row, it doesn't
touch the NestLoopParam or PARAM_EXEC slot.

(7) At this point we're resuming the tba indexscan with a different
value for tc.id than it was started with.  The results of that are
undefined, IMO, but what's apparently happening is that it fails to
match some rows that it otherwise would have matched.  So the end
result is some rows are missing from the output.
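
The interleaving in steps (1)-(7) can be sketched as a toy simulation (plain Python, all names and values hypothetical; it models only the shared-slot behaviour, not the executor):

```python
# Toy model of two NestLoop nodes sharing one PARAM_EXEC slot.  The inner
# index scan's extra filter qual re-reads the shared slot on every fetch,
# so when the upper join clobbers the slot mid-scan, later matches are
# wrongly filtered out.
param_slot = {"tc_id": None}              # the shared PARAM_EXEC slot

def index_scan(matches):
    # The index already matched these rows; the redundant qual
    # (target_id = $0) re-checks the shared slot per tuple.
    for row in matches:
        if row == param_slot["tc_id"]:
            yield row

index_matches = [100112, 100112, 100112]  # three rows for target_id = 100112

param_slot["tc_id"] = 100112    # (1) lower NestLoop sets the param
scan = index_scan(index_matches)
got = [next(scan)]              # (2) first join row makes it out
param_slot["tc_id"] = 98662     # (3)-(4) Gather hands up a worker row and
                                #         the upper NestLoop clobbers the slot
got.extend(scan)                # (6)-(7) the resumed scan drops the rest
print(got)                      # [100112] -- two matches went missing
```

Giving each NestLoop its own slot in this sketch would make the clobbering write invisible to the inner scan, which is the intuition behind not sharing PARAM_EXEC slots.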


One way we could deal with this, perhaps, is to decide that NestLoopParams
above and below a Gather can't share PARAM_EXEC slots.  I'm not sure how
we'd mechanize that, other than not allowing NestLoopParams to share
PARAM_EXEC slots *ever*, which might not really be that much of a cost.

But TBH the whole thing frightens me quite a lot as to what other
serial-processing assumptions are getting broken by plopping Gather
into the middle of a plan tree.  I wonder whether we'd not be best
off disallowing this sort of plan shape, and/or forbidding the leader
from also executing the parallelized subplan.

            regards, tom lane


Re: BUG #15577: Query returns different results when executed multiple times

From
Tom Lane
Date:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> On Thu, Jan 10, 2019 at 10:04 AM Andrew Gierth
> <andrew@tao11.riddles.org.uk> wrote:
>> But clearly this can't work if one param is referenced both inside and
>> outside a Gather, because while they will compare equal for Vars, they
>> won't actually have the same value thanks to rows coming in from
>> workers.

> But if they used different params, there could be different problems,
> no?  It's logically the same var.

As far as I can think at the moment, there's no problem with having
multiple nestloop Params referencing the "same" Var.  It could be an
impediment to optimization if it happened (much) earlier in the planner,
but for the situation at hand the only code that's going to be looking
at the tree is the executor and maybe ruleutils, both of which are much
too stupid to be bothered by such aliasing.

> This makes me wonder if we need
> some kind of scheme for saving and restoring affected params whenever
> Gather switches between executing the plan directly and emitting
> tuples from workers, or something like that...

I don't think we need to (or should) go there if this is the only
problem.  What's worrying me is what other assumptions based on serial
plan execution are getting broken by injecting Gather into mid levels
of a plan tree.

            regards, tom lane


Re: BUG #15577: Query returns different results when executed multiple times

From
Thomas Munro
Date:
On Thu, Jan 10, 2019 at 12:09 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
> > On Thu, Jan 10, 2019 at 10:04 AM Andrew Gierth
> > <andrew@tao11.riddles.org.uk> wrote:
> >> But clearly this can't work if one param is referenced both inside and
> >> outside a Gather, because while they will compare equal for Vars, they
> >> won't actually have the same value thanks to rows coming in from
> >> workers.
>
> > But if they used different params, there could be different problems,
> > no?  It's logically the same var.
>
> As far as I can think at the moment, there's no problem with having
> multiple nestloop Params referencing the "same" Var.  It could be an
> impediment to optimization if it happened (much) earlier in the planner,
> but for the situation at hand the only code that's going to be looking
> at the tree is the executor and maybe ruleutils, both of which are much
> too stupid to be bothered by such aliasing.

The index scan does actually emit all the tuples it should in your
paragraph "(7)" (maybe why nobody ever noticed this problem before)
but in this case there is also an extra (redundant?) qual referencing
the param, so ExecScan()'s call to ExecQual() returns false after the
other Nest Loop tramples on it, and the tuples are filtered out (I
showed that as "dropped" in my printf-debugging excerpt up-thread).
We'd have to make sure that the qual references the param that is set
by this join and not its evil twin.  I'm confused about how that and
any other references to the Var would work, but as you can probably
tell I don't have a great grip on the Var/param system and the
relevant optimisation phases yet.

Hmm.  Why are those ExecQual() -> false cases not showing up as
variation in the "Rows Removed by Filter" counter visible in EXPLAIN
ANALYZE?  Then we might have arrived here a lot faster.
InstrCountFiltered1(node, 1) is executed, but somehow the count
doesn't make it into the total shown by EXPLAIN.

> > This makes me wonder if we need
> > some kind of scheme for saving and restoring affected params whenever
> > Gather switches between executing the plan directly and emitting
> > tuples from workers, or something like that...
>
> I don't think we need to (or should) go there if this is the only
> problem.  What's worrying me is what other assumptions based on serial
> plan execution are getting broken by injecting Gather into mid levels
> of a plan tree.

parallel_leader_participation = on is a many-headed serpentine beast.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15577: Query returns different results when executed multiple times

From
David Rowley
Date:
On Thu, 10 Jan 2019 at 10:04, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
> Looks like this isn't a new bug, either; I can reproduce it on 10.x.

Yeah, introduced in 9.6:

I wrote:
> 45be99f8cd Support parallel joins, and make related improvements. -- BAD
> a7de3dc5c3 Support multi-stage aggregation. -- GOOD

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: BUG #15577: Query returns different results when executed multiple times

From
David Rowley
Date:
On Thu, 10 Jan 2019 at 12:28, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Hmm.  Why are those ExecQual() -> false cases not showing up as
> variation in the "Rows Removed by Filter" counter visible in EXPLAIN
> ANALYZE?  Then we might have arrived here a lot faster.
> InstrCountFiltered1(node, 1) is executed, but somehow the count
> doesn't make it into the total shown by EXPLAIN.

Good question.  They are tracked by the instrumentation, it's just
that show_instrumentation_count() divides by nloops so the additional
filtered rows are lost to the rounding.  Removing the / nloops shows
you what you want to see.
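
A back-of-the-envelope sketch of that rounding effect (hypothetical numbers chosen to match the plan up-thread; the real display logic lives in show_instrumentation_count()):

```python
# EXPLAIN ANALYZE shows per-node row counts divided by nloops, so a small
# absolute number of wrongly filtered tuples vanishes into the average.
nloops = 1890            # loops of the inner index scan (from the plan)
removed_per_loop = 26    # steady "Rows Removed by Filter" per loop
extra_dropped = 10       # hypothetical tuples lost to the clobbered param
total_removed = removed_per_loop * nloops + extra_dropped
shown = round(total_removed / nloops)
print(shown)             # 26 -- the anomaly is invisible after rounding
```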

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: BUG #15577: Query returns different results when executed multiple times

From
Robert Haas
Date:
On Wed, Jan 9, 2019 at 5:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> (7) At this point we're resuming the tba indexscan with a different
> value for tc.id than it was started with.  The results of that are
> undefined, IMO, but what's apparently happening is that it fails to
> match some rows that it otherwise would have matched.  So the end
> result is some rows are missing from the output.

Oops.

> One way we could deal with this, perhaps, is to decide that NestLoopParams
> above and below a Gather can't share PARAM_EXEC slots.  I'm not sure how
> we'd mechanize that, other than not allowing NestLoopParams to share
> PARAM_EXEC slots *ever*, which might not really be that much of a cost.
>
> But TBH the whole thing frightens me quite a lot as to what other
> serial-processing assumptions are getting broken by plopping Gather
> into the middle of a plan tree.  I wonder whether we'd not be best
> off disallowing this sort of plan shape, and/or forbidding the leader
> from also executing the parallelized subplan.

FWIW, I would be inclined to blame this on me not really understanding
the Param machinery very well rather than any more general brand of
insanity.  I wouldn't be a bit shocked if this is not the last bit of
Param-related dumbness that parallel query has.  I did *try* to get it
right, but I find that stuff to be pretty arcane and not entirely
well-documented.

As to disallowing "this plan shape," both leader participation and the
ability to generate this kind of plan are there because they improve
performance substantially on some queries.  I put a heck of a lot of
energy into being able to parallelize only part of a query, and into
allowing for leader participation, and I believe those things make a
real difference.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: BUG #15577: Query returns different results when executed multiple times

From
David Rowley
Date:
On Thu, 10 Jan 2019 at 15:06, Robert Haas <robertmhaas@gmail.com> wrote:
> As to disallow "this plan shape," both leader participation and the
> ability to generate this kind of plan are there because they improve
> performance substantially on some queries.  I put a heck of a lot of
> energy into being able to parallelize only part of a query, and into
> allowing for leader participation, and I believe those things make a
> real difference.

I 100% agree that this shape of plan is useful.  I guess, though, that
whether we keep allowing it depends on whether a suitable fix appears
for the issue.

One idea that I've been thinking about is some sort of param context
number that we bump during createplan when we encounter a
Gather/GatherMerge path and have assign_param_for_var() pay attention
to that number when deciding if sharing a parameter is okay. How
exactly that context number is calculated and how we pass the value
around I've currently no idea, so please consider this very handwavy.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: BUG #15577: Query returns different results when executed multiple times

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Jan 9, 2019 at 5:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> One way we could deal with this, perhaps, is to decide that NestLoopParams
>> above and below a Gather can't share PARAM_EXEC slots.  I'm not sure how
>> we'd mechanize that, other than not allowing NestLoopParams to share
>> PARAM_EXEC slots *ever*, which might not really be that much of a cost.
>> 
>> But TBH the whole thing frightens me quite a lot as to what other
>> serial-processing assumptions are getting broken by plopping Gather
>> into the middle of a plan tree.  I wonder whether we'd not be best
>> off disallowing this sort of plan shape, and/or forbidding the leader
>> from also executing the parallelized subplan.

> FWIW, I would be inclined to blame this on me not really understanding
> the Param machinery very well rather than any more general brand of
> insanity.  I wouldn't be a bit shocked if this is not the last bit
> Param-related dumbness than parallel query has.  I did *try* to get it
> right, but I find that stuff to be pretty arcane and not entirely
> well-documented.

Well, if we posit that the problems are limited to false Param-sharing,
my inclination is to fix it by simplifying rather than adding complexity.
To wit, let's just assign a new PARAM_EXEC slot for every NestLoopParam
and never try to share them.  The argument that sharing them would work
for NLPs representing the same Var hinged on the assumption that nested
NestLoops would necessarily execute "in sync", which we now see is false
at least when Gathers are in between.

This'll cost some more PARAM_EXEC slots, of course, but AFAICS the
cost of that is trivial, just a few more bytes per slot.  It's arguable
that the cost of trying to de-duplicate them at plan time is more than
we can save at execution :-(

I'm about fried for tonight, but I can take a look at this tomorrow.

            regards, tom lane


Re: BUG #15577: Query returns different results when executed multiple times

From
Tom Lane
Date:
I wrote:
> Well, if we posit that the problems are limited to false Param-sharing,
> my inclination is to fix it by simplifying rather than adding complexity.
> To wit, let's just assign a new PARAM_EXEC slot for every NestLoopParam
> and never try to share them.  The argument that sharing them would work
> for NLPs representing the same Var hinged on the assumption that nested
> NestLoops would necessarily execute "in sync", which we now see is false
> at least when Gathers are in between.

Here's a POC patch for that.  It's seriously ugly, but it passes
check-world and I can no longer reproduce the problem using
mockup_bug15577_v2.sql.

At this point assign_nestloop_param_var and
assign_nestloop_param_placeholdervar are dead code, and there's a bunch
of incorrect comments in subselect.c, and I really need to refactor
the division of labor between createplan.c and subselect.c (for one
thing, this is an abuse of the documented purpose of
SS_make_initplan_output_param).  But functionally I think it does the
right thing.  Please test and verify that you no longer see the race
condition.

            regards, tom lane

diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 066685c..6ce5938 100644
*** a/src/backend/optimizer/plan/createplan.c
--- b/src/backend/optimizer/plan/createplan.c
*************** replace_nestloop_params_mutator(Node *no
*** 4360,4379 ****
          /* If not to be replaced, we can just return the Var unmodified */
          if (!bms_is_member(var->varno, root->curOuterRels))
              return node;
!         /* Create a Param representing the Var */
!         param = assign_nestloop_param_var(root, var);
!         /* Is this param already listed in root->curOuterParams? */
          foreach(lc, root->curOuterParams)
          {
              nlp = (NestLoopParam *) lfirst(lc);
!             if (nlp->paramno == param->paramid)
              {
!                 Assert(equal(var, nlp->paramval));
!                 /* Present, so we can just return the Param */
                  return (Node *) param;
              }
          }
!         /* No, so add it */
          nlp = makeNode(NestLoopParam);
          nlp->paramno = param->paramid;
          nlp->paramval = var;
--- 4360,4389 ----
          /* If not to be replaced, we can just return the Var unmodified */
          if (!bms_is_member(var->varno, root->curOuterRels))
              return node;
!         /* Is this Var already listed in root->curOuterParams? */
          foreach(lc, root->curOuterParams)
          {
              nlp = (NestLoopParam *) lfirst(lc);
!             if (equal(var, nlp->paramval))
              {
!                 /* Yes, so just make a Param referencing this NLP's slot */
!                 param = makeNode(Param);
!                 param->paramkind = PARAM_EXEC;
!                 param->paramid = nlp->paramno;
!                 param->paramtype = var->vartype;
!                 param->paramtypmod = var->vartypmod;
!                 param->paramcollid = var->varcollid;
!                 param->location = var->location;
                  return (Node *) param;
              }
          }
!         /* No, so assign a PARAM_EXEC slot for a new NLP */
!         param = SS_make_initplan_output_param(root,
!                                               var->vartype,
!                                               var->vartypmod,
!                                               var->varcollid);
!         param->location = var->location;
!         /* Add it to the list of NLPs */
          nlp = makeNode(NestLoopParam);
          nlp->paramno = param->paramid;
          nlp->paramval = var;
*************** replace_nestloop_params_mutator(Node *no
*** 4423,4442 ****
                                                  root);
              return (Node *) newphv;
          }
!         /* Create a Param representing the PlaceHolderVar */
!         param = assign_nestloop_param_placeholdervar(root, phv);
!         /* Is this param already listed in root->curOuterParams? */
          foreach(lc, root->curOuterParams)
          {
              nlp = (NestLoopParam *) lfirst(lc);
!             if (nlp->paramno == param->paramid)
              {
!                 Assert(equal(phv, nlp->paramval));
!                 /* Present, so we can just return the Param */
                  return (Node *) param;
              }
          }
!         /* No, so add it */
          nlp = makeNode(NestLoopParam);
          nlp->paramno = param->paramid;
          nlp->paramval = (Var *) phv;
--- 4433,4461 ----
                                                  root);
              return (Node *) newphv;
          }
!         /* Is this PHV already listed in root->curOuterParams? */
          foreach(lc, root->curOuterParams)
          {
              nlp = (NestLoopParam *) lfirst(lc);
!             if (equal(phv, nlp->paramval))
              {
!                 /* Yes, so just make a Param referencing this NLP's slot */
!                 param = makeNode(Param);
!                 param->paramkind = PARAM_EXEC;
!                 param->paramid = nlp->paramno;
!                 param->paramtype = exprType((Node *) phv->phexpr);
!                 param->paramtypmod = exprTypmod((Node *) phv->phexpr);
!                 param->paramcollid = exprCollation((Node *) phv->phexpr);
!                 param->location = -1;
                  return (Node *) param;
              }
          }
!         /* No, so assign a PARAM_EXEC slot for a new NLP */
!         param = SS_make_initplan_output_param(root,
!                                               exprType((Node *) phv->phexpr),
!                                               exprTypmod((Node *) phv->phexpr),
!                                               exprCollation((Node *) phv->phexpr));
!         /* Add it to the list of NLPs */
          nlp = makeNode(NestLoopParam);
          nlp->paramno = param->paramid;
          nlp->paramval = (Var *) phv;

Re: BUG #15577: Query returns different results when executed multiple times

From
Thomas Munro
Date:
On Fri, Jan 11, 2019 at 6:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > Well, if we posit that the problems are limited to false Param-sharing,
> > my inclination is to fix it by simplifying rather than adding complexity.
> > To wit, let's just assign a new PARAM_EXEC slot for every NestLoopParam
> > and never try to share them.  The argument that sharing them would work
> > for NLPs representing the same Var hinged on the assumption that nested
> > NestLoops would necessarily execute "in sync", which we now see is false
> > at least when Gathers are in between.
>
> Here's a POC patch for that.  It's seriously ugly, but it passes
> check-world and I can no longer reproduce the problem using
> mockup_bug15577_v2.sql.
>
> At this point assign_nestloop_param_var and
> assign_nestloop_param_placeholdervar are dead code, and there's a bunch
> of incorrect comments in subselect.c, and I really need to refactor
> the division of labor between createplan.c and subselect.c (for one
> thing, this is an abuse of the documented purpose of
> SS_make_initplan_output_param).  But functionally I think it does the
> right thing.  Please test and verify that you no longer see the race
> condition.

I no longer see it here.

I can also see that the Nested Loop with plan ID 0 is using paramno 2,
and the Nested Loop with plan ID 2 is using paramno 0.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15577: Query returns different results when executed multiple times

From
Tom Lane
Date:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> On Fri, Jan 11, 2019 at 6:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> At this point assign_nestloop_param_var and
>> assign_nestloop_param_placeholdervar are dead code, and there's a bunch
>> of incorrect comments in subselect.c, and I really need to refactor
>> the division of labor between createplan.c and subselect.c (for one
>> thing, this is an abuse of the documented purpose of
>> SS_make_initplan_output_param).  But functionally I think it does the
>> right thing.  Please test and verify that you no longer see the race
>> condition.

> I no longer see it here.

Thanks for checking!

I'm having some difficulty choosing what to do refactoring-wise.
There are a couple of conflicting considerations:

* Currently, subselect.c is in charge of assigning PARAM_EXEC slots;
in particular, nothing else touches root->glob->paramExecTypes.
I'm kind of loath to give that up.

* On the other hand, root->curOuterParams is currently only manipulated
by createplan.c, and if we could keep that as a local data structure,
that'd be nice too.

* However, if we stick to both of those constraints then we're forced
into more or less what the POC patch does.  We could provide another
subselect.c function, say SS_make_nestloop_param, which'd just wrap
generate_new_param the same as SS_make_initplan_output_param, but
we still have a pretty weird division of labor IMO.

The fundamental issue here is that it's now going to be the state of the
curOuterParams list that determines whether a new PARAM_EXEC slot is
needed.  Really that list serves the same sort of purpose as the
root->plan_params list, but its entries have very different lifespans than
the entries in plan_params.  So there's a reasonable case to be made that
we should put management of curOuterParams into subselect.c, except that
(a) it's a bit far afield from sub-selects, and (b) I'm not sure how
completely it could be decoupled from createplan.c.

If this were a HEAD-only patch I'd be tempted to do like Alvaro just did
and move all the PARAM_EXEC assignment logic and root->plan_params
and root->curOuterParams manipulation into a new file, say
optimizer/util/paramassign.c.  But that would be a little invasive
for a back-patch.

Thoughts?

            regards, tom lane


Re: BUG #15577: Query returns different results when executed multiple times

From
Tom Lane
Date:
I wrote:
> I'm having some difficulty choosing what to do refactoring-wise.
> ...
> If this were a HEAD-only patch I'd be tempted to do like Alvaro just did
> and move all the PARAM_EXEC assignment logic and root->plan_params
> and root->curOuterParams manipulation into a new file, say
> optimizer/util/paramassign.c.  But that would be a little invasive
> for a back-patch.

Actually, maybe it wouldn't be that bad.  The functions I'd want to
move into a new file are currently mostly static in subselect.c.
Moving them elsewhere (and consequently making them not-static)
would therefore not break any existing API.  The exceptions are:

SS_assign_special_param

We could keep a wrapper by this name in the back branches, and thereby
avoid any API/ABI break for extension code using it (which it seems
somewhat likely that there might be).

SS_make_initplan_output_param

Although this'd just be a one-line wrapper for functionality exported
by the hypothetical new file, it's somewhat closely related to
SS_make_initplan_from_plan.  So keeping it, with its current name,
isn't outlandish.

assign_nestloop_param_var
assign_nestloop_param_placeholdervar

These are going to go away, or else change API substantially, in any
case.  Hopefully there's no extension code using them.

So what I'm now contemplating is moving these existing functions
to a new file:

int assign_param_for_var(PlannerInfo *root, Var *var)
Param *replace_outer_var(PlannerInfo *root, Var *var)
int assign_param_for_placeholdervar(PlannerInfo *root, PlaceHolderVar *phv)
Param *replace_outer_placeholdervar(PlannerInfo *root, PlaceHolderVar *phv)
Param *replace_outer_agg(PlannerInfo *root, Aggref *agg)
Param *replace_outer_grouping(PlannerInfo *root, GroupingFunc *grp)
Param *generate_new_param(PlannerInfo *root, Oid paramtype, int32 paramtypmod,
                          Oid paramcollation)
int assign_special_param(PlannerInfo *root)
    (rename of SS_assign_special_param)

along with some new functions for nestloop parameters (extracted from
createplan.c logic).
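Put together, the exported interface of the hypothetical new file might look roughly like this. This is only a sketch of the proposal as described above; the names for the nestloop-param entry points are hypothetical, since the email leaves them unspecified:

```c
/* optimizer/util/paramassign.c -- hypothetical exported interface (sketch) */

/* moved from subselect.c and made extern */
extern Param *replace_outer_var(PlannerInfo *root, Var *var);
extern Param *replace_outer_placeholdervar(PlannerInfo *root,
                                           PlaceHolderVar *phv);
extern Param *replace_outer_agg(PlannerInfo *root, Aggref *agg);
extern Param *replace_outer_grouping(PlannerInfo *root, GroupingFunc *grp);
extern Param *generate_new_param(PlannerInfo *root, Oid paramtype,
                                 int32 paramtypmod, Oid paramcollation);
extern int    assign_special_param(PlannerInfo *root);
                                  /* rename of SS_assign_special_param */

/* hypothetical new entry points for nestloop params,
 * extracted from createplan.c logic */
extern Param *replace_nestloop_param_var(PlannerInfo *root, Var *var);
extern Param *replace_nestloop_param_placeholdervar(PlannerInfo *root,
                                                    PlaceHolderVar *phv);
```

In the back branches, `SS_assign_special_param` and `SS_make_initplan_output_param` would remain as thin wrappers in subselect.c so that extension code keeps compiling, per the compatibility notes above.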

assign_param_for_var and assign_param_for_placeholdervar could remain
"static" since they aren't called from anywhere else.  (Actually I
guess they'd end up getting merged back into the replace_outer_
functions, since they'll now have just one caller apiece.)

A variant idea would involve also moving replace_correlation_vars_mutator,
which would allow the individual replace_outer_xxx functions to remain
static.  A different idea is to just move the assign_param_for_xxx
functionality, which'd require breaking replace_outer_agg and
replace_outer_grouping down into two functions apiece, since they
currently combine creation of plan_params items with creation of
the referencing Param nodes.

Anybody have a preference?

            regards, tom lane