Обсуждение: [GENERAL] Bug in postgres 9.6.2?

Поиск
Список
Период
Сортировка

[GENERAL] Bug in postgres 9.6.2?

От
greigwise
Дата:
So, I have this query with nothing non-deterministic in it, yet I can run it
multiple times and get different results in postgres 9.6.2:

with test as (
select g.id2
from staging a
join dim_1 e on e.id1 = a.id1
  and a.created_at >= e.effective_at
  and a.created_at < e.expired_at
join dim_2 g on g.id2 = a.id2
  and a.created_at >= g.effective_at
  and a.created_at < g.expired_at
left join dim_3 i on i.text_field = g.text_field
join dim_4 h on h.id4 = a.id4
  and a.created_at >= h.effective_at
  and a.created_at < h.expired_at)

select count(*) from test;

Now if I rework this query slightly, it produces a consistent result:

select count(*) from
(
select g.id2
from staging a
join dim_1 e on e.id1 = a.id1
  and a.created_at >= e.effective_at
  and a.created_at < e.expired_at
join dim_2 g on g.id2 = a.id2
  and a.created_at >= g.effective_at
  and a.created_at < g.expired_at
left join dim_3 i on i.text_field = g.text_field
join dim_4 h on h.id4 = a.id4
  and a.created_at >= h.effective_at
  and a.created_at < h.expired_at) as test;

Furthermore, if I run the first query on postgres 9.6.3 rather than 9.6.2,
it also seems to produce a consistent result.

Is it possible that this is a bug in 9.6.2 that was fixed in 9.6.3?  Or is
it more likely that somehow just restarting the instance (after doing the
point release upgrade) is what fixed this issue.

Noteworthy is that all the "created_at", "effective_at" and "expired_at"
fields are all timestamp without time zone. All the id's are integers.

Thanks in advance,
Greig Wise



--
View this message in context: http://www.postgresql-archive.org/Bug-in-postgres-9-6-2-tp5972185.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: [GENERAL] Bug in postgres 9.6.2?

От
Tom Lane
Дата:
greigwise <greigwise@comcast.net> writes:
> So, I have this query with nothing non-deterministic in it, yet I can run it
> multiple times and get different results in postgres 9.6.2:
> ...
> Is it possible that this is a bug in 9.6.2 that was fixed in 9.6.3?

Hard to tell without more information than this --- which I realize you
may not be able to get if the 9.6.2 instance is now gone.  If it isn't,
it'd be interesting to compare EXPLAIN ANALYZE results, for starters.
I will say that I don't see anything in the 9.6.3 release notes that
looks like it would result in making a nondeterministic query result
deterministic.

            regards, tom lane


Re: [GENERAL] Bug in postgres 9.6.2?

От
greigwise
Дата:
I do still have the 9.6.2 instance.

Here is the explain analyze results from the 9.6.2 instance:

 Aggregate  (cost=764612.56..764612.57 rows=1 width=8) (actual
time=695.166..695.166 rows=1 loops=1)
   CTE test
     ->  Nested Loop  (cost=3345.90..757594.63 rows=311908 width=4) (actual
time=76.936..632.852 rows=73500 loops=1)
           ->  Hash Left Join  (cost=3345.47..406121.74 rows=83289 width=16)
(actual time=76.920..264.217 rows=73500 loops=1)
                 Hash Cond: ((g.text_field)::text = (i.text_field)::text)
                 ->  Gather  (cost=3333.89..404626.67 rows=333156 width=30)
(actual time=76.877..219.843 rows=73500 loops=1)
                       Workers Planned: 4
                       Workers Launched: 4
                       ->  Nested Loop  (cost=2333.89..370311.07 rows=83289
width=30) (actual time=74.266..408.763 rows=14700 loops=5)
                             ->  Hash Join  (cost=2333.33..215520.29
rows=24240 width=16) (actual time=74.226..295.244 rows=14652 loops=5)
                                   Hash Cond: (e.id1 = a.id1)
                                   Join Filter: ((a.created_at >=
e.effective_at) AND (a.created_at < e.expired_at))
                                   Rows Removed by Join Filter: 66412
                                   ->  Parallel Seq Scan on dim_1 e
(cost=0.00..133595.06 rows=153406 width=20) (actual time=0.029..137.850
rows=141713 loops=5)
                                   ->  Hash  (cost=1417.59..1417.59
rows=73259 width=20) (actual time=73.284..73.284 rows=73259 loops=5)
                                         Buckets: 131072  Batches: 1  Memory
Usage: 5031kB
                                         ->  Seq Scan on staging a
(cost=0.00..1417.59 rows=73259 width=20) (actual time=0.016..36.003
rows=73259 loops=5)
                             ->  Index Scan using dim_2_id on dim_2 g
(cost=0.56..6.36 rows=3 width=38) (actual time=0.006..0.007 rows=1
loops=73259)
                                   Index Cond: ((id2 = a.id2) AND
(a.created_at >= effective_at))
                                   Filter: (a.created_at < expired_at)
                                   Rows Removed by Filter: 1
                 ->  Hash  (cost=10.70..10.70 rows=70 width=516) (actual
time=0.028..0.028 rows=27 loops=1)
                       Buckets: 1024  Batches: 1  Memory Usage: 10kB
                       ->  Seq Scan on dim_3 i  (cost=0.00..10.70 rows=70
width=516) (actual time=0.005..0.015 rows=27 loops=1)
           ->  Index Scan using dim_4_id on dim_4 h  (cost=0.43..4.18 rows=4
width=20) (actual time=0.004..0.004 rows=1 loops=73500)
                 Index Cond: ((id4 = a.id4) AND (a.created_at >=
effective_at))
                 Filter: (a.created_at < expired_at)
                 Rows Removed by Filter: 0
   ->  CTE Scan on test  (cost=0.00..6238.16 rows=311908 width=0) (actual
time=76.938..675.796 rows=73500 loops=1)
 Planning time: 0.618 ms
 Execution time: 701.594 ms

And here is the result from 9.6.3:

 Aggregate  (cost=836957.12..836957.13 rows=1 width=8) (actual
time=803.091..803.091 rows=1 loops=1)
   CTE test
     ->  Gather  (cost=3335.93..815257.26 rows=964438 width=4) (actual
time=80.394..713.993 rows=73259 loops=1)
           Workers Planned: 4
           Workers Launched: 4
           ->  Nested Loop  (cost=2335.93..717813.46 rows=241110 width=4)
(actual time=99.061..755.383 rows=14652 loops=5)
                 ->  Hash Left Join  (cost=2335.50..333199.71 rows=68143
width=16) (actual time=99.025..579.332 rows=14652 loops=5)
                       Hash Cond: ((g.text_field)::text =
(i.text_field)::text)
                       ->  Nested Loop  (cost=2333.89..332924.06 rows=68143
width=30) (actual time=98.905..562.756 rows=14652 loops=5)
                             ->  Hash Join  (cost=2333.33..196015.85
rows=21048 width=16) (actual time=98.858..369.890 rows=14652 loops=5)
                                   Hash Cond: (e.id1 = a.id1)
                                   Join Filter: ((a.created_at >=
e.effective_at) AND (a.created_at < e.expired_at))
                                   Rows Removed by Join Filter: 66412
                                   ->  Parallel Seq Scan on dim_1 e
(cost=0.00..91462.65 rows=177066 width=20) (actual time=0.032..165.829
rows=141713 loops=5)
                                   ->  Hash  (cost=1417.59..1417.59
rows=73259 width=20) (actual time=97.828..97.828 rows=73259 loops=5)
                                         Buckets: 131072  Batches: 1  Memory
Usage: 5031kB
                                         ->  Seq Scan on staging a
(cost=0.00..1417.59 rows=73259 width=20) (actual time=0.019..44.612
rows=73259 loops=5)
                             ->  Index Scan using dim_2_id on dim_2 g
(cost=0.56..6.47 rows=3 width=38) (actual time=0.011..0.011 rows=1
loops=73259)
                                   Index Cond: ((id2 = a.id2) AND
(a.created_at >= effective_at))
                                   Filter: (a.created_at < expired_at)
                                   Rows Removed by Filter: 1
                       ->  Hash  (cost=1.27..1.27 rows=27 width=516) (actual
time=0.053..0.053 rows=27 loops=5)
                             Buckets: 1024  Batches: 1  Memory Usage: 10kB
                             ->  Seq Scan on dim_3 i  (cost=0.00..1.27
rows=27 width=516) (actual time=0.023..0.034 rows=27 loops=5)
                 ->  Index Scan using dim_4_id on dim_4 h  (cost=0.43..5.60
rows=4 width=20) (actual time=0.010..0.010 rows=1 loops=73259)
                       Index Cond: ((id4 = a.id4) AND (a.created_at >=
effective_at))
                       Filter: (a.created_at < expired_at)
                       Rows Removed by Filter: 0
   ->  CTE Scan on test  (cost=0.00..19288.76 rows=964438 width=0) (actual
time=80.397..770.944 rows=73259 loops=1)
 Planning time: 0.756 ms
 Execution time: 815.729 ms

Thanks again for any help.
Greig



--
View this message in context: http://www.postgresql-archive.org/Bug-in-postgres-9-6-2-tp5972185p5972217.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: [GENERAL] Bug in postgres 9.6.2?

От
Tom Lane
Дата:
greigwise <greigwise@comcast.net> writes:
> I do still have the 9.6.2 instance.
> Here is the explain analyze results from the 9.6.2 instance:

Oh ... you have parallel query enabled, eh?  In that case I'm less
surprised, there were multiple parallel-query fixes since 9.6.2.
Although those still don't look like they would result in actually
wrong query answers.  I wouldn't be surprised if whatever you're
seeing is still there, and you're just escaping it by luck in 9.6.3.

If you run the query multiple times on 9.6.2, exactly where do the
"actual" row counts change?  Can you extract a test case you wouldn't
mind publishing?

            regards, tom lane


Re: [GENERAL] Bug in postgres 9.6.2?

От
greigwise
Дата:
If I can provide a pg_dump backup with a db where I can reproduce the error
and then also my postgresql.conf along with the query, would that be what
you need for a test case?

Thanks,
Greig



--
View this message in context: http://www.postgresql-archive.org/Bug-in-postgres-9-6-2-tp5972185p5972450.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: [GENERAL] Bug in postgres 9.6.2?

От
Tom Lane
Дата:
greigwise <greigwise@comcast.net> writes:
> If I can provide a pg_dump backup with a db where I can reproduce the error
> and then also my postgresql.conf along with the query, would that be what
> you need for a test case?

Sounds like enough.

            regards, tom lane