Обсуждение: [GENERAL] Bug in postgres 9.6.2?
So, I have this query with nothing non-deterministic in it, yet I can run it multiple times and get different results in postgres 9.6.2: with test as ( select g.id2 from staging a join dim_1 e on e.id1 = a.id1 and a.created_at >= e.effective_at and a.created_at < e.expired_at join dim_2 g on g.id2 = a.id2 and a.created_at >= g.effective_at and a.created_at < g.expired_at left join dim_3 i on i.text_field = g.text_field join dim_4 h on h.id4 = a.id4 and a.created_at >= h.effective_at and a.created_at < h.expired_at) select count(*) from test; Now if I rework this query slightly, it produces a consistent result: select count(*) from ( select g.id2 from staging a join dim_1 e on e.id1 = a.id1 and a.created_at >= e.effective_at and a.created_at < e.expired_at join dim_2 g on g.id2 = a.id2 and a.created_at >= g.effective_at and a.created_at < g.expired_at left join dim_3 i on i.text_field = g.text_field join dim_4 h on h.id4 = a.id4 and a.created_at >= h.effective_at and a.created_at < h.expired_at) as test; Furthermore, if I run the first query on postgres 9.6.3 rather than 9.6.2, it also seems to produce a consistent result. Is it possible that this is a bug in 9.6.2 that was fixed in 9.6.3? Or is it more likely that somehow just restarting the instance (after doing the point release upgrade) is what fixed this issue. Noteworthy is that all the "created_at", "effective_at" and "expired_at" fields are all timestamp without time zone. All the id's are integers. Thanks in advance, Greig Wise -- View this message in context: http://www.postgresql-archive.org/Bug-in-postgres-9-6-2-tp5972185.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
greigwise <greigwise@comcast.net> writes: > So, I have this query with nothing non-deterministic in it, yet I can run it > multiple times and get different results in postgres 9.6.2: > ... > Is it possible that this is a bug in 9.6.2 that was fixed in 9.6.3? Hard to tell without more information than this --- which I realize you may not be able to get if the 9.6.2 instance is now gone. If it isn't, it'd be interesting to compare EXPLAIN ANALYZE results, for starters. I will say that I don't see anything in the 9.6.3 release notes that looks like it would result in making a nondeterministic query result deterministic. regards, tom lane
I do still have the 9.6.2 instance. Here is the explain analyze results from the 9.6.2 instance: Aggregate (cost=764612.56..764612.57 rows=1 width=8) (actual time=695.166..695.166 rows=1 loops=1) CTE test -> Nested Loop (cost=3345.90..757594.63 rows=311908 width=4) (actual time=76.936..632.852 rows=73500 loops=1) -> Hash Left Join (cost=3345.47..406121.74 rows=83289 width=16) (actual time=76.920..264.217 rows=73500 loops=1) Hash Cond: ((g.text_field)::text = (i.text_field)::text) -> Gather (cost=3333.89..404626.67 rows=333156 width=30) (actual time=76.877..219.843 rows=73500 loops=1) Workers Planned: 4 Workers Launched: 4 -> Nested Loop (cost=2333.89..370311.07 rows=83289 width=30) (actual time=74.266..408.763 rows=14700 loops=5) -> Hash Join (cost=2333.33..215520.29 rows=24240 width=16) (actual time=74.226..295.244 rows=14652 loops=5) Hash Cond: (e.id1 = a.id1) Join Filter: ((a.created_at >= e.effective_at) AND (a.created_at < e.expired_at)) Rows Removed by Join Filter: 66412 -> Parallel Seq Scan on dim_1 e (cost=0.00..133595.06 rows=153406 width=20) (actual time=0.029..137.850 rows=141713 loops=5) -> Hash (cost=1417.59..1417.59 rows=73259 width=20) (actual time=73.284..73.284 rows=73259 loops=5) Buckets: 131072 Batches: 1 Memory Usage: 5031kB -> Seq Scan on staging a (cost=0.00..1417.59 rows=73259 width=20) (actual time=0.016..36.003 rows=73259 loops=5) -> Index Scan using dim_2_id on dim_2 g (cost=0.56..6.36 rows=3 width=38) (actual time=0.006..0.007 rows=1 loops=73259) Index Cond: ((id2 = a.id2) AND (a.created_at >= effective_at)) Filter: (a.created_at < expired_at) Rows Removed by Filter: 1 -> Hash (cost=10.70..10.70 rows=70 width=516) (actual time=0.028..0.028 rows=27 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 10kB -> Seq Scan on dim_3 i (cost=0.00..10.70 rows=70 width=516) (actual time=0.005..0.015 rows=27 loops=1) -> Index Scan using dim_4_id on dim_4 h (cost=0.43..4.18 rows=4 width=20) (actual time=0.004..0.004 rows=1 loops=73500) Index Cond: ((id4 = a.id4) AND (a.created_at >= effective_at)) Filter: (a.created_at < expired_at) Rows Removed by Filter: 0 -> CTE Scan on test (cost=0.00..6238.16 rows=311908 width=0) (actual time=76.938..675.796 rows=73500 loops=1) Planning time: 0.618 ms Execution time: 701.594 ms And here is the result from 9.6.3: Aggregate (cost=836957.12..836957.13 rows=1 width=8) (actual time=803.091..803.091 rows=1 loops=1) CTE test -> Gather (cost=3335.93..815257.26 rows=964438 width=4) (actual time=80.394..713.993 rows=73259 loops=1) Workers Planned: 4 Workers Launched: 4 -> Nested Loop (cost=2335.93..717813.46 rows=241110 width=4) (actual time=99.061..755.383 rows=14652 loops=5) -> Hash Left Join (cost=2335.50..333199.71 rows=68143 width=16) (actual time=99.025..579.332 rows=14652 loops=5) Hash Cond: ((g.text_field)::text = (i.text_field)::text) -> Nested Loop (cost=2333.89..332924.06 rows=68143 width=30) (actual time=98.905..562.756 rows=14652 loops=5) -> Hash Join (cost=2333.33..196015.85 rows=21048 width=16) (actual time=98.858..369.890 rows=14652 loops=5) Hash Cond: (e.id1 = a.id1) Join Filter: ((a.created_at >= e.effective_at) AND (a.created_at < e.expired_at)) Rows Removed by Join Filter: 66412 -> Parallel Seq Scan on dim_1 e (cost=0.00..91462.65 rows=177066 width=20) (actual time=0.032..165.829 rows=141713 loops=5) -> Hash (cost=1417.59..1417.59 rows=73259 width=20) (actual time=97.828..97.828 rows=73259 loops=5) Buckets: 131072 Batches: 1 Memory Usage: 5031kB -> Seq Scan on staging a (cost=0.00..1417.59 rows=73259 width=20) (actual time=0.019..44.612 rows=73259 loops=5) -> Index Scan using dim_2_id on dim_2 g (cost=0.56..6.47 rows=3 width=38) (actual time=0.011..0.011 rows=1 loops=73259) Index Cond: ((id2 = a.id2) AND (a.created_at >= effective_at)) Filter: (a.created_at < expired_at) Rows Removed by Filter: 1 -> Hash (cost=1.27..1.27 rows=27 width=516) (actual time=0.053..0.053 rows=27 loops=5) Buckets: 1024 Batches: 1 Memory Usage: 10kB -> Seq Scan on dim_3 i (cost=0.00..1.27 rows=27 width=516) (actual time=0.023..0.034 rows=27 loops=5) -> Index Scan using dim_4_id on dim_4 h (cost=0.43..5.60 rows=4 width=20) (actual time=0.010..0.010 rows=1 loops=73259) Index Cond: ((id4 = a.id4) AND (a.created_at >= effective_at)) Filter: (a.created_at < expired_at) Rows Removed by Filter: 0 -> CTE Scan on test (cost=0.00..19288.76 rows=964438 width=0) (actual time=80.397..770.944 rows=73259 loops=1) Planning time: 0.756 ms Execution time: 815.729 ms Thanks again for any help. Greig -- View this message in context: http://www.postgresql-archive.org/Bug-in-postgres-9-6-2-tp5972185p5972217.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
greigwise <greigwise@comcast.net> writes: > I do still have the 9.6.2 instance. > Here is the explain analyze results from the 9.6.2 instance: Oh ... you have parallel query enabled, eh? In that case I'm less surprised, there were multiple parallel-query fixes since 9.6.2. Although those still don't look like they would result in actually wrong query answers. I wouldn't be surprised if whatever you're seeing is still there, and you're just escaping it by luck in 9.6.3. If you run the query multiple times on 9.6.2, exactly where do the "actual" row counts change? Can you extract a test case you wouldn't mind publishing? regards, tom lane
If I can provide a pg_dump backup with a db where I can reproduce the error and then also my postgresql.conf along with the query, would that be what you need for a test case? Thanks, Greig -- View this message in context: http://www.postgresql-archive.org/Bug-in-postgres-9-6-2-tp5972185p5972450.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
greigwise <greigwise@comcast.net> writes: > If I can provide a pg_dump backup with a db where I can reproduce the error > and then also my postgresql.conf along with the query, would that be what > you need for a test case? Sounds like enough. regards, tom lane