Обсуждение: merge semi join cost calculation error
Hi
I try to understand to a issue https://stackoverflow.com/questions/52685384/subquery-performance-on-simple-case
The user sent a plan:
QUERY PLAN
Merge Semi Join (cost=82.97..580.24 rows=580 width=8) (actual time=0.503..9557.396 rows=721 loops=1)
Merge Cond: (tips.users_id = follows.users_id_to)
-> Index Scan using tips_idx_users_id01 on tips (cost=0.43..8378397.19 rows=2491358 width=16) (actual time=0.009..9231.585 rows=2353914 loops=1)
-> Sort (cost=1.77..1.82 rows=22 width=8) (actual time=0.052..0.089 rows=28 loops=1)
Sort Key: follows.users_id_to
Sort Method: quicksort Memory: 26kB
-> Seq Scan on follows (cost=0.00..1.27 rows=22 width=8) (actual time=0.013..0.020 rows=28 loops=1)
Filter: (users_id_from = 1)
He has PostgreSQL 10.5. I cannot to understand to too low total cost of Merge Semi Join because subnode has very high cost 8378397.
I cannot to emulate this case on my comp - so it looks like maybe some build error. What do you think about?
Regards
Pavel
Pavel Stehule <pavel.stehule@gmail.com> writes: > The user sent a plan: > QUERY PLAN > Merge Semi Join (cost=82.97..580.24 rows=580 width=8) (actual > time=0.503..9557.396 rows=721 loops=1) > Merge Cond: (tips.users_id = follows.users_id_to) > -> Index Scan using tips_idx_users_id01 on tips (cost=0.43..8378397.19 > rows=2491358 width=16) (actual time=0.009..9231.585 rows=2353914 loops=1) > -> Sort (cost=1.77..1.82 rows=22 width=8) (actual time=0.052..0.089 > rows=28 loops=1) > Sort Key: follows.users_id_to > Sort Method: quicksort Memory: 26kB > -> Seq Scan on follows (cost=0.00..1.27 rows=22 width=8) (actual > time=0.013..0.020 rows=28 loops=1) > Filter: (users_id_from = 1) > He has PostgreSQL 10.5. I cannot to understand to too low total cost of Merge > Semi Join because subnode has very high cost 8378397. The planner seems to be supposing that the merge will stop far short of scanning the entire LHS table, presumably as a result of thinking that the maximum value of follows.users_id_to is much less than the maximum value of tips.users_id. Given the actual rowcounts, that's seemingly not true, which suggests out-of-date stats for one table or the other. regards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> writes: > The user sent a plan: > QUERY PLAN > Merge Semi Join (cost=82.97..580.24 rows=580 width=8) (actual > time=0.503..9557.396 rows=721 loops=1) > Merge Cond: (tips.users_id = follows.users_id_to) > -> Index Scan using tips_idx_users_id01 on tips (cost=0.43..8378397.19 > rows=2491358 width=16) (actual time=0.009..9231.585 rows=2353914 loops=1) > -> Sort (cost=1.77..1.82 rows=22 width=8) (actual time=0.052..0.089 > rows=28 loops=1) > Sort Key: follows.users_id_to > Sort Method: quicksort Memory: 26kB > -> Seq Scan on follows (cost=0.00..1.27 rows=22 width=8) (actual > time=0.013..0.020 rows=28 loops=1) > Filter: (users_id_from = 1) > He has PostgreSQL 10.5. I cannot to understand to too low total cost of Merge > Semi Join because subnode has very high cost 8378397. The planner seems to be supposing that the merge will stop far short of scanning the entire LHS table, presumably as a result of thinking that the maximum value of follows.users_id_to is much less than the maximum value of tips.users_id. Given the actual rowcounts, that's seemingly not true, which suggests out-of-date stats for one table or the other. regards, tom lane
po 8. 10. 2018 v 17:00 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> The user sent a plan:
> QUERY PLAN
> Merge Semi Join (cost=82.97..580.24 rows=580 width=8) (actual
> time=0.503..9557.396 rows=721 loops=1)
> Merge Cond: (tips.users_id = follows.users_id_to)
> -> Index Scan using tips_idx_users_id01 on tips (cost=0.43..8378397.19
> rows=2491358 width=16) (actual time=0.009..9231.585 rows=2353914 loops=1)
> -> Sort (cost=1.77..1.82 rows=22 width=8) (actual time=0.052..0.089
> rows=28 loops=1)
> Sort Key: follows.users_id_to
> Sort Method: quicksort Memory: 26kB
> -> Seq Scan on follows (cost=0.00..1.27 rows=22 width=8) (actual
> time=0.013..0.020 rows=28 loops=1)
> Filter: (users_id_from = 1)
> He has PostgreSQL 10.5. I cannot to understand to too low total cost of Merge
> Semi Join because subnode has very high cost 8378397.
The planner seems to be supposing that the merge will stop far short of
scanning the entire LHS table, presumably as a result of thinking that
the maximum value of follows.users_id_to is much less than the maximum
value of tips.users_id. Given the actual rowcounts, that's seemingly
not true, which suggests out-of-date stats for one table or the other.
good tip - the table follows was too small for autovacuum, and it was terrible effect. I didn't know about this optimization.
Thank you
Pavel
regards, tom lane
po 8. 10. 2018 v 17:00 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> The user sent a plan:
> QUERY PLAN
> Merge Semi Join (cost=82.97..580.24 rows=580 width=8) (actual
> time=0.503..9557.396 rows=721 loops=1)
> Merge Cond: (tips.users_id = follows.users_id_to)
> -> Index Scan using tips_idx_users_id01 on tips (cost=0.43..8378397.19
> rows=2491358 width=16) (actual time=0.009..9231.585 rows=2353914 loops=1)
> -> Sort (cost=1.77..1.82 rows=22 width=8) (actual time=0.052..0.089
> rows=28 loops=1)
> Sort Key: follows.users_id_to
> Sort Method: quicksort Memory: 26kB
> -> Seq Scan on follows (cost=0.00..1.27 rows=22 width=8) (actual
> time=0.013..0.020 rows=28 loops=1)
> Filter: (users_id_from = 1)
> He has PostgreSQL 10.5. I cannot to understand to too low total cost of Merge
> Semi Join because subnode has very high cost 8378397.
The planner seems to be supposing that the merge will stop far short of
scanning the entire LHS table, presumably as a result of thinking that
the maximum value of follows.users_id_to is much less than the maximum
value of tips.users_id. Given the actual rowcounts, that's seemingly
not true, which suggests out-of-date stats for one table or the other.
good tip - the table follows was too small for autovacuum, and it was terrible effect. I didn't know about this optimization.
Thank you
Pavel
regards, tom lane