Re: JOINs ... how I hate them ...
От | Tom Lane |
---|---|
Тема | Re: JOINs ... how I hate them ... |
Дата | |
Msg-id | 20120.1013124770@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | JOINs ... how I hate them ... ("Marc G. Fournier" <scrappy@hub.org>) |
Список | pgsql-hackers |
"Marc G. Fournier" <scrappy@hub.org> writes: > explain SELECT p.uid, p.handle > FROM gender_f pgf JOIN profiles p ON (pgf.uid = p.uid) ; > Which explains out as: > Hash Join (cost=1354.80..45297.83 rows=75664 width=27) > -> Seq Scan on profiles p (cost=0.00..35707.69 rows=485969 width=19) > -> Hash (cost=1165.64..1165.64 rows=75664 width=8) > -> Seq Scan on gender_f pgf (cost=0.00..1165.64 rows=75664 width=8) > Now, profiles has uid as its primary KEY, and there are no > duplicates in gender_f ... so, as my HashJoin points out, I should have 75664 > results returned ... that is expected ... and the SeqScan on gender_f is > expected ... but the SeqScan on profiles is what I would hope to get rid > of ... Um, why? Looks like a perfectly reasonable plan to me. > get uid from gender_f, find corresponding entry in profiles ... I'm not convinced that 75000 indexscan probes would be faster than a sequential scan across that table. You could probably force the issue with "set enable_hashjoin to off" (and maybe also "set enable_mergejoin to off") and then see what the plan is and what the actual timing is. (EXPLAIN ANALYZE should be real helpful here.) > explain SELECT p.uid, p.handle > FROM ( orientation_c poc JOIN gender_f pgf USING ( uid ) ) JOIN profiles p ON (pgf.uid = p.uid) ; > Hash Join (cost=6023.92..47537.10 rows=75664 width=35) > -> Seq Scan on profiles p (cost=0.00..35707.69 rows=485969 width=19) > -> Hash (cost=5834.76..5834.76 rows=75664 width=16) > -> Merge Join (cost=0.00..5834.76 rows=75664 width=16) > -> Index Scan using poc_uid on orientation_c poc (cost=0.00..2807.82 rows=126477 width=8) > -> Index Scan using pgf_uid on gender_f pgf (cost=0.00..1575.79 rows=75664 width=8) > The MergeJoin between poc/pgf will only return 12000 records, and since it > is a 1:1 relationship between each of those tables, there will *only* be > 12000 records pulled from profiles ... Hmm, it thinks that there will be 75664 not 12000 records out of that join. Why the discrepancy? Could we see the pg_stats data for these tables? > ... but, again, for each of > *those* 12000 records, its doing a SeqScan on last_login's 485k records, > instead of using the index No, certainly *not* "for each record". It's a hash join, so it only reads each table once. > Now, if I 'set enable_seqscan=false;' and do the exact same explain, it > definitely comes more in line with what I'd like to see, as far as index > usage is concerned: And what's the actual runtime come out to be? We definitely should standardize on asking for EXPLAIN ANALYZE results in bad-plan discussions, now that we have that capability. regards, tom lane
В списке pgsql-hackers по дате отправления: