Hello,
FULL JOIN requiring merge-joinable conditions has been covered previously
in http://archives.postgresql.org/pgsql-sql/2006-01/msg00080.php for a
different application; here is a new context in which FULL JOIN in
non-merge-joinable conditions might be useful.
In this example, events have a start_time and an end_time. I want a
time-aligned listing of all events from two tables A and B, joined up
where there is any time overlap between A's events with B's events. Here
are the two tables:
CREATE TABLE public.events_a AS
SELECT 'A1' as event_a, '2009-01-03'::date as start_a,
'2009-01-03'::date as end_a
UNION SELECT 'A2' , '2009-01-07'::date ,
'2009-01-08'::date
UNION SELECT 'A3' , '2009-01-14'::date ,
'2009-01-15'::date
;
CREATE TABLE public.events_b AS
SELECT 'B1' as event_b, '2009-01-04'::date as start_b,
'2009-01-06'::date as end_b
UNION SELECT 'B2' , '2009-01-07'::date ,
'2009-01-12'::date
UNION SELECT 'B3' , '2009-01-13'::date ,
'2009-01-13'::date
;
A FULL JOIN would be good because presumably it would require only two
sequential scans, but it appears the OVERLAPS operator creates a
non-merge-joinable condition; this query returns ERROR: FULL JOIN is only
supported with merge-joinable join conditions / SQL state: 0A000:
SELECT *
FROM public.events_a A
FULL JOIN public.events_b B
ON (start_a, end_a) OVERLAPS (start_b, end_b)
ORDER BY coalesce(start_a, start_b), start_b ;
As noted in other posts to the archive listing above, the answer I am
looking for can be had by UNIONing a LEFT join and a RIGHT JOIN:
SELECT *
FROM
(
SELECT *
FROM public.events_a A
LEFT JOIN public.events_b B
ON (start_a, end_a) OVERLAPS (start_b, end_b)
UNION SELECT *
FROM public.events_a A
RIGHT OUTER JOIN public.events_b B
ON (start_a, end_a) OVERLAPS (start_b, end_b)
) U
ORDER BY COALESCE(start_a, start_b), start_b;
but this requires 4 sequential scans:
EXPLAIN above query:
QUERY PLAN
Sort (cost=298586.19..300828.86 rows=897066 width=80)
Sort Key: (COALESCE(u.start_a, u.start_b)), u.start_b
-> Subquery Scan u (cost=185220.13..209889.44 rows=897066 width=80)
-> Unique (cost=185220.13..200918.78 rows=897066 width=80)
-> Sort (cost=185220.13..187462.79 rows=897066 width=80)
Sort Key: a.event_a, a.start_a, a.end_a, b.event_b,
b.start_b, b.end_b
-> Append (cost=22.76..96523.38 rows=897066
width=80)
-> Nested Loop Left Join (cost=22.76..43776.36
rows=448533 width=80)
Join Filter:
"overlaps"((a.start_a)::timestamp with time zone, (a.end_a)::timestamp
with time zone, (b.start_b)::timestamp with time zone,
(b.end_b)::timestamp with time zone)
-> Seq Scan on events_a a
(cost=0.00..21.60 rows=1160 width=40)
-> Materialize (cost=22.76..34.36
rows=1160 width=40)
-> Seq Scan on events_b b
(cost=0.00..21.60 rows=1160 width=40)
-> Nested Loop Left Join (cost=22.76..43776.36
rows=448533 width=80)
Join Filter:
"overlaps"((a.start_a)::timestamp with time zone, (a.end_a)::timestamp
with time zone, (b.start_b)::timestamp with time zone,
(b.end_b)::timestamp with time zone)
-> Seq Scan on events_b b
(cost=0.00..21.60 rows=1160 width=40)
-> Materialize (cost=22.76..34.36
rows=1160 width=40)
-> Seq Scan on events_a a
(cost=0.00..21.60 rows=1160 width=40)
In practice, is there:
* A different structuring the time spans of the 'events' such that a
merge-joinable condition can be found?
* A possibility of teaching Hash Join to do the FULL JOIN on
non-merge-joinable conditions?
Thanks,
John Koerber
Senior Systems Developer / Analyst