Обсуждение: Difference between ON and WHERE in JOINs

Поиск
Список
Период
Сортировка

Difference between ON and WHERE in JOINs

От
Jean-Christophe Boggio
Дата:
I'm looking for an article that explains the difference between these
constructs IN POSTGRESQL (the rules seem to differ from one DB to another) :

SELECT A.*
FROM A
JOIN B ON a.id=b.id AND A.somefield='somevalue'

and

SELECT A.*
FROM A
JOIN B ON a.id=b.id
WHERE A.somefield='somevalue'


I have noticed big differences though I don't know the rules and I've
been bitten several times recently. Time to learn.

Thanks,

JC


Re: Difference between ON and WHERE in JOINs

От
David Johnston
Дата:
On Sep 18, 2012, at 20:21, Jean-Christophe Boggio <postgresql@thefreecat.org> wrote:

> I'm looking for an article that explains the difference between these constructs IN POSTGRESQL (the rules seem to
differfrom one DB to another) : 
>
> SELECT A.*
> FROM A
> JOIN B ON a.id=b.id AND A.somefield='somevalue'
>
> and
>
> SELECT A.*
> FROM A
> JOIN B ON a.id=b.id
> WHERE A.somefield='somevalue'
>
>
> I have noticed big differences though I don't know the rules and I've been bitten several times recently. Time to
learn.
>
> Thanks,
>
> JC
>

There is no difference in your example.  Conceptually though I suggest using only table-table conditions in an ON
clauseand placing any table-value conditions into the where. 

The main time you get differences is when you use OUTER JOIN constructions since the order of filtering can affect the
finalresult.  With an inner join the order of evaluation doesn't matter since all valid results will have a record from
bothsides of the join. 

This really shouldn't be platform specific as it is the core of SQL standard.  If you want to actually show examples
with"big differences" maybe someone can explain the reason.  Otherwise the documentation is excellent to explore what
syntaxis available in PostgreSQL.  The SELECT SQL command is the defining location. 

David J.



Re: Difference between ON and WHERE in JOINs

От
Merlin Moncure
Дата:
On Tue, Sep 18, 2012 at 7:47 PM, David Johnston <polobo@yahoo.com> wrote:
> On Sep 18, 2012, at 20:21, Jean-Christophe Boggio <postgresql@thefreecat.org> wrote:
>
>> I'm looking for an article that explains the difference between these constructs IN POSTGRESQL (the rules seem to
differfrom one DB to another) : 
>>
>> SELECT A.*
>> FROM A
>> JOIN B ON a.id=b.id AND A.somefield='somevalue'
>>
>> and
>>
>> SELECT A.*
>> FROM A
>> JOIN B ON a.id=b.id
>> WHERE A.somefield='somevalue'
>>
>>
>> I have noticed big differences though I don't know the rules and I've been bitten several times recently. Time to
learn.
>>
>> Thanks,
>>
>> JC
>>
>
> There is no difference in your example.  Conceptually though I suggest using only table-table conditions in an ON
clauseand placing any table-value conditions into the where. 
>
> The main time you get differences is when you use OUTER JOIN constructions since the order of filtering can affect
thefinal result.  With an inner join the order of evaluation doesn't matter since all valid results will have a record
fromboth sides of the join. 
>
> This really shouldn't be platform specific as it is the core of SQL standard.  If you want to actually show examples
with"big differences" maybe someone can explain the reason.  Otherwise the documentation is excellent to explore what
syntaxis available in PostgreSQL.  The SELECT SQL command is the defining location. 

Yeah.  This comes up most often with left joins.  It's the source of
the #1 bug I see in SQL -- it trips up even the experts sometimes.

SELECT * FROM foo LEFT JOIN BAR ON foo.id = bar.id WHERE bar.col = 'something';

By having the filtering in the where clause, the intended purpose of
the left join, to return every row of foo, is being defeated and the
join will behave like an inner join.  The right way to do it is:

SELECT * FROM foo LEFT JOIN BAR ON foo.id = bar.id AND bar.col = 'something';

The difference here is that the filtering is now happening at join
time where the left join semantics are playing: always return foo and
return bar rows if and only if the join condition is met.

merlin


Re: Difference between ON and WHERE in JOINs

От
Jean-Christophe Boggio
Дата:
Le 19/09/2012 02:47, David Johnston a écrit :
> There is no difference in your example.  Conceptually though I
> suggest using only table-table conditions in an ON clause and placing
> any table-value conditions into the where.

This is how I use it usually.

> The main time you get differences is when you use OUTER JOIN
> constructions since the order of filtering can affect the final
> result.  With an inner join the order of evaluation doesn't matter
> since all valid results will have a record from both sides of the
> join.

Ok, I didn't know what the "trigger" was : outer joins, ok.

I have this query working :
select profil,count(og.name)
from ldap l
left join uidinoldgroups ug on l.uid=ug.uid
left join oldgroups og on og.id=ug.idgroupe and og.rne='0410030k' and og.type='g'
where l.profilgeneral='P'
and l.rne='0410030k'
group by l.profil

But if I put :
    and og.rne='0410030k' and og.type='g'
in the where part (what you suggested and what I did naturally), I get 0 results which is quite confusing (because it's
anouter join). 

Also, if I replace these with full outer joins, I still get 0 results.

With my data, if I replace og.type='g' with og.type='m' I get MOST OF my resultats back (those where the count()
returnsmore than zero). I know this is specific to my data but I really don't get the behaviour. 

Thanks for your help,

JC


Re: Difference between ON and WHERE in JOINs

От
Jean-Christophe Boggio
Дата:
Le 19/09/2012 17:08, Merlin Moncure a écrit :
> SELECT * FROM foo LEFT JOIN BAR ON foo.id = bar.id AND bar.col = 'something';
>
> The difference here is that the filtering is now happening at join
> time where the left join semantics are playing: always return foo and
> return bar rows if and only if the join condition is met.

Ok, as a rule of thumb, should I put in the where clause only the
conditions related to foo and inner joins ? All other (outer) conditions
should then go on their respective 'ON' clauses ?

JC


Re: Difference between ON and WHERE in JOINs

От
"Kevin Grittner"
Дата:
Merlin Moncure <mmoncure@gmail.com> wrote:

> By having the filtering in the where clause, the intended purpose
> of the left join, to return every row of foo, is being defeated
> and the join will behave like an inner join.  The right way to do
> it is:
>
> SELECT * FROM foo LEFT JOIN BAR ON foo.id = bar.id AND bar.col =
> 'something';

Consider the hypothetical case of a person table which is vertically
partitioned to have less frequently used information stored in a
demographics table.  The goal is to keep the person table narrow, so
that common uses (which don't reference any of the demographics
information) can be faster.  The demographics row is only present if
one or more of the non-key values is not null.  Let's say you want
to do a list which only includes people not known to be dead.  If
you put the `demographics.date_of_death IS NULL` test in the ON
clause, in a manner similar to your above example, you will get
incorrect results.

I will concede that most of the time you want conditions related to
the right-hand table of a left join in the ON clause; but that's not
always the right thing to do.  The question is: "Is this a condition
which should control whether data from the optional table is
included, or is this a condition which should determine whether the
joined data row as a whole is included in the result?"

-Kevin


Re: Difference between ON and WHERE in JOINs

От
"David Johnston"
Дата:
> I have this query working :
> select profil,count(og.name)
> from ldap l
> left join uidinoldgroups ug on l.uid=ug.uid left join oldgroups og on
> og.id=ug.idgroupe and og.rne='0410030k' and og.type='g'
> where l.profilgeneral='P'
> and l.rne='0410030k'
> group by l.profil
>
> But if I put :
>     and og.rne='0410030k' and og.type='g'
> in the where part (what you suggested and what I did naturally), I get 0
> results which is quite confusing (because it's an outer join).
>
> Also, if I replace these with full outer joins, I still get 0 results.
>
> With my data, if I replace og.type='g' with og.type='m' I get MOST OF my
> resultats back (those where the count() returns more than zero). I know
this
> is specific to my data but I really don't get the behaviour.
>
> Thanks for your help,
>
> JC
>

The query you wrote is equivalent to this:

SELECT profil, count(og.name)
FROM ldap l
LEFT JOIN ... ug ON (l.uid = ug.uid)
LEFT JOIN (SELECT * FROM ... WHERE rne = '...' AND type = '...') og ON
(og.id = ug.idgroupe)
WHERE l.rne = '' AND l.type=''

As soon as you start putting the right-hand tables in the outer-most where
clause you have to deal with the fact that the outer join can cause those
columns to be null.  If you compare those columns to a literal value then
you are saying that you don't want that column to be NULL and by extension
you really meant to use an inner join.

If you had put:

WHERE l.rne = '' AND l.type = '' AND (og.type = 'g' OR og.type IS NULL)

That would give you the behavior such that if og.type is a known value AND
it is not 'g' then the item should NOT be counted.  If the value is unknown
or 'g' then count it.

As for a rule-of-thumb I would avoid non-relational conditions in the ON
clause all-together and move them into a sub-query as I showed above.  The
WHERE clause is reserved for the left side of the join hierarchy by default.
The ON clause is reserved for inter-relation comparisons.  When cobbling
together multiple JOINs I would also suggest making use of liberal newlines
as well as parentheses.  Lastly knowing the correct answer is not that
common; just starting building up the query piece-by-piece and keep in mind
that you have to deal with the NULLs introduced by the OUTER JOIN.  How you
deal with them is query specific and cannot be "rule-of-thumbed".

Yes, this is all more verbose but you've now separated the relational filter
and the joining into the designated areas making interpreting the query
easier.  Throw in usage of CTE/WITH and the important portion of the query
can be made succinct by moving the sub-queries and filters to the top and
out of the way.  You can refactor, if necessary, when you are done.

David J.








Re: Difference between ON and WHERE in JOINs

От
Jean-Christophe Boggio
Дата:
David,

Thanks for the verbose explanations, really useful.

However, in my case :
- type is never NULL
- there are no lines from uidinoldgroups that match the "FROM ldap" join.

But I think I got it :

drop table if exists tmpA;
drop table if exists tmpB;
create temp table tmpA (name varchar);
insert into tmpA values ('jack');
insert into tmpA values ('joe');
create temp table tmpB (name varchar, value int);
insert into tmpB values('jack',10);

-- case (1)
select a.name,COUNT(b.*)
from tmpA a
LEFT JOIN tmpB b ON a.name=b.name AND b.value>0
group by a.name

This gives :
Jack 1
Joe 0

But :

-- case (2)
select a.name,COUNT(b.*)
from tmpA a
LEFT JOIN tmpB b ON a.name=b.name
WHERE b.value>0
group by a.name

gives :
Jack 1

No mention of Joe.

Though :

-- case (3)
select a.name,COUNT(b.*)
from tmpA a
LEFT JOIN tmpB b ON a.name=b.name
WHERE (b.value>0 or b.value is null)
group by a.name

Brings back Joe. The WHERE clause is evaluated AFTER the JOIN.

A subtle concept difference that makes big differences in the results.

Many thanks for the enlightenment.

And also for making me look at CTE constructs which I did not know of.
They make things much clearer :

with b2 as (
   select name,value
   from tmpB
   where value>0
)
SELECT a.name, count(b.*)
FROM tmpA a
LEFT JOIN b2 b ON a.name=b.name
GROUP BY a.name

Have a nice day, you made mine rich !


Re: Difference between ON and WHERE in JOINs

От
"David Johnston"
Дата:
Comments embedded between ================

>
> David,
>
> Thanks for the verbose explanations, really useful.
>
> However, in my case :
> - type is never NULL

============================================
Assuming you mean the "og.type" (i.e., the right-side of the join) even
though the column itself is defined NOT NULL if there are no matching
records supplied from that table the column still appears in the "joined
relation" and in that relation the system uses NULL to represent that no
record was provided from the corresponding table.
============================================

> - there are no lines from uidinoldgroups that match the "FROM ldap" join.
>
> But I think I got it :
>
> drop table if exists tmpA;
> drop table if exists tmpB;
> create temp table tmpA (name varchar);
> insert into tmpA values ('jack');
> insert into tmpA values ('joe');
> create temp table tmpB (name varchar, value int); insert into tmpB
> values('jack',10);
>
> -- case (1)
> select a.name,COUNT(b.*)
> from tmpA a
> LEFT JOIN tmpB b ON a.name=b.name AND b.value>0 group by a.name
>
> This gives :
> Jack 1
> Joe 0
>
======================================================================

The only danger here is that count(b.*) evaluates to zero while all the
values in b.*are actually NULL.

Try playing with "ARRAY_AGG()" instead of "COUNT()" to get a better feel for
what is being returned.  Trying adding some more records to A and B then
run:

SELECT name, array_agg(b.value)
FROM tmpA a
LEFT JOIN tmpB b USING (name)
GROUP BY name

=======================================================================
> But :
>
> -- case (2)
> select a.name,COUNT(b.*)
> from tmpA a
> LEFT JOIN tmpB b ON a.name=b.name
> WHERE b.value>0
> group by a.name
>
> gives :
> Jack 1
>
> No mention of Joe.
====================================================================
Correct, because Joe does not have any records on (a LEFT JOIN B) with
"value > 0" which is mandatory.

Basically once you get to the WHERE clause you really do not have distinct
"a" and "b" tables but rather you operate on "a LEFT JOIN b" where any
columns supplied by "b" may be NULL/missing.

Again, the same does apply to INNER JOIN but because no NULLs can be
introduced by an INNER JOIN thinking of "a" and "b" as still being distinct
relations doesn't impact the outcome.

In other words:

SELECT ...
FROM (tmpA LEFT JOIN tmpB USING name) tmpAB

WHERE (tmpAB.value > 0) -- this is what you are really saying in case # 2;
and within tmpAB "value" can be NULL because of the LEFT JOIN

WHERE tmpB.value > 0 -- this is an error; relation tmpB is not visible at
this point
======================================================================
>
> Though :
>
> -- case (3)
> select a.name,COUNT(b.*)
> from tmpA a
> LEFT JOIN tmpB b ON a.name=b.name
> WHERE (b.value>0 or b.value is null)
> group by a.name
>
> Brings back Joe. The WHERE clause is evaluated AFTER the JOIN.
>
> A subtle concept difference that makes big differences in the results.
>
> Many thanks for the enlightenment.
>
> And also for making me look at CTE constructs which I did not know of.
> They make things much clearer :
>
> with b2 as (
>    select name,value
>    from tmpB
>    where value>0
> )
> SELECT a.name, count(b.*)
> FROM tmpA a
> LEFT JOIN b2 b ON a.name=b.name
> GROUP BY a.name
>
> Have a nice day, you made mine rich !
>
========================================================
Glad I could be of assistance!

David J.




Re: Difference between ON and WHERE in JOINs

От
Gavin Flower
Дата:
On 20/09/12 03:08, Merlin Moncure wrote:
On Tue, Sep 18, 2012 at 7:47 PM, David Johnston <polobo@yahoo.com> wrote:
> On Sep 18, 2012, at 20:21, Jean-Christophe Boggio <postgresql@thefreecat.org> wrote:
>
>> I'm looking for an article that explains the difference between these constructs IN POSTGRESQL (the rules seem to differ from one DB to another) :
>>
>> SELECT A.*
>> FROM A
>> JOIN B ON a.id=b.id AND A.somefield='somevalue'
>>
>> and
>>
>> SELECT A.*
>> FROM A
>> JOIN B ON a.id=b.id
>> WHERE A.somefield='somevalue'
>>
>>
>> I have noticed big differences though I don't know the rules and I've been bitten several times recently. Time to learn.
>>
>> Thanks,
>>
>> JC
>>
>
> There is no difference in your example.  Conceptually though I suggest using only table-table conditions in an ON clause and placing any table-value conditions into the where.
>
> The main time you get differences is when you use OUTER JOIN constructions since the order of filtering can affect the final result.  With an inner join the order of evaluation doesn't matter since all valid results will have a record from both sides of the join.
>
> This really shouldn't be platform specific as it is the core of SQL standard.  If you want to actually show examples with "big differences" maybe someone can explain the reason.  Otherwise the documentation is excellent to explore what syntax is available in PostgreSQL.  The SELECT SQL command is the defining location.
Yeah.  This comes up most often with left joins.  It's the source of
the #1 bug I see in SQL -- it trips up even the experts sometimes.

SELECT * FROM foo LEFT JOIN BAR ON foo.id = bar.id WHERE bar.col = 'something';

By having the filtering in the where clause, the intended purpose of
the left join, to return every row of foo, is being defeated and the
join will behave like an inner join.  The right way to do it is:

SELECT * FROM foo LEFT JOIN BAR ON foo.id = bar.id AND bar.col = 'something';

The difference here is that the filtering is now happening at join
time where the left join semantics are playing: always return foo and
return bar rows if and only if the join condition is met.

merlin


-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Respected Sir,

You example is obvious!


... to me, only after I had spent ten minutes looking at it!


Thanks,
Gavin