Обсуждение: Unnecessary repeat condition for a self inner join

Поиск
Список
Период
Сортировка

Unnecessary repeat condition for a self inner join

От
"Robins Tharakan"
Дата:
<span style="font-family: verdana,sans-serif;">Hi,</span><br style="font-family: verdana,sans-serif;" /><br
style="font-family:verdana,sans-serif;" /><span style="font-family: verdana,sans-serif;">I am not sure if this is a
simple(... stupid) question but I just wasted two hours optimizing a query, so I thought I should drop in to
ask.</span><brstyle="font-family: verdana,sans-serif;" /><br style="font-family: verdana,sans-serif;" /><span
style="font-family:verdana,sans-serif;">The only difference between query1 and query2 (below) is that despite an
explicitINNER JOIN, I have repeated the same condition for n2 (as given for n1) and this makes a whole lot of
differencein performance (since it now uses the same index for n2 that it is using for n1).</span><br
style="font-family:verdana,sans-serif;" /><br style="font-family: verdana,sans-serif;" /><span style="font-family:
verdana,sans-serif;">Incase of an INNER JOIN, shouldn't the second condition (in Query2) be unnecessary ? <br />Or am I
beingunreasonable in this expectation ?</span><br style="font-family: verdana,sans-serif;" /><br style="font-family:
verdana,sans-serif;"/><span style="font-family: verdana,sans-serif;">Regards,</span><br style="font-family:
verdana,sans-serif;"/><b style="font-family: verdana,sans-serif;">Robins Tharakan</b><br /><br /><span
style="font-family:verdana,sans-serif;">p.s.: The query below is just a simplification, and provides only EXPLAIN, but
Ithink an EXPLAIN ANALYSE should be unnecessary here. In case anyone still needs it, please do tell.</span><br /><br
/><bstyle="font-family: courier new,monospace;">Query 1</b><span style="font-family: courier
new,monospace;">:</span><brstyle="font-family: courier new,monospace;" /><br style="font-family: courier
new,monospace;"/><span style="font-family: courier new,monospace;">SELECT n1.scheme_code</span><br style="font-family:
couriernew,monospace;" /><span style="font-family: courier new,monospace;">FROM nav n1</span><br style="font-family:
couriernew,monospace;" /><span style="font-family: courier new,monospace;">    INNER JOIN nav n2 ON n1.scheme_code =
n2.scheme_code</span><brstyle="font-family: courier new,monospace;" /><span style="font-family: courier
new,monospace;">WHEREn1.scheme_code BETWEEN 100 AND 200</span><br style="font-family: courier new,monospace;" /><br
style="font-family:courier new,monospace;" /><span style="font-family: courier new,monospace;">"Merge Join 
(cost=903471.23..10248343.37rows=622920912 width=4)"</span><br style="font-family: courier new,monospace;" /><span
style="font-family:courier new,monospace;">"  Merge Cond: (n1.scheme_code = n2.scheme_code)"</span><br
style="font-family:courier new,monospace;" /><span style="font-family: courier new,monospace;">"  ->  Sort 
(cost=110929.32..111458.60rows=211712 width=4)"</span><br style="font-family: courier new,monospace;" /><span
style="font-family:courier new,monospace;">"        Sort Key: n1.scheme_code"</span><br style="font-family: courier
new,monospace;"/><span style="font-family: courier new,monospace;">"        ->  Bitmap Heap Scan on nav n1 
(cost=8623.86..92201.54rows=211712 width=4)"</span><br style="font-family: courier new,monospace;" /><span
style="font-family:courier new,monospace;">"              Recheck Cond: ((scheme_code >= 100) AND (scheme_code <=
200))"</span><brstyle="font-family: courier new,monospace;" /><span style="font-family: courier
new,monospace;">"             ->  Bitmap Index Scan on pk_fs_nav  (cost=0.00..8570.94 rows=211712
width=0)"</span><brstyle="font-family: courier new,monospace;" /><span style="font-family: courier
new,monospace;">"                   Index Cond: ((scheme_code >= 100) AND (scheme_code <= 200))"</span><br
style="font-family:courier new,monospace;" /><span style="font-family: courier new,monospace;">"  ->  Sort 
(cost=792541.91..805391.17rows=5139702 width=4)"</span><br style="font-family: courier new,monospace;" /><span
style="font-family:courier new,monospace;">"        Sort Key: n2.scheme_code"</span><br style="font-family: courier
new,monospace;"/><span style="font-family: courier new,monospace;">"        ->  Seq Scan on nav n2 
(cost=0.00..131799.02rows=5139702 width=4)"</span><br style="font-family: courier new,monospace;" /><br
style="font-family:courier new,monospace;" /><br style="font-family: courier new,monospace;" /><b style="font-family:
couriernew,monospace;">Query 2</b><span style="font-family: courier new,monospace;">:</span><br style="font-family:
couriernew,monospace;" /><br style="font-family: courier new,monospace;" /><span style="font-family: courier
new,monospace;">SELECTn1.scheme_code</span><br style="font-family: courier new,monospace;" /><span style="font-family:
couriernew,monospace;">FROM nav n1</span><br style="font-family: courier new,monospace;" /><span style="font-family:
couriernew,monospace;">    INNER JOIN nav n2 ON n1.scheme_code = n2.scheme_code</span><br style="font-family: courier
new,monospace;"/><span style="font-family: courier new,monospace;">WHERE n1.scheme_code BETWEEN 100 AND 200</span><br
style="font-family:courier new,monospace;" /><span style="font-family: courier new,monospace;">    AND n2.scheme_code
BETWEEN100 AND 200</span><br style="font-family: courier new,monospace;" /><span style="font-family: courier
new,monospace;">   </span><br style="font-family: courier new,monospace;" /><br style="font-family: courier
new,monospace;"/><span style="font-family: courier new,monospace;">"Merge Join  (cost=221858.63..607790.72
rows=25659043width=4)"</span><br style="font-family: courier new,monospace;" /><span style="font-family: courier
new,monospace;">" Merge Cond: (n2.scheme_code = n1.scheme_code)"</span><br style="font-family: courier new,monospace;"
/><spanstyle="font-family: courier new,monospace;">"  ->  Sort  (cost=110929.32..111458.60 rows=211712
width=4)"</span><brstyle="font-family: courier new,monospace;" /><span style="font-family: courier
new,monospace;">"       Sort Key: n2.scheme_code"</span><br style="font-family: courier new,monospace;" /><span
style="font-family:courier new,monospace;">"        ->  Bitmap Heap Scan on nav n2  (cost=8623.86..92201.54
rows=211712width=4)"</span><br style="font-family: courier new,monospace;" /><span style="font-family: courier
new,monospace;">"             Recheck Cond: ((scheme_code >= 100) AND (scheme_code <= 200))"</span><br
style="font-family:courier new,monospace;" /><span style="font-family: courier new,monospace;">"              -> 
BitmapIndex Scan on pk_fs_nav  (cost=0.00..8570.94 rows=211712 width=0)"</span><br style="font-family: courier
new,monospace;"/><span style="font-family: courier new,monospace;">"                    Index Cond: ((scheme_code >=
100)AND (scheme_code <= 200))"</span><br style="font-family: courier new,monospace;" /><span style="font-family:
couriernew,monospace;">"  ->  Sort  (cost=110929.32..111458.60 rows=211712 width=4)"</span><br style="font-family:
couriernew,monospace;" /><span style="font-family: courier new,monospace;">"        Sort Key: n1.scheme_code"</span><br
style="font-family:courier new,monospace;" /><span style="font-family: courier new,monospace;">"        ->  Bitmap
HeapScan on nav n1  (cost=8623.86..92201.54 rows=211712 width=4)"</span><br style="font-family: courier new,monospace;"
/><spanstyle="font-family: courier new,monospace;">"              Recheck Cond: ((scheme_code >= 100) AND
(scheme_code<= 200))"</span><br style="font-family: courier new,monospace;" /><span style="font-family: courier
new,monospace;">"             ->  Bitmap Index Scan on pk_fs_nav  (cost=0.00..8570.94 rows=211712
width=0)"</span><brstyle="font-family: courier new,monospace;" /><span style="font-family: courier
new,monospace;">"                   Index Cond: ((scheme_code >= 100) AND (scheme_code <= 200))"</span><br
style="font-family:courier new,monospace;" /><br style="font-family: courier new,monospace;" /><br /> 

Re: Unnecessary repeat condition for a self inner join

От
Tom Lane
Дата:
"Robins Tharakan" <tharakan@gmail.com> writes:
> In case of an INNER JOIN, shouldn't the second condition (in Query2) be
> unnecessary ?
> Or am I being unreasonable in this expectation ?

> SELECT n1.scheme_code
> FROM nav n1
>     INNER JOIN nav n2 ON n1.scheme_code = n2.scheme_code
> WHERE n1.scheme_code BETWEEN 100 AND 200
>     AND n2.scheme_code BETWEEN 100 AND 200

While the optimizer theoretically could deduce the extra restriction
condition, it doesn't attempt to.  It's extremely unclear that the extra
cycles to look for such cases would be repaid on average, because cases
like this aren't that common.  The current state of affairs is that
the system will deduce implied equality conditions, but not implied
inequality conditions.

[ thinks for a bit... ]  The current policy has been driven in part
by the assumption that looking for cases where such a deduction
could apply would be pretty expensive.  I wonder though whether the
recent EquivalenceClass work has changed the landscape.  We now store
an explicit representation of the btree opclasses associated with
each equivalence condition, which is one of the pieces that would be
needed to match up the equivalences with inequality conditions.
I'm still dubious, but that's at least one less catalog search ...
        regards, tom lane


Re: Unnecessary repeat condition for a self inner join

От
"Robins Tharakan"
Дата:

While the optimizer theoretically could deduce the extra restriction
condition, it doesn't attempt to.  It's extremely unclear that the extra
cycles to look for such cases would be repaid on average, because cases
like this aren't that common.  The current state of affairs is that
the system will deduce implied equality conditions, but not implied
inequality conditions.

One good thing is that the equality conditions are taken care of. But I fail to understand why do you believe that the second case is rare. I think the optimizer would (in all self-join inequality conditions) tend towards a table scan, which for a large table is a disaster. (Of course, the index scan would help only if the result-set is small)

Besides, I did a simple test and although you are right about the optimizer deducing implied equality conditions, this holds true only for a direct join. In the second query, the optimizer recommends a table scan even for a simple IN() condition.

Is that normal ?


Regards,
Robins Tharakan

Query 1:

SELECT n1.scheme_code
FROM nav n1
    INNER JOIN nav n2 ON n1.scheme_code = n2.scheme_code
WHERE n1.scheme_code = 290

"Nested Loop  (cost=0.00..16147232.47 rows=4796100 width=4)"
"  ->  Index Scan using nav__schemecode_date_lookup3b on nav n1  (cost=0.00..7347.91 rows=2190 width=4)"
"        Index Cond: (scheme_code = 290)"
"  ->  Index Scan using nav__schemecode_date_lookup3b on nav n2  (cost=0.00..7347.91 rows=2190 width=4)"
"        Index Cond: (290 = scheme_code)"


Query 2:

SELECT n1.scheme_code
FROM nav n1
    INNER JOIN nav n2 ON n1.scheme_code = n2.scheme_code
WHERE n1.scheme_code IN (1, 2)

"Hash Join  (cost=206004.00..431864.83 rows=10720451 width=4)"
"  Hash Cond: (n1.scheme_code = n2.scheme_code)"
"  ->  Bitmap Heap Scan on nav n1  (cost=139.62..13663.13 rows=4378 width=4)"
"        Recheck Cond: (scheme_code = ANY ('{1,2}'::integer[]))"
"        ->  Bitmap Index Scan on nav__schemecode_date_lookup3b  (cost=0.00..138.53 rows=4378 width=0)"
"              Index Cond: (scheme_code = ANY ('{1,2}'::integer[]))"
"  ->  Hash  (cost=112078.06..112078.06 rows=5395306 width=4)"
"        ->  Seq Scan on nav n2  (cost=0.00..112078.06 rows=5395306 width=4)"


Re: Unnecessary repeat condition for a self inner join

От
Tom Lane
Дата:
"Robins Tharakan" <tharakan@gmail.com> writes:
> Besides, I did a simple test and although you are right about the optimizer
> deducing implied equality conditions, this holds true only for a direct
> join. In the second query, the optimizer recommends a table scan even for a
> simple IN() condition.

An IN is not an equivalence condition.
        regards, tom lane