Re: [HACKERS] PoC: full merge join on comparison clause

Поиск
Список
Период
Сортировка
От Ashutosh Bapat
Тема Re: [HACKERS] PoC: full merge join on comparison clause
Дата
Msg-id CAFjFpRd-yz4MpFLZfgG6+PQKw-aH=9nj_RAtxJQuC=eg3Y3UXQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] PoC: full merge join on comparison clause  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Ответы Re: [HACKERS] PoC: full merge join on comparison clause  (Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>)
Список pgsql-hackers
On Fri, Jul 6, 2018 at 6:31 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>
> I will continue reviewing the patches.
>

Here are some more review comments

- * sort ordering for each merge key.  The mergejoinable operator is an
- * equality operator in the opfamily, and the two inputs are guaranteed to be
+ * sort ordering for each merge key.  The mergejoinable operator is a
+ * comparison operator in the opfamily, and the two inputs are guaranteed to be

I think this prologue has to change substantially. At the beginning of the
prologue it explicitly mentions clauses like leftexpr = rightexpr. That
needs to be changed.

  * ordered in either increasing or decreasing (respectively) order according

It looks like the order of inputs is constrained by the in-equality operator.
That too needs to be specified here.

  * This allows us to obtain the needed comparison function from the opfamily.
@@ -200,6 +200,9 @@ MJExamineQuals(List *mergeclauses,
         Oid            op_righttype;
         Oid            sortfunc;

+        if (parent->mj_Ineq_Present)
+            elog(ERROR, "inequality mergejoin clause must be the last one");
+

IIUC, this never happens. If it really happens, we have created a path which
can not be used practically. That should never happen. It will help to add a
comment here clarifying that situation.

+    bool        have_inequality = have_inequality_mergeclause(mergeclauses);

There will be many paths created with different ordering of pathkeys. So,
instead of calling have_inequality_mergeclause() for each of those paths, it's
better to save this status in the path itself when creating the path.

         /* Make a pathkey list with this guy first */
         if (l != list_head(all_pathkeys))
+        {
+            if (have_inequality && l == list_tail(all_pathkeys))
+                /* Inequality merge clause must be the last, we can't
move it */
+                break;
+

I am kind of baffled by this change. IIUC the way we create different orderings
of pathkeys here, we are just rotating the pathkeys in circular order. This
means there is exactly one ordering of pathkeys where the pathkey corresponding
to the inequality clause is the last one. It's only that ordering which will be
retained and all other ordering will be discarded. Instead of that, I think we
should keep the pathkey corresponding to the inequality clause at the end (or
track in separately) and create different orderings of pathkeys by rotating
other pathkeys. This will allow us to cost the orderings as intended by this
fucntion.

     /* Might have no mergeclauses */
     if (nClauses == 0)
         return NIL;

+    {
+        List *ineq_clauses = find_inequality_clauses(mergeclauses);
+
+        if (list_length(ineq_clauses) > 1)
+            return NIL;

Without this patch, when there is an inequality clause with FULL JOIN, we will
not create a merge join path because select_mergejoin_clauses() will set
mergejoin_allowed to false. This means that we won't call
sort_inner_and_outer(). I think this patch also has to do the same i.e. when
there are more than one inequality clauses, select_mergejoin_clauses() should
set mergejoin_allowed to false in case of a FULL JOIN since merge join
machinary won't be able to handle that case.

If we do that, we could arrange extra.mergeclause_list such that the inequality
clause is always at the end thus finding inequality clause would be easy.

Again, this is not full review, but I am diving deeper into the
patch-set and understanding it better. Sorry.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Fabien COELHO
Дата:
Сообщение: Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Следующее
От: Jesper Pedersen
Дата:
Сообщение: Re: EXPLAIN of Parallel Append