Обсуждение: Which side of a Merge Join gets executed first? Do both sides always get executed?

Поиск
Список
Период
Сортировка

Which side of a Merge Join gets executed first? Do both sides always get executed?

От
Jerry Brenner
Дата:
The attached query plan is from 11.  
We are getting Merge Joins on both sides of the UNION.  In both cases, the first node under the Merge Join returns 0 rows but the other side of the Merge Join (the one being sorted) is executed and that's where all of the time is spent.

On the surface, I don't see any way from the attached explain plan to determine which side of the Merge Join is executed first.  Some questions:
  • Which side gets executed first?
  • How would one tell that from the json?
  • Have there been any relevant changes to later releases to make that more apparent?
  • Whichever side gets executed first, is the execution of the side that would be second get short circuited if 0 rows are returned by the first side?
Here's a screenshot from pgMustard.
  • Nodes 6 and 14 (the first node under each of the Merge Joins) each return 0 rows
  • Nodes 9 and 15 are the expensive sides of the Merge Joins and return lots of rows
image.png

NOTE:
  • The query plan in 13 is slightly different, but still includes the Merge Joins.
  • Replacing ANY(ARRAY(<subquery)) with IN(<subquery>) fixes the performance problem, but we'd still like to understand the execution characteristics of Merge Join
Thanks,
Jerry
Вложения

Re: Which side of a Merge Join gets executed first? Do both sides always get executed?

От
Frédéric Yhuel
Дата:

Le 20/12/2023 à 15:40, Jerry Brenner a écrit :
> The attached query plan is from 11.
> We are getting Merge Joins on both sides of the UNION.  In both cases, 
> the first node under the Merge Join returns 0 rows but the other side of 
> the Merge Join (the one being sorted) is executed and that's where all 
> of the time is spent.
> 
> On the surface, I don't see any way from the attached explain plan to 
> determine which side of the Merge Join is executed first.  Some questions:
> 
>   * Which side gets executed first?
>   * How would one tell that from the json?
>   * Have there been any relevant changes to later releases to make that
>     more apparent?
>   * Whichever side gets executed first, is the execution of the side
>     that would be second get short circuited if 0 rows are returned by
>     the first side?
> 
> Here's a screenshot from pgMustard.
> 
>   * Nodes 6 and 14 (the first node under each of the Merge Joins) each
>     return 0 rows
>   * Nodes 9 and 15 are the expensive sides of the Merge Joins and return
>     lots of rows

I think those nodes (9 and 15) are expensive because they have to filter 
out 8 millions rows in order to produce their first output row. After 
that, they get short circuited.

Best regards,
Frédéric



Re: Which side of a Merge Join gets executed first? Do both sides always get executed?

От
Frédéric Yhuel
Дата:

Le 20/12/2023 à 15:40, Jerry Brenner a écrit :
> Whichever side gets executed first, is the execution of the side that 
> would be second get short circuited if 0 rows are returned by the first 
> side?

Indeed, if 0 rows are returned from the outer relation, the scan of the 
inner relation is never executed.

Best regards,
Frédéric



Re: Which side of a Merge Join gets executed first? Do both sides always get executed?

От
Jerry Brenner
Дата:
Thanks.  Does this make sense?
  • There are 3 nodes under the Merge Join
  • The first node is an InitPlan, due to the ANY(ARRAY()) - that gets executed and finds 0 matching rows
  • The second node is the outer node in the Merge Join and that is the expensive node in our query plan
  • The third node is the inner node in the Merge Join and that node references the SubPlan generated by the first node. The IndexCond has "id = ANY($2) AND ..." and the comparison with the result of the SubPlan does not find a match, so that's where the short-circuiting happens.
Here are the relevant lines from the node (12) accessing the result of the SubPlan:

                            "Plans": [
                              {
                                "Node Type": "Index Only Scan",
                                "Parent Relationship": "Outer",
                                "Parallel Aware": false,
                                "Scan Direction": "Forward",
                                "Index Name": "policyperi_u_id_1mw8mh83lyyd9",
                                "Relation Name": "pc_policyperiod",
                                "Alias": "qroots0",
                                "Startup Cost": 0.69,
                                "Total Cost": 18.15,
                                "Plan Rows": 10,
                                "Plan Width": 8,
                                "Actual Startup Time": 0.045,
                                "Actual Total Time": 0.045,
                                "Actual Rows": 0,
                                "Actual Loops": 1,
                                "Index Cond": "((id = ANY ($2)) AND (retired = 0) AND (temporarybranch = false))",

Here's the screenshot again:

image.png

Thanks,
Jerry

On Wed, Dec 20, 2023 at 10:32 AM Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:


Le 20/12/2023 à 15:40, Jerry Brenner a écrit :
> Whichever side gets executed first, is the execution of the side that
> would be second get short circuited if 0 rows are returned by the first
> side?

Indeed, if 0 rows are returned from the outer relation, the scan of the
inner relation is never executed.

Best regards,
Frédéric

Вложения

Re: Which side of a Merge Join gets executed first? Do both sides always get executed?

От
Frédéric Yhuel
Дата:

Le 20/12/2023 à 20:04, Jerry Brenner a écrit :
> Thanks.  Does this make sense?
> 
>   * There are 3 nodes under the Merge Join
>   * The first node is an InitPlan, due to the ANY(ARRAY()) - that gets
>     executed and finds 0 matching rows
>   * The second node is the outer node in the Merge Join and that is the
>     expensive node in our query plan
>   * The third node is the inner node in the Merge Join and that node
>     references the SubPlan generated by the first node. The IndexCond
>     has*"id = ANY($2) AND ..."* and the comparison with the result of
>     the SubPlan does not find a match, so that's where the
>     short-circuiting happens.

I think it does.

I'm not very experienced with the customs of these mailing lists, but I 
think the following would help to get more answers :

* TEXT format of EXPLAIN is much more readable (compared to JSON)
* A well formatted query would help
* Screenshots aren't so great

Rather than a screenshot, maybe you could use one of explain.depesz.com, 
explain.dalibo.com, or explain-postgresql.com ?

Best regards,
Frédéric