Обсуждение: Re: Postgres using nested loops despite setting enable_nestloop to false

Поиск
Список
Период
Сортировка

Re: Postgres using nested loops despite setting enable_nestloop to false

От
Justin Pryzby
Дата:
On Tue, Nov 17, 2020 at 02:47:55PM +0100, Frits Jalvingh wrote:
> But lately while migrating to Postgres 13 (from 9.6) we found that Postgres
> does not (always) obey the enable_nestloop = false setting anymore: some
> 
> The execution plan on Postgres 13.1:

Could you send the plans under pg13 and pg9.6 as attachments ?

What is the setting of work_mem ?

I see the cost is dominated by 2*disable_cost, but I wonder whether the I/O
cost of hash joins now exceeds that.  Maybe hash_mem_multiplier helps you?

GroupAggregate  (cost=20008853763.07..20008853776.02 rows=370 width=68)
                                                                                                          
 
  Group Key: (COALESCE(adres_pe.id_s, '-1'::integer)),
                                                                                                          
 

-- 
Justin



Re: Postgres using nested loops despite setting enable_nestloop to false

От
Frits Jalvingh
Дата:
Hi Justin, thanks for your help!
I have attached both plans, both made with set enable_nestloop = false in the attachments.
On the Postgresql 13 server work_mem is 64MB. It cannot really be higher there because Postgresql does not control its use of memory, setting it higher on this VM will cause the OOM killer to kill Postgresql for some queries.
On the Postgres 9.6 server we have it way higher, at 5GB (this machine is a monster with about 800GB of RAM).

I indeed saw too that the artificial cost for the nested join led to 2x that amount. But that seems to be because there are actually 2 nested joins in there: we use a cross join with a "time" table (which contains just some 28 rows) and that one always seems to need a nested loop (it is present always). So I'm not too certain that that 2x disable_cost is from joins; it seems to be from 2x the nested loop. And I actually wondered whether that would be a cause of the issue, because as far as costs are concerned that second nested loops only _increases_ the cost by 2 times...

Regards,

Frits


On Tue, Nov 17, 2020 at 3:20 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Tue, Nov 17, 2020 at 02:47:55PM +0100, Frits Jalvingh wrote:
> But lately while migrating to Postgres 13 (from 9.6) we found that Postgres
> does not (always) obey the enable_nestloop = false setting anymore: some
>
> The execution plan on Postgres 13.1:

Could you send the plans under pg13 and pg9.6 as attachments ?

What is the setting of work_mem ?

I see the cost is dominated by 2*disable_cost, but I wonder whether the I/O
cost of hash joins now exceeds that.  Maybe hash_mem_multiplier helps you?

GroupAggregate  (cost=20008853763.07..20008853776.02 rows=370 width=68)                                                                                                                                                           
  Group Key: (COALESCE(adres_pe.id_s, '-1'::integer)),                                                                                                                                                                             

--
Justin


Вложения

Re: Postgres using nested loops despite setting enable_nestloop to false

От
Justin Pryzby
Дата:
On Tue, Nov 17, 2020 at 04:58:45PM +0100, Frits Jalvingh wrote:
> Hi Justin, thanks for your help!
> I have attached both plans, both made with set enable_nestloop = false in
> the attachments.
> On the Postgresql 13 server work_mem is 64MB. It cannot really be higher
> there because Postgresql does not control its use of memory, setting it
> higher on this VM will cause the OOM killer to kill Postgresql for some
> queries.

Can you try to get an explain just for this query with either increased
work_mem or hash_mem_multiplier ?

Or possibly by messing with the cost parameters, including seq_page_cost.
Making all cost_* params 1000x smaller might allow the disable cost to be
effective.

-- 
Justin



Re: Postgres using nested loops despite setting enable_nestloop to false

От
Frits Jalvingh
Дата:
Ah, sorry, I forgot. I set "hash_mem_multiplier = 2", and after that to 20. It did had no effects on the nested loops.

On Tue, Nov 17, 2020 at 4:58 PM Frits Jalvingh <jal@etc.to> wrote:
Hi Justin, thanks for your help!
I have attached both plans, both made with set enable_nestloop = false in the attachments.
On the Postgresql 13 server work_mem is 64MB. It cannot really be higher there because Postgresql does not control its use of memory, setting it higher on this VM will cause the OOM killer to kill Postgresql for some queries.
On the Postgres 9.6 server we have it way higher, at 5GB (this machine is a monster with about 800GB of RAM).

I indeed saw too that the artificial cost for the nested join led to 2x that amount. But that seems to be because there are actually 2 nested joins in there: we use a cross join with a "time" table (which contains just some 28 rows) and that one always seems to need a nested loop (it is present always). So I'm not too certain that that 2x disable_cost is from joins; it seems to be from 2x the nested loop. And I actually wondered whether that would be a cause of the issue, because as far as costs are concerned that second nested loops only _increases_ the cost by 2 times...

Regards,

Frits


On Tue, Nov 17, 2020 at 3:20 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Tue, Nov 17, 2020 at 02:47:55PM +0100, Frits Jalvingh wrote:
> But lately while migrating to Postgres 13 (from 9.6) we found that Postgres
> does not (always) obey the enable_nestloop = false setting anymore: some
>
> The execution plan on Postgres 13.1:

Could you send the plans under pg13 and pg9.6 as attachments ?

What is the setting of work_mem ?

I see the cost is dominated by 2*disable_cost, but I wonder whether the I/O
cost of hash joins now exceeds that.  Maybe hash_mem_multiplier helps you?

GroupAggregate  (cost=20008853763.07..20008853776.02 rows=370 width=68)                                                                                                                                                           
  Group Key: (COALESCE(adres_pe.id_s, '-1'::integer)),                                                                                                                                                                             

--
Justin


Re: Postgres using nested loops despite setting enable_nestloop to false

От
Frits Jalvingh
Дата:
Ok, I set all those cost parameters:
# - Planner Cost Constants -

seq_page_cost = 0.0001                  # measured on an arbitrary scale
random_page_cost = 0.0002
cpu_tuple_cost = 0.00001                        # same scale as above
cpu_index_tuple_cost = 0.000005         # same scale as above
cpu_operator_cost = 0.0000025           # same scale as above
parallel_tuple_cost = 0.0001            # same scale as above
parallel_setup_cost = 1.0       # same scale as above
#min_parallel_table_scan_size = 8MB
#min_parallel_index_scan_size = 512kB
effective_cache_size = 2GB

It still has the nested loop on top, but the total cost is now:
GroupAggregate  (cost=20000005652.88..20000005652.90 rows=370 width=68)


On Tue, Nov 17, 2020 at 5:08 PM Frits Jalvingh <jal@etc.to> wrote:
Ah, sorry, I forgot. I set "hash_mem_multiplier = 2", and after that to 20. It did had no effects on the nested loops.

On Tue, Nov 17, 2020 at 4:58 PM Frits Jalvingh <jal@etc.to> wrote:
Hi Justin, thanks for your help!
I have attached both plans, both made with set enable_nestloop = false in the attachments.
On the Postgresql 13 server work_mem is 64MB. It cannot really be higher there because Postgresql does not control its use of memory, setting it higher on this VM will cause the OOM killer to kill Postgresql for some queries.
On the Postgres 9.6 server we have it way higher, at 5GB (this machine is a monster with about 800GB of RAM).

I indeed saw too that the artificial cost for the nested join led to 2x that amount. But that seems to be because there are actually 2 nested joins in there: we use a cross join with a "time" table (which contains just some 28 rows) and that one always seems to need a nested loop (it is present always). So I'm not too certain that that 2x disable_cost is from joins; it seems to be from 2x the nested loop. And I actually wondered whether that would be a cause of the issue, because as far as costs are concerned that second nested loops only _increases_ the cost by 2 times...

Regards,

Frits


On Tue, Nov 17, 2020 at 3:20 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Tue, Nov 17, 2020 at 02:47:55PM +0100, Frits Jalvingh wrote:
> But lately while migrating to Postgres 13 (from 9.6) we found that Postgres
> does not (always) obey the enable_nestloop = false setting anymore: some
>
> The execution plan on Postgres 13.1:

Could you send the plans under pg13 and pg9.6 as attachments ?

What is the setting of work_mem ?

I see the cost is dominated by 2*disable_cost, but I wonder whether the I/O
cost of hash joins now exceeds that.  Maybe hash_mem_multiplier helps you?

GroupAggregate  (cost=20008853763.07..20008853776.02 rows=370 width=68)                                                                                                                                                           
  Group Key: (COALESCE(adres_pe.id_s, '-1'::integer)),                                                                                                                                                                             

--
Justin


Re: Postgres using nested loops despite setting enable_nestloop to false

От
Tom Lane
Дата:
Frits Jalvingh <jal@etc.to> writes:
> I have attached both plans, both made with set enable_nestloop = false in
> the attachments.

The reason why you're getting a nested loop is that the planner has no
other choice.  The "tijd" table has no join conditions that would be
amenable to hash- or merge-joining it to something else, because both
of those join methods require a plain equality join condition.  AFAICS
in a quick look, all of tijd's join conditions look more like

    Join Filter: ((di04238cluster.dv_start_dts <= tijd.einddatum) AND (di04238cluster.dv_end_dts > tijd.einddatum))

which is not amenable to anything except brute force cross-join-and-
test-the-condition.

Given that, it's likely that "enable_nestloop = false" is making things
worse not better, by artificially distorting the plan shape.

Seeing the large number of joins involved, I wonder what your
geqo_threshold, join_collapse_limit, and from_collapse_limit settings
are, and whether you can get a better plan by increasing them.

The planner doesn't seem to think that any of these joins involve
a very large number of rows, so I doubt that your work_mem setting
is very relevant.  However, are these rowcount estimates accurate?
You claimed upthread that you were dealing with hundreds of millions
of rows, but it's impossible to credit that cost estimates like

  ->  Seq Scan on s_h_cluster_ssm di01905cluster  (cost=0.00..155.05 rows=1072 width=24)
        Filter: (soort = 'FIN'::text)

correspond to scanning large tables.

In the end, I fear that finding a way to get rid of those
inequality join conditions may be your only real answer.

            regards, tom lane



Re: Postgres using nested loops despite setting enable_nestloop to false

От
Frits Jalvingh
Дата:
Hello Tom, thanks for your help!

I understand that the "time" table cross join needs a nested loop. Indeed that nested loop is present in all plans generated.
But it is the _second_ (topmost) nested loop that is the issue. Once the time table has been joined it should be possible to do something else for that second nested loop. This is proven by that query on 9.6 (which has only one nested loop for that exact same query, on almost the same database content as the Postgresql 13 one). Even on Postgresql 13 a correct plan is made in another database (exact same structure, different data); I have attached the plan that is made there too.
All databases that make a plan without the second nested loops also finish the query within a reasonable time period (16 seconds on the .9.6 server). On the 13 server with the nested loops plan the process times out after 2 hours.

As far as the row counts go: yes, this database is not by far the biggest one, so the row counts are less. It also depends on what query we actually run (we can have hundreds of them on different tables, and not all tables are that big).

I disabled nested_loops not just for fun, I disabled it because without it many of the queries effectively hang because their plan estimate expects only a few rows while in reality there are millions. Disabling nested loops will let lots of the generated queries fail, even on smaller datasets.

I have no idea of how to get rid of those inequality queries, except by not using SQL and doing them by hand in code.. That would prove to be disastrous for performance as I'd have to read all those datasets completely... Do you have an idea on how to do that better?

Regards,
Frits


On Tue, Nov 17, 2020 at 5:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Frits Jalvingh <jal@etc.to> writes:
> I have attached both plans, both made with set enable_nestloop = false in
> the attachments.

The reason why you're getting a nested loop is that the planner has no
other choice.  The "tijd" table has no join conditions that would be
amenable to hash- or merge-joining it to something else, because both
of those join methods require a plain equality join condition.  AFAICS
in a quick look, all of tijd's join conditions look more like

    Join Filter: ((di04238cluster.dv_start_dts <= tijd.einddatum) AND (di04238cluster.dv_end_dts > tijd.einddatum))

which is not amenable to anything except brute force cross-join-and-
test-the-condition.

Given that, it's likely that "enable_nestloop = false" is making things
worse not better, by artificially distorting the plan shape.

Seeing the large number of joins involved, I wonder what your
geqo_threshold, join_collapse_limit, and from_collapse_limit settings
are, and whether you can get a better plan by increasing them.

The planner doesn't seem to think that any of these joins involve
a very large number of rows, so I doubt that your work_mem setting
is very relevant.  However, are these rowcount estimates accurate?
You claimed upthread that you were dealing with hundreds of millions
of rows, but it's impossible to credit that cost estimates like

  ->  Seq Scan on s_h_cluster_ssm di01905cluster  (cost=0.00..155.05 rows=1072 width=24)
        Filter: (soort = 'FIN'::text)

correspond to scanning large tables.

In the end, I fear that finding a way to get rid of those
inequality join conditions may be your only real answer.

                        regards, tom lane
Вложения

Re: Postgres using nested loops despite setting enable_nestloop to false

От
Frits Jalvingh
Дата:
I found out that setting:
set join_collapse_limit = 14;
set from_collapse_limit = 14;
In addition to disabling the nested loops does produce a viable plan, with only the nested loop to generate the tijd table cross join as a basic part down low... The original values for those were 12. It does seem scary to update those as the possibility of having 14! plans to choose from seems... scary...

It does feel a bit like throwing dice...

I assume the bad plan is being made by the gequ planner. Is there a way to discourage it from using those nested loops?

Regards,

Frits


On Tue, Nov 17, 2020 at 5:42 PM Frits Jalvingh <jal@etc.to> wrote:
Hello Tom, thanks for your help!

I understand that the "time" table cross join needs a nested loop. Indeed that nested loop is present in all plans generated.
But it is the _second_ (topmost) nested loop that is the issue. Once the time table has been joined it should be possible to do something else for that second nested loop. This is proven by that query on 9.6 (which has only one nested loop for that exact same query, on almost the same database content as the Postgresql 13 one). Even on Postgresql 13 a correct plan is made in another database (exact same structure, different data); I have attached the plan that is made there too.
All databases that make a plan without the second nested loops also finish the query within a reasonable time period (16 seconds on the .9.6 server). On the 13 server with the nested loops plan the process times out after 2 hours.

As far as the row counts go: yes, this database is not by far the biggest one, so the row counts are less. It also depends on what query we actually run (we can have hundreds of them on different tables, and not all tables are that big).

I disabled nested_loops not just for fun, I disabled it because without it many of the queries effectively hang because their plan estimate expects only a few rows while in reality there are millions. Disabling nested loops will let lots of the generated queries fail, even on smaller datasets.

I have no idea of how to get rid of those inequality queries, except by not using SQL and doing them by hand in code.. That would prove to be disastrous for performance as I'd have to read all those datasets completely... Do you have an idea on how to do that better?

Regards,
Frits


On Tue, Nov 17, 2020 at 5:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Frits Jalvingh <jal@etc.to> writes:
> I have attached both plans, both made with set enable_nestloop = false in
> the attachments.

The reason why you're getting a nested loop is that the planner has no
other choice.  The "tijd" table has no join conditions that would be
amenable to hash- or merge-joining it to something else, because both
of those join methods require a plain equality join condition.  AFAICS
in a quick look, all of tijd's join conditions look more like

    Join Filter: ((di04238cluster.dv_start_dts <= tijd.einddatum) AND (di04238cluster.dv_end_dts > tijd.einddatum))

which is not amenable to anything except brute force cross-join-and-
test-the-condition.

Given that, it's likely that "enable_nestloop = false" is making things
worse not better, by artificially distorting the plan shape.

Seeing the large number of joins involved, I wonder what your
geqo_threshold, join_collapse_limit, and from_collapse_limit settings
are, and whether you can get a better plan by increasing them.

The planner doesn't seem to think that any of these joins involve
a very large number of rows, so I doubt that your work_mem setting
is very relevant.  However, are these rowcount estimates accurate?
You claimed upthread that you were dealing with hundreds of millions
of rows, but it's impossible to credit that cost estimates like

  ->  Seq Scan on s_h_cluster_ssm di01905cluster  (cost=0.00..155.05 rows=1072 width=24)
        Filter: (soort = 'FIN'::text)

correspond to scanning large tables.

In the end, I fear that finding a way to get rid of those
inequality join conditions may be your only real answer.

                        regards, tom lane