Обсуждение: [MASSMAIL]allow changing autovacuum_max_workers without restarting

Поиск
Список
Период
Сортировка

[MASSMAIL]allow changing autovacuum_max_workers without restarting

От
Nathan Bossart
Дата:
I frequently hear about scenarios where users with thousands upon thousands
of tables realize that autovacuum is struggling to keep up.  When they
inevitably go to bump up autovacuum_max_workers, they discover that it
requires a server restart (i.e., downtime) to take effect, causing further
frustration.  For this reason, I think $SUBJECT is a desirable improvement.
I spent some time looking for past discussions about this, and I was
surprised to not find any, so I thought I'd give it a try.

The attached proof-of-concept patch demonstrates what I have in mind.
Instead of trying to dynamically change the global process table, etc., I'm
proposing that we introduce a new GUC that sets the effective maximum
number of autovacuum workers that can be started at any time.  This means
there would be two GUCs for the number of autovacuum workers: one for the
number of slots reserved for autovacuum workers, and another that restricts
the number of those slots that can be used.  The former would continue to
require a restart to change its value, and users would typically want to
set it relatively high.  The latter could be changed at any time and would
allow for raising or lowering the maximum number of active autovacuum
workers, up to the limit set by the other parameter.

The proof-of-concept patch keeps autovacuum_max_workers as the maximum
number of slots to reserve for workers, but I think we should instead
rename this parameter to something else and then reintroduce
autovacuum_max_workers as the new parameter that can be adjusted without
restarting.  That way, autovacuum_max_workers continues to work much the
same way as in previous versions.

There are a couple of weird cases with this approach.  One is when the
restart-only limit is set lower than the PGC_SIGHUP limit.  In that case, I
think we should just use the restart-only limit.  The other is when there
are already N active autovacuum workers and the PGC_SIGHUP parameter is
changed to something less than N.  For that case, I think we should just
block starting additional workers until the number of workers drops below
the new parameter's value.  I don't think we should kill existing workers,
or anything else like that.

TBH I've been sitting on this idea for a while now, only because I think it
has a slim chance of acceptance, but IMHO this is a simple change that
could help many users.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Вложения

Re: allow changing autovacuum_max_workers without restarting

От
"Imseih (AWS), Sami"
Дата:
> I frequently hear about scenarios where users with thousands upon thousands
> of tables realize that autovacuum is struggling to keep up.  When they
> inevitably go to bump up autovacuum_max_workers, they discover that it
> requires a server restart (i.e., downtime) to take effect, causing further
> frustration.  For this reason, I think $SUBJECT is a desirable improvement.
> I spent some time looking for past discussions about this, and I was
> surprised to not find any, so I thought I'd give it a try.

I did not review the patch in detail yet, but +1 to the idea. 
It's not just thousands of tables that suffer from this.
If a user has a few large tables hogging the autovac workers, then other
tables don't get the autovac cycles they require. Users are then forced
to run manual vacuums, which adds complexity to their operations.

> The attached proof-of-concept patch demonstrates what I have in mind.
> Instead of trying to dynamically change the global process table, etc., I'm
> proposing that we introduce a new GUC that sets the effective maximum
> number of autovacuum workers that can be started at any time.

max_worker_processes defines a pool of max # of background workers allowed.
parallel workers and extensions that spin  up background workers all utilize from 
this pool. 

Should autovacuum_max_workers be able to utilize from max_worker_processes also?

This will allow autovacuum_max_workers to be dynamic while the user only has
to deal with an already existing GUC. We may want to increase the default value
for max_worker_processes as part of this.


Regards,

Sami
Amazon Web Services (AWS)


Re: allow changing autovacuum_max_workers without restarting

От
Nathan Bossart
Дата:
On Thu, Apr 11, 2024 at 02:24:18PM +0000, Imseih (AWS), Sami wrote:
> max_worker_processes defines a pool of max # of background workers allowed.
> parallel workers and extensions that spin  up background workers all utilize from 
> this pool. 
> 
> Should autovacuum_max_workers be able to utilize from max_worker_processes also?
> 
> This will allow autovacuum_max_workers to be dynamic while the user only has
> to deal with an already existing GUC. We may want to increase the default value
> for max_worker_processes as part of this.

My concern with this approach is that other background workers could use up
all the slots and prevent autovacuum workers from starting, unless of
course we reserve autovacuum_max_workers slots for _only_ autovacuum
workers.  I'm not sure if we want to get these parameters tangled up like
this, though...

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: allow changing autovacuum_max_workers without restarting

От
Nathan Bossart
Дата:
On Thu, Apr 11, 2024 at 09:42:40AM -0500, Nathan Bossart wrote:
> On Thu, Apr 11, 2024 at 02:24:18PM +0000, Imseih (AWS), Sami wrote:
>> max_worker_processes defines a pool of max # of background workers allowed.
>> parallel workers and extensions that spin  up background workers all utilize from 
>> this pool. 
>> 
>> Should autovacuum_max_workers be able to utilize from max_worker_processes also?
>> 
>> This will allow autovacuum_max_workers to be dynamic while the user only has
>> to deal with an already existing GUC. We may want to increase the default value
>> for max_worker_processes as part of this.
> 
> My concern with this approach is that other background workers could use up
> all the slots and prevent autovacuum workers from starting, unless of
> course we reserve autovacuum_max_workers slots for _only_ autovacuum
> workers.  I'm not sure if we want to get these parameters tangled up like
> this, though...

I see that the logical replication launcher process uses this pool, but we
take special care to make sure it gets a slot:

    /*
     * Register the apply launcher.  It's probably a good idea to call this
     * before any modules had a chance to take the background worker slots.
     */
    ApplyLauncherRegister();

I'm not sure there's another way to effectively reserve slots that would
work for the autovacuum workers (which need to restart to connect to
different databases), so that would need to be invented.  We'd probably
also want to fail startup if autovacuum_max_workers < max_worker_processes,
which seems like it has the potential to cause problems when folks first
upgrade to v18.

Furthermore, we might have to convert autovacuum workers to background
worker processes for this to work.  I've admittedly wondered about whether
we should do that eventually, anyway, but it'd expand the scope of this
work quite a bit.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: allow changing autovacuum_max_workers without restarting

От
"Imseih (AWS), Sami"
Дата:
> My concern with this approach is that other background workers could use up
> all the slots and prevent autovacuum workers from starting

That's a good point, the current settings do not guarantee that you
get a worker for the purpose if none are available, 
i.e. max_parallel_workers_per_gather, you may have 2 workers planned 
and 0 launched. 

> unless of
> course we reserve autovacuum_max_workers slots for _only_ autovacuum
> workers. I'm not sure if we want to get these parameters tangled up like
> this, though...

This will be confusing to describe and we will be reserving autovac workers
implicitly, rather than explicitly with a new GUC.

Regards,

Sami  




Re: allow changing autovacuum_max_workers without restarting

От
Nathan Bossart
Дата:
On Thu, Apr 11, 2024 at 03:37:23PM +0000, Imseih (AWS), Sami wrote:
>> My concern with this approach is that other background workers could use up
>> all the slots and prevent autovacuum workers from starting
> 
> That's a good point, the current settings do not guarantee that you
> get a worker for the purpose if none are available, 
> i.e. max_parallel_workers_per_gather, you may have 2 workers planned 
> and 0 launched. 
> 
>> unless of
>> course we reserve autovacuum_max_workers slots for _only_ autovacuum
>> workers. I'm not sure if we want to get these parameters tangled up like
>> this, though...
> 
> This will be confusing to describe and we will be reserving autovac workers
> implicitly, rather than explicitly with a new GUC.

Yeah, that's probably a good reason to give autovacuum its own worker pool.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: allow changing autovacuum_max_workers without restarting

От
"Imseih (AWS), Sami"
Дата:
I spent sometime reviewing/testing the POC. It is relatively simple with a lot
of obvious value. 

I tested with 16 tables that constantly reach the autovac threashold and the
patch did the right thing. I observed concurrent autovacuum workers matching
the setting as I was adjusting it dynamically.

As you mention above, If there are more autovacs in progress and a new lower setting 
is applied, we should not take any special action on those autovacuums, and eventually 
the max number of autovacuum workers will match the setting.

I also tested by allowing user connections to reach max_connections, and observed the 
expected number of autovacuums spinning up and correctly adjusted.

Having autovacuum tests ( unless I missed something ) like the above is a good 
general improvement, but it should not be tied to this. 

A few comments on  the POC patch:

1/ We should emit a log when autovacuum_workers is set higher than the max.

2/ should the name of the restart limit be "reserved_autovacuum_workers"?

Regards,

Sami Imseih
AWS (Amazon Web Services)




Re: allow changing autovacuum_max_workers without restarting

От
Nathan Bossart
Дата:
On Fri, Apr 12, 2024 at 05:27:40PM +0000, Imseih (AWS), Sami wrote:
> A few comments on  the POC patch:

Thanks for reviewing.

> 1/ We should emit a log when autovacuum_workers is set higher than the max.

Hm.  Maybe the autovacuum launcher could do that.

> 2/ should the name of the restart limit be "reserved_autovacuum_workers"?

That's kind-of what I had in mind, although I think we might want to avoid
the word "reserved" because it sounds a bit like reserved_connections and
superuser_reserved_connections.  "autovacuum_max_slots" or
"autovacuum_max_worker_slots" might be worth considering, too.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: allow changing autovacuum_max_workers without restarting

От
"Imseih (AWS), Sami"
Дата:
>> 1/ We should emit a log when autovacuum_workers is set higher than the max.


>> Hm. Maybe the autovacuum launcher could do that.

Would it be better to use a GUC check_hook that compares the 
new value with the max allowed values and emits a WARNING ?

autovacuum_max_workers already has a check_autovacuum_max_workers
check_hook, which can be repurposed for this.

In the POC patch, this check_hook is kept as-is, which will no longer make sense.

>> 2/ should the name of the restart limit be "reserved_autovacuum_workers"?


>> That's kind-of what I had in mind, although I think we might want to avoid
>> the word "reserved" because it sounds a bit like reserved_connections 
>> and superuser_reserved_connections

Yes, I agree. This can be confusing.

>>  "autovacuum_max_slots" or
>> "autovacuum_max_worker_slots" might be worth considering, too.

"autovacuum_max_worker_slots" is probably the best option because
we should have "worker" in the name of the GUC.


Regards,

Sami


Re: allow changing autovacuum_max_workers without restarting

От
Nathan Bossart
Дата:
On Fri, Apr 12, 2024 at 10:17:44PM +0000, Imseih (AWS), Sami wrote:
>>> Hm. Maybe the autovacuum launcher could do that.
> 
> Would it be better to use a GUC check_hook that compares the 
> new value with the max allowed values and emits a WARNING ?
> 
> autovacuum_max_workers already has a check_autovacuum_max_workers
> check_hook, which can be repurposed for this.
> 
> In the POC patch, this check_hook is kept as-is, which will no longer make sense.

IIRC using GUC hooks to handle dependencies like this is generally frowned
upon because it tends to not work very well [0].  We could probably get it
to work for this particular case, but IMHO we should still try to avoid
this approach.  I didn't find any similar warnings for other GUCs like
max_parallel_workers_per_gather, so it might not be crucial to emit a
WARNING here.

[0] https://postgr.es/m/27574.1581015893%40sss.pgh.pa.us

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: allow changing autovacuum_max_workers without restarting

От
"Imseih (AWS), Sami"
Дата:
> IIRC using GUC hooks to handle dependencies like this is generally frowned
> upon because it tends to not work very well [0]. We could probably get it
> to work for this particular case, but IMHO we should still try to avoid
> this approach. 

Thanks for pointing this out. I agree, this could lead to false logs being
emitted.

> so it might not be crucial to emit a
> WARNING here.

As mentioned earlier in the thread, we can let the autovacuum launcher emit the 
log,  but it will need to be careful not flood the logs when this condition exists ( i.e.
log only the first time the condition is detected or log every once in a while )

The additional complexity is not worth it.

Regards,

Sami





Re: allow changing autovacuum_max_workers without restarting

От
Nathan Bossart
Дата:
Here is a first attempt at a proper patch set based on the discussion thus
far.  I've split it up into several small patches for ease of review, which
is probably a bit excessive.  If this ever makes it to commit, they could
likely be combined.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Вложения

Re: allow changing autovacuum_max_workers without restarting

От
Justin Pryzby
Дата:
On Wed, Apr 10, 2024 at 04:23:44PM -0500, Nathan Bossart wrote:
> The attached proof-of-concept patch demonstrates what I have in mind.
> Instead of trying to dynamically change the global process table, etc., I'm
> proposing that we introduce a new GUC that sets the effective maximum
> number of autovacuum workers that can be started at any time.  This means
> there would be two GUCs for the number of autovacuum workers: one for the
> number of slots reserved for autovacuum workers, and another that restricts
> the number of those slots that can be used.  The former would continue to
> require a restart to change its value, and users would typically want to
> set it relatively high.  The latter could be changed at any time and would
> allow for raising or lowering the maximum number of active autovacuum
> workers, up to the limit set by the other parameter.
> 
> The proof-of-concept patch keeps autovacuum_max_workers as the maximum
> number of slots to reserve for workers, but I think we should instead
> rename this parameter to something else and then reintroduce
> autovacuum_max_workers as the new parameter that can be adjusted without
> restarting.  That way, autovacuum_max_workers continues to work much the
> same way as in previous versions.

When I thought about this, I considered proposing to add a new GUC for
"autovacuum_policy_workers".

autovacuum_max_workers would be the same as before, requiring a restart
to change.  The policy GUC would be the soft limit, changable at runtime
up to the hard limit of autovacuum_max_workers (or maybe any policy
value exceeding autovacuum_max_workers would be ignored).

We'd probably change autovacuum_max_workers to default to a higher value
(8, or 32 as in your patch), and have autovacuum_max_workers default to
3, for consistency with historic behavior.  Maybe
autovacuum_policy_workers=-1 would mean to use all workers.

There's the existing idea to change autovacuum thresholds during the
busy period of the day vs. off hours.  This would allow something
similar with nworkers rather than thresholds: if the goal were to reduce
the resource use of vacuum, the admin could set max_workers=8, with
policy_workers=2 during the busy period.

-- 
Justin



Re: allow changing autovacuum_max_workers without restarting

От
Nathan Bossart
Дата:
On Mon, Apr 15, 2024 at 08:33:33AM -0500, Justin Pryzby wrote:
> On Wed, Apr 10, 2024 at 04:23:44PM -0500, Nathan Bossart wrote:
>> The proof-of-concept patch keeps autovacuum_max_workers as the maximum
>> number of slots to reserve for workers, but I think we should instead
>> rename this parameter to something else and then reintroduce
>> autovacuum_max_workers as the new parameter that can be adjusted without
>> restarting.  That way, autovacuum_max_workers continues to work much the
>> same way as in previous versions.
> 
> When I thought about this, I considered proposing to add a new GUC for
> "autovacuum_policy_workers".
> 
> autovacuum_max_workers would be the same as before, requiring a restart
> to change.  The policy GUC would be the soft limit, changable at runtime
> up to the hard limit of autovacuum_max_workers (or maybe any policy
> value exceeding autovacuum_max_workers would be ignored).
> 
> We'd probably change autovacuum_max_workers to default to a higher value
> (8, or 32 as in your patch), and have autovacuum_max_workers default to
> 3, for consistency with historic behavior.  Maybe
> autovacuum_policy_workers=-1 would mean to use all workers.

This sounds like roughly the same idea, although it is backwards from what
I'm proposing in the v1 patch set.  My thinking is that by making a new
restart-only GUC that would by default be set higher than the vast majority
of systems should ever need, we could simplify migrating to these
parameters.  The autovacuum_max_workers parameter would effectively retain
it's original meaning, and existing settings would continue to work
normally on v18, but users could now adjust it without restarting.  If we
did it the other way, users would need to bump up autovacuum_max_workers
and restart prior to being able to raise autovacuum_policy_workers beyond
what they previously had set for autovacuum_max_workers.  That being said,
I'm open to doing it this way if folks prefer this approach, as I think it
is still an improvement.

> There's the existing idea to change autovacuum thresholds during the
> busy period of the day vs. off hours.  This would allow something
> similar with nworkers rather than thresholds: if the goal were to reduce
> the resource use of vacuum, the admin could set max_workers=8, with
> policy_workers=2 during the busy period.

Precisely.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: allow changing autovacuum_max_workers without restarting

От
Nathan Bossart
Дата:
On Mon, Apr 15, 2024 at 11:28:33AM -0500, Nathan Bossart wrote:
> On Mon, Apr 15, 2024 at 08:33:33AM -0500, Justin Pryzby wrote:
>> On Wed, Apr 10, 2024 at 04:23:44PM -0500, Nathan Bossart wrote:
>>> The proof-of-concept patch keeps autovacuum_max_workers as the maximum
>>> number of slots to reserve for workers, but I think we should instead
>>> rename this parameter to something else and then reintroduce
>>> autovacuum_max_workers as the new parameter that can be adjusted without
>>> restarting.  That way, autovacuum_max_workers continues to work much the
>>> same way as in previous versions.
>> 
>> When I thought about this, I considered proposing to add a new GUC for
>> "autovacuum_policy_workers".
>> 
>> autovacuum_max_workers would be the same as before, requiring a restart
>> to change.  The policy GUC would be the soft limit, changable at runtime
>> up to the hard limit of autovacuum_max_workers (or maybe any policy
>> value exceeding autovacuum_max_workers would be ignored).
>> 
>> We'd probably change autovacuum_max_workers to default to a higher value
>> (8, or 32 as in your patch), and have autovacuum_max_workers default to
>> 3, for consistency with historic behavior.  Maybe
>> autovacuum_policy_workers=-1 would mean to use all workers.
> 
> This sounds like roughly the same idea, although it is backwards from what
> I'm proposing in the v1 patch set.  My thinking is that by making a new
> restart-only GUC that would by default be set higher than the vast majority
> of systems should ever need, we could simplify migrating to these
> parameters.  The autovacuum_max_workers parameter would effectively retain
> it's original meaning, and existing settings would continue to work
> normally on v18, but users could now adjust it without restarting.  If we
> did it the other way, users would need to bump up autovacuum_max_workers
> and restart prior to being able to raise autovacuum_policy_workers beyond
> what they previously had set for autovacuum_max_workers.  That being said,
> I'm open to doing it this way if folks prefer this approach, as I think it
> is still an improvement.

Another option could be to just remove the restart-only GUC and hard-code
the upper limit of autovacuum_max_workers to 64 or 128 or something.  While
that would simplify matters, I suspect it would be hard to choose an
appropriate limit that won't quickly become outdated.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: allow changing autovacuum_max_workers without restarting

От
"Imseih (AWS), Sami"
Дата:
> Another option could be to just remove the restart-only GUC and hard-code
> the upper limit of autovacuum_max_workers to 64 or 128 or something. While
> that would simplify matters, I suspect it would be hard to choose an
> appropriate limit that won't quickly become outdated.

Hardcoded values are usually hard to deal with because they are hidden either
In code or in docs.

> When I thought about this, I considered proposing to add a new GUC for
> "autovacuum_policy_workers".

> autovacuum_max_workers would be the same as before, requiring a restart
> to change.  The policy GUC would be the soft limit, changable at runtime

I think autovacuum_max_workers should still be the GUC that controls
the number of concurrent autovacuums. This parameter is already well 
established and changing the meaning now will be confusing. 

I suspect most users will be glad it's now dynamic, but will probably 
be annoyed if it's no longer doing what it's supposed to.

Regards,

Sami 


Re: allow changing autovacuum_max_workers without restarting

От
wenhui qiu
Дата:
Agree +1,From a dba perspective, I would prefer that this parameter can be dynamically modified, rather than adding a new parameter,What is more difficult is how to smoothly reach the target value when the setting is considered to be too large and needs to be lowered.
    


Regards 

On Tue, 16 Apr 2024 at 01:41, Imseih (AWS), Sami <simseih@amazon.com> wrote:
> Another option could be to just remove the restart-only GUC and hard-code
> the upper limit of autovacuum_max_workers to 64 or 128 or something. While
> that would simplify matters, I suspect it would be hard to choose an
> appropriate limit that won't quickly become outdated.

Hardcoded values are usually hard to deal with because they are hidden either
In code or in docs.

> When I thought about this, I considered proposing to add a new GUC for
> "autovacuum_policy_workers".

> autovacuum_max_workers would be the same as before, requiring a restart
> to change.  The policy GUC would be the soft limit, changable at runtime

I think autovacuum_max_workers should still be the GUC that controls
the number of concurrent autovacuums. This parameter is already well
established and changing the meaning now will be confusing.

I suspect most users will be glad it's now dynamic, but will probably
be annoyed if it's no longer doing what it's supposed to.

Regards,

Sami

Re: allow changing autovacuum_max_workers without restarting

От
"Imseih (AWS), Sami"
Дата:
> Here is a first attempt at a proper patch set based on the discussion thus
> far. I've split it up into several small patches for ease of review, which
> is probably a bit excessive. If this ever makes it to commit, they could
> likely be combined.

I looked at the patch set. With the help of DEBUG2 output, I tested to ensure
that the the autovacuum_cost_limit  balance adjusts correctly when the 
autovacuum_max_workers value increases/decreases. I did not think the 
patch will break this behavior, but it's important to verify this.

Some comments on the patch:

1. A nit. There should be a tab here.

-       dlist_head      av_freeWorkers;
+       dclist_head av_freeWorkers;

2. autovacuum_max_worker_slots documentation:

+       <para>
+        Note that the value of <xref linkend="guc-autovacuum-max-workers"/> is
+        silently capped to this value.
+       </para>

This comment looks redundant in the docs, since the entry 
for autovacuum_max_workers that follows mentions the
same.


3. The docs for autovacuum_max_workers should mention that when
the value changes, consider adjusting the autovacuum_cost_limit/cost_delay. 

This is not something new. Even in the current state, users should think about 
these settings. However, it seems even important if this value is to be 
dynamically adjusted.


Regards,

Sami







Re: allow changing autovacuum_max_workers without restarting

От
Nathan Bossart
Дата:
On Thu, Apr 18, 2024 at 05:05:03AM +0000, Imseih (AWS), Sami wrote:
> I looked at the patch set. With the help of DEBUG2 output, I tested to ensure
> that the the autovacuum_cost_limit  balance adjusts correctly when the 
> autovacuum_max_workers value increases/decreases. I did not think the 
> patch will break this behavior, but it's important to verify this.

Great.

> 1. A nit. There should be a tab here.
> 
> -       dlist_head      av_freeWorkers;
> +       dclist_head av_freeWorkers;

I dare not argue with pgindent.

> 2. autovacuum_max_worker_slots documentation:
> 
> +       <para>
> +        Note that the value of <xref linkend="guc-autovacuum-max-workers"/> is
> +        silently capped to this value.
> +       </para>
> 
> This comment looks redundant in the docs, since the entry 
> for autovacuum_max_workers that follows mentions the
> same.

Removed in v2.  I also noticed that I forgot to update the part about when
autovacuum_max_workers can be changed.  *facepalm*

> 3. The docs for autovacuum_max_workers should mention that when
> the value changes, consider adjusting the autovacuum_cost_limit/cost_delay. 
> 
> This is not something new. Even in the current state, users should think about 
> these settings. However, it seems even important if this value is to be 
> dynamically adjusted.

I don't necessarily disagree that it might be worth mentioning these
parameters, but I would argue that this should be proposed in a separate
thread.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Вложения

Re: allow changing autovacuum_max_workers without restarting

От
Robert Haas
Дата:
On Fri, Apr 19, 2024 at 11:43 AM Nathan Bossart
<nathandbossart@gmail.com> wrote:
> Removed in v2.  I also noticed that I forgot to update the part about when
> autovacuum_max_workers can be changed.  *facepalm*

I think this could help a bunch of users, but I'd still like to
complain, not so much with the desire to kill this patch as with the
desire to broaden the conversation.

Part of the underlying problem here is that, AFAIK, neither PostgreSQL
as a piece of software nor we as human beings who operate PostgreSQL
databases have much understanding of how autovacuum_max_workers should
be set. It's relatively easy to hose yourself by raising
autovacuum_max_workers to try to make things go faster, but produce
the exact opposite effect due to how the cost balancing stuff works.
But, even if you have the correct use case for autovacuum_max_workers,
something like a few large tables that take a long time to vacuum plus
a bunch of smaller ones that can't get starved just because the big
tables are in the midst of being processed, you might well ask
yourself why it's your job to figure out the correct number of
workers.

Now, before this patch, there is a fairly good reason for that, which
is that we need to reserve shared memory resources for each autovacuum
worker that might potentially run, and the system can't know how much
shared memory you'd like to reserve for that purpose. But if that were
the only problem, then this patch would probably just be proposing to
crank up the default value of that parameter rather than introducing a
second one. I bet Nathan isn't proposing that because his intuition is
that it will work out badly, and I think he's right. I bet that
cranking up the number of allowed workers will often result in running
more workers than we really should. One possible negative consequence
is that we'll end up with multiple processes fighting over the disk in
a situation where they should just take turns. I suspect there are
also ways that we can be harmed - in broadly similar fashion - by cost
balancing.

So I feel like what this proposal reveals is that we know that our
algorithm for ramping up the number of running workers doesn't really
work. And maybe that's just a consequence of the general problem that
we have no global information about how much vacuuming work there is
to be done at any given time, and therefore we cannot take any kind of
sensible guess about whether 1 more worker will help or hurt. Or,
maybe there's some way to do better than what we do today without a
big rewrite. I'm not sure. I don't think this patch should be burdened
with solving the general problem here. But I do think the general
problem is worth some discussion.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: allow changing autovacuum_max_workers without restarting

От
Nathan Bossart
Дата:
On Fri, Apr 19, 2024 at 02:42:13PM -0400, Robert Haas wrote:
> I think this could help a bunch of users, but I'd still like to
> complain, not so much with the desire to kill this patch as with the
> desire to broaden the conversation.

I think I subconsciously hoped this would spark a bigger discussion...

> Now, before this patch, there is a fairly good reason for that, which
> is that we need to reserve shared memory resources for each autovacuum
> worker that might potentially run, and the system can't know how much
> shared memory you'd like to reserve for that purpose. But if that were
> the only problem, then this patch would probably just be proposing to
> crank up the default value of that parameter rather than introducing a
> second one. I bet Nathan isn't proposing that because his intuition is
> that it will work out badly, and I think he's right. I bet that
> cranking up the number of allowed workers will often result in running
> more workers than we really should. One possible negative consequence
> is that we'll end up with multiple processes fighting over the disk in
> a situation where they should just take turns. I suspect there are
> also ways that we can be harmed - in broadly similar fashion - by cost
> balancing.

Even if we were content to bump up the default value of
autovacuum_max_workers and tell folks to just mess with the cost settings,
there are still probably many cases where bumping up the number of workers
further would be necessary.  If you have a zillion tables, turning
cost-based vacuuming off completely may be insufficient to keep up, at
which point your options become limited.  It can be difficult to tell
whether you might end up in this situation over time as your workload
evolves.  In any case, it's not clear to me that bumping up the default
value of autovacuum_max_workers would do more good than harm.  I get the
idea that the default of 3 is sufficient for a lot of clusters, so there'd
really be little upside to changing it AFAICT.  (I guess this proves your
point about my intuition.)

> So I feel like what this proposal reveals is that we know that our
> algorithm for ramping up the number of running workers doesn't really
> work. And maybe that's just a consequence of the general problem that
> we have no global information about how much vacuuming work there is
> to be done at any given time, and therefore we cannot take any kind of
> sensible guess about whether 1 more worker will help or hurt. Or,
> maybe there's some way to do better than what we do today without a
> big rewrite. I'm not sure. I don't think this patch should be burdened
> with solving the general problem here. But I do think the general
> problem is worth some discussion.

I certainly don't want to hold up $SUBJECT for a larger rewrite of
autovacuum scheduling, but I also don't want to shy away from a larger
rewrite if it's an idea whose time has come.  I'm looking forward to
hearing your ideas in your pgconf.dev talk.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: allow changing autovacuum_max_workers without restarting

От
"Imseih (AWS), Sami"
Дата:
> Part of the underlying problem here is that, AFAIK, neither PostgreSQL
> as a piece of software nor we as human beings who operate PostgreSQL
> databases have much understanding of how autovacuum_max_workers should
> be set. It's relatively easy to hose yourself by raising
> autovacuum_max_workers to try to make things go faster, but produce
> the exact opposite effect due to how the cost balancing stuff works.

Yeah, this patch will not fix this problem. Anyone who raises av_max_workers
should think about adjusting the av_cost_delay. This was discussed up the
thread [4] and even without this patch, I think it's necessary to add more
documentation on the relationship between workers and cost.


> So I feel like what this proposal reveals is that we know that our
> algorithm for ramping up the number of running workers doesn't really
> work. And maybe that's just a consequence of the general problem that
> we have no global information about how much vacuuming work there is
> to be done at any given time, and therefore we cannot take any kind of
> sensible guess about whether 1 more worker will help or hurt. Or,
> maybe there's some way to do better than what we do today without a
> big rewrite. I'm not sure. I don't think this patch should be burdened
> with solving the general problem here. But I do think the general
> problem is worth some discussion.

This patch is only solving the operational problem of adjusting 
autovacuum_max_workers,  and it does so without introducing complexity.

A proposal that will alleviate the users from the burden of having to think about
autovacuum_max_workers, cost_delay and cost_limit settings will be great.
This patch may be the basis for such dynamic  "auto-tuning" of autovacuum workers.


Regards,

Sami

[4] https://www.postgresql.org/message-id/20240419154322.GA3988554%40nathanxps13


Re: allow changing autovacuum_max_workers without restarting

От
Robert Haas
Дата:
On Fri, Apr 19, 2024 at 4:29 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> I certainly don't want to hold up $SUBJECT for a larger rewrite of
> autovacuum scheduling, but I also don't want to shy away from a larger
> rewrite if it's an idea whose time has come.  I'm looking forward to
> hearing your ideas in your pgconf.dev talk.

Yeah, I suppose I was hoping you were going to tell me the all the
answers and thus make the talk a lot easier to write, but I guess life
isn't that simple. :-)

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: allow changing autovacuum_max_workers without restarting

От
Nathan Bossart
Дата:
On Mon, Apr 15, 2024 at 05:41:04PM +0000, Imseih (AWS), Sami wrote:
>> Another option could be to just remove the restart-only GUC and hard-code
>> the upper limit of autovacuum_max_workers to 64 or 128 or something. While
>> that would simplify matters, I suspect it would be hard to choose an
>> appropriate limit that won't quickly become outdated.
> 
> Hardcoded values are usually hard to deal with because they are hidden either
> In code or in docs.

That's true, but using a hard-coded limit means we no longer need to add a
new GUC.  Always allocating, say, 256 slots might require a few additional
kilobytes of shared memory, most of which will go unused, but that seems
unlikely to be a problem for the systems that will run Postgres v18.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: allow changing autovacuum_max_workers without restarting

От
"Imseih (AWS), Sami"
Дата:
> That's true, but using a hard-coded limit means we no longer need to add a
> new GUC. Always allocating, say, 256 slots might require a few additional
> kilobytes of shared memory, most of which will go unused, but that seems
> unlikely to be a problem for the systems that will run Postgres v18.

I agree with this.


Regards,

Sami


Re: allow changing autovacuum_max_workers without restarting

От
Nathan Bossart
Дата:
On Fri, May 03, 2024 at 12:57:18PM +0000, Imseih (AWS), Sami wrote:
>> That's true, but using a hard-coded limit means we no longer need to add a
>> new GUC. Always allocating, say, 256 slots might require a few additional
>> kilobytes of shared memory, most of which will go unused, but that seems
>> unlikely to be a problem for the systems that will run Postgres v18.
> 
> I agree with this.

Here's what this might look like.  I chose an upper limit of 1024, which
seems like it "ought to be enough for anybody," at least for now.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Вложения

Re: allow changing autovacuum_max_workers without restarting

От
"Imseih (AWS), Sami"
Дата:
>>> That's true, but using a hard-coded limit means we no longer need to add a
>>> new GUC. Always allocating, say, 256 slots might require a few additional
>>> kilobytes of shared memory, most of which will go unused, but that seems
>>> unlikely to be a problem for the systems that will run Postgres v18.
>>
>> I agree with this.


> Here's what this might look like. I chose an upper limit of 1024, which
> seems like it "ought to be enough for anybody," at least for now.

I thought 256 was a good enough limit. In practice, I doubt anyone will 
benefit from more than a few dozen autovacuum workers. 
I think 1024 is way too high to even allow.

Besides that the overall patch looks good to me, but I have
some comments on the documentation.

I don't think combining 1024 + 5 = 1029 is a good idea in docs.
Breaking down the allotment and using the name of the constant 
is much more clear.

I suggest 
" max_connections + max_wal_senders + max_worker_processes + AUTOVAC_MAX_WORKER_SLOTS + 5"

and in other places in the docs, we should mention the actual 
value of AUTOVAC_MAX_WORKER_SLOTS. Maybe in the 
below section?

Instead of:
-    (<xref linkend="guc-autovacuum-max-workers"/>) and allowed background
+    (1024) and allowed background

do something like:
-    (<xref linkend="guc-autovacuum-max-workers"/>) and allowed background
+   AUTOVAC_MAX_WORKER_SLOTS  (1024) and allowed background

Also,  replace the 1024 here with AUTOVAC_MAX_WORKER_SLOTS.

+    <varname>max_wal_senders</varname>,
+    plus <varname>max_worker_processes</varname>, plus 1024 for autovacuum
+    worker processes, plus one extra for each 16


Also, Not sure if I am mistaken here, but the "+ 5" in the existing docs
seems wrong.
 
If it refers to NUM_AUXILIARY_PROCS defined in 
include/storage/proc.h, it should a "6"

#define NUM_AUXILIARY_PROCS             6

This is not a consequence of this patch, and can be dealt with
In a separate thread if my understanding is correct.


Regards,

Sami 



Re: allow changing autovacuum_max_workers without restarting

От
Nathan Bossart
Дата:
On Thu, May 16, 2024 at 04:37:10PM +0000, Imseih (AWS), Sami wrote:
> I thought 256 was a good enough limit. In practice, I doubt anyone will 
> benefit from more than a few dozen autovacuum workers. 
> I think 1024 is way too high to even allow.

WFM

> I don't think combining 1024 + 5 = 1029 is a good idea in docs.
> Breaking down the allotment and using the name of the constant 
> is much more clear.
> 
> I suggest 
> " max_connections + max_wal_senders + max_worker_processes + AUTOVAC_MAX_WORKER_SLOTS + 5"
> 
> and in other places in the docs, we should mention the actual 
> value of AUTOVAC_MAX_WORKER_SLOTS. Maybe in the 
> below section?
> 
> Instead of:
> -    (<xref linkend="guc-autovacuum-max-workers"/>) and allowed background
> +    (1024) and allowed background
> 
> do something like:
> -    (<xref linkend="guc-autovacuum-max-workers"/>) and allowed background
> +   AUTOVAC_MAX_WORKER_SLOTS  (1024) and allowed background
> 
> Also,  replace the 1024 here with AUTOVAC_MAX_WORKER_SLOTS.
> 
> +    <varname>max_wal_senders</varname>,
> +    plus <varname>max_worker_processes</varname>, plus 1024 for autovacuum
> +    worker processes, plus one extra for each 16

Part of me wonders whether documenting the exact formula is worthwhile.
This portion of the docs is rather complicated, and I can't recall ever
having to do the arithmetic is describes.  Plus, see below...

> Also, Not sure if I am mistaken here, but the "+ 5" in the existing docs
> seems wrong.
>  
> If it refers to NUM_AUXILIARY_PROCS defined in 
> include/storage/proc.h, it should a "6"
> 
> #define NUM_AUXILIARY_PROCS             6
> 
> This is not a consequence of this patch, and can be dealt with
> In a separate thread if my understanding is correct.

Ha, I think it should actually be "+ 7"!  The value is calculated as

    MaxConnections + autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders + 6

Looking at the history, this documentation tends to be wrong quite often.
In v9.2, the checkpointer was introduced, and these formulas were not
updated.  In v9.3, background worker processes were introduced, and the
formulas were still not updated.  Finally, in v9.6, it was fixed in commit
597f7e3.  Then, in v14, the archiver process was made an auxiliary process
(commit d75288f), making the formulas out-of-date again.  And in v17, the
WAL summarizer was added.

On top of this, IIUC you actually need even more semaphores if your system
doesn't support atomics, and from a quick skim this doesn't seem to be
covered in this documentation.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com