Discussion: Thoughts about NUM_BUFFER_PARTITIONS

Thoughts about NUM_BUFFER_PARTITIONS

From
wenhui qiu
Date:
Hi hackers,
    While reading this document (https://www.interdb.jp/pg/pgsql08/03.html), I came across the following paragraph:
/*
The BufMappingLock is split into partitions to reduce contention in the buffer table (the default is 128 partitions). Each BufMappingLock partition guards a portion of the corresponding hash bucket slots.
*/

Physical servers with terabytes of RAM are now commonplace, so I am looking at the comments in the source code to see whether they still match current hardware capabilities. The comment says that this may become a runtime parameter in the future. I am a DBA, and I have thought about implementing this parameter myself, but I am not a professional kernel developer, so I would like to ask the more senior community developers to implement it.

/*
 * It's a bit odd to declare NUM_BUFFER_PARTITIONS and NUM_LOCK_PARTITIONS
 * here, but we need them to figure out offsets within MainLWLockArray, and
 * having this file include lock.h or bufmgr.h would be backwards.
 */

/* Number of partitions of the shared buffer mapping hashtable */
#define NUM_BUFFER_PARTITIONS  128

/*
 * The number of partitions for locking purposes.  This is set to match
 * NUM_BUFFER_PARTITIONS for now, on the basis that whatever's good enough for
 * the buffer pool must be good enough for any other purpose.  This could
 * become a runtime parameter in future.
 */
#define DSHASH_NUM_PARTITIONS_LOG2 7
#define DSHASH_NUM_PARTITIONS (1 << DSHASH_NUM_PARTITIONS_LOG2)

Re: Thoughts about NUM_BUFFER_PARTITIONS

From
Heikki Linnakangas
Date:
On 08/02/2024 12:17, wenhui qiu wrote:
> Hi hackers,
>      While reading this document (https://www.interdb.jp/pg/pgsql08/03.html),
> I came across the following paragraph:
> /*
> The BufMappingLock is split into partitions to reduce contention in the 
> buffer table (the default is 128 partitions). Each BufMappingLock 
> partition guards a portion of the corresponding hash bucket slots.
> */
> 
> Physical servers with terabytes of RAM are now commonplace, so I am 
> looking at the comments in the source code to see whether they still 
> match current hardware capabilities.

The optimal number of partitions has more to do with the number of 
concurrent processes using the buffer cache than with the size of the 
cache. The number of CPUs in servers has increased too, but not as much 
as RAM.

But yeah, it's a valid question whether the settings still make sense with 
modern hardware.

> The comment says that this may become a runtime parameter in the future. 
> I am a DBA, and I have thought about implementing this parameter myself, 
> but I am not a professional kernel developer, so I would like to ask the 
> more senior community developers to implement it.

The first thing to do would be to benchmark with different 
NUM_BUFFER_PARTITIONS settings, and see if there's benefit in having 
more partitions.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Thoughts about NUM_BUFFER_PARTITIONS

From
wenhui qiu
Date:
Hi Heikki Linnakangas 
    I think the larger the shared buffers, the higher the probability of multiple backend processes accessing the same bucket slot's BufMappingLock simultaneously (the buffer table is sized as InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS)). When I have free time, I want to run this test. I have seen some tests, but the result report is in Chinese.
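
(For reference, the buffer table's partitioning works roughly like this - paraphrased from src/include/storage/buf_internals.h; exact wording may differ between versions:)

/* Hash a buffer tag's hash code into one of the NUM_BUFFER_PARTITIONS partitions */
#define BufTableHashPartition(hashcode) \
	((hashcode) % NUM_BUFFER_PARTITIONS)

/* The LWLock protecting that partition of the buffer mapping hash table */
#define BufMappingPartitionLock(hashcode) \
	(&MainLWLockArray[BUFFER_MAPPING_LWLOCK_OFFSET + \
					  BufTableHashPartition(hashcode)].lock)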


Best wishes


Re: Thoughts about NUM_BUFFER_PARTITIONS

From
Tomas Vondra
Date:
On 2/8/24 14:27, wenhui qiu wrote:
> Hi Heikki Linnakangas
>     I think the larger the shared buffers, the higher the probability of
> multiple backend processes accessing the same bucket slot's BufMappingLock
> simultaneously (the buffer table is sized as
> InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS)). When I have free time, I
> want to run this test. I have seen some tests, but the result report is in
> Chinese.
> 

I think Heikki is right that this is unrelated to the amount of RAM. The
partitions are meant to reduce the number of lock collisions when
multiple processes try to map a buffer concurrently. But the machines
got much larger in this regard too - in 2006 the common CPUs had maybe
2-4 cores, now it's common to have CPUs with ~100 cores, and systems
with multiple of them. OTOH the time spent holding the partition lock
should be pretty low, IIRC we pretty much just pin the buffer before
releasing it, and the backend should do plenty of other expensive stuff. So
who knows how many backends end up doing the locking at the same time.
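
(For reference, a simplified sketch of the cache-hit path in bufmgr.c's BufferAlloc() - not the exact current code - showing that the partition lock is held only for the hash lookup and the pin:)

	newHash = BufTableHashCode(&newTag);
	newPartitionLock = BufMappingPartitionLock(newHash);

	LWLockAcquire(newPartitionLock, LW_SHARED);	/* lock one of the 128 partitions */
	buf_id = BufTableLookup(&newTag, newHash);
	if (buf_id >= 0)
	{
		buf = GetBufferDescriptor(buf_id);
		valid = PinBuffer(buf, strategy);	/* pin while holding the lock... */
		LWLockRelease(newPartitionLock);	/* ...then release it right away */
	}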

OTOH, with 128 partitions it takes just 14 backends to have 50% chance
of a conflict, so with enough cores ... But how many partitions would be
enough? With 1024 partitions it still takes only 38 backends to get 50%
chance of a collision. Better, but considering we now have hundreds of
cores, not sure if sufficient.

(Obviously, we probably want much lower probability of a collision, I
only used 50% to illustrate the changes).
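
(The 50% figures follow from the usual birthday-collision estimate, assuming backends hash to partitions uniformly at random; a small standalone C sketch to reproduce them:)

#include <stdio.h>

/* Probability that at least two of n backends hit the same of N partitions */
static double
collision_prob(int n, int N)
{
	double		p_none = 1.0;

	for (int k = 1; k < n; k++)
		p_none *= 1.0 - (double) k / N;
	return 1.0 - p_none;
}

int
main(void)
{
	printf("128 partitions, 14 backends: %.2f\n", collision_prob(14, 128));	/* ~0.50 */
	printf("1024 partitions, 38 backends: %.2f\n", collision_prob(38, 1024));	/* ~0.50 */
	return 0;
}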

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Thoughts about NUM_BUFFER_PARTITIONS

From
Li Japin
Date:

> On Feb 10, 2024, at 20:15, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> 
> On 2/8/24 14:27, wenhui qiu wrote:
>> Hi Heikki Linnakangas
>>    I think the larger the shared buffers, the higher the probability of
>> multiple backend processes accessing the same bucket slot's BufMappingLock
>> simultaneously (the buffer table is sized as
>> InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS)). When I have free time, I
>> want to run this test. I have seen some tests, but the result report is in
>> Chinese.
>> 
> 
> I think Heikki is right that this is unrelated to the amount of RAM. The
> partitions are meant to reduce the number of lock collisions when
> multiple processes try to map a buffer concurrently. But the machines
> got much larger in this regard too - in 2006 the common CPUs had maybe
> 2-4 cores, now it's common to have CPUs with ~100 cores, and systems
> with multiple of them. OTOH the time spent holding the partition lock
> should be pretty low, IIRC we pretty much just pin the buffer before
> releasing it, and the backend should do plenty of other expensive stuff. So
> who knows how many backends end up doing the locking at the same time.
> 
> OTOH, with 128 partitions it takes just 14 backends to have 50% chance
> of a conflict, so with enough cores ... But how many partitions would be
> enough? With 1024 partitions it still takes only 38 backends to get 50%
> chance of a collision. Better, but considering we now have hundreds of
> cores, not sure if sufficient.
> 
> (Obviously, we probably want much lower probability of a collision, I
> only used 50% to illustrate the changes).
> 

It seems we need to change MAX_SIMUL_LWLOCKS if we enlarge NUM_BUFFER_PARTITIONS,
but I didn't find any comments describing the relation between MAX_SIMUL_LWLOCKS
and NUM_BUFFER_PARTITIONS. Am I missing something?

Re: Thoughts about NUM_BUFFER_PARTITIONS

From
Tomas Vondra
Date:
On 2/18/24 03:30, Li Japin wrote:
> 
> 
>> On Feb 10, 2024, at 20:15, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 2/8/24 14:27, wenhui qiu wrote:
>>> Hi Heikki Linnakangas
>>>    I think the larger the shared buffers, the higher the probability of
>>> multiple backend processes accessing the same bucket slot's BufMappingLock
>>> simultaneously (the buffer table is sized as
>>> InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS)). When I have free time, I
>>> want to run this test. I have seen some tests, but the result report is in
>>> Chinese.
>>>
>>
>> I think Heikki is right that this is unrelated to the amount of RAM. The
>> partitions are meant to reduce the number of lock collisions when
>> multiple processes try to map a buffer concurrently. But the machines
>> got much larger in this regard too - in 2006 the common CPUs had maybe
>> 2-4 cores, now it's common to have CPUs with ~100 cores, and systems
>> with multiple of them. OTOH the time spent holding the partition lock
>> should be pretty low, IIRC we pretty much just pin the buffer before
>> releasing it, and the backend should do plenty of other expensive stuff. So
>> who knows how many backends end up doing the locking at the same time.
>>
>> OTOH, with 128 partitions it takes just 14 backends to have 50% chance
>> of a conflict, so with enough cores ... But how many partitions would be
>> enough? With 1024 partitions it still takes only 38 backends to get 50%
>> chance of a collision. Better, but considering we now have hundreds of
>> cores, not sure if sufficient.
>>
>> (Obviously, we probably want much lower probability of a collision, I
>> only used 50% to illustrate the changes).
>>
> 
> It seems we need to change MAX_SIMUL_LWLOCKS if we enlarge NUM_BUFFER_PARTITIONS,
> but I didn't find any comments describing the relation between MAX_SIMUL_LWLOCKS
> and NUM_BUFFER_PARTITIONS. Am I missing something?

IMHO the relationship is pretty simple - MAX_SIMUL_LWLOCKS needs to be
higher than NUM_BUFFER_PARTITIONS, so that the backend can acquire all
the partition locks if needed.
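
(To illustrate: code that needs a consistent view of the whole buffer mapping table - roughly what older pg_buffercache versions did - takes every partition lock at once, and each of those occupies a slot in held_lwlocks[], which is sized by MAX_SIMUL_LWLOCKS:)

	/* acquire all buffer mapping partition locks, in partition order */
	for (i = 0; i < NUM_BUFFER_PARTITIONS; i++)
		LWLockAcquire(BufMappingPartitionLockByIndex(i), LW_SHARED);

	/* ... scan the buffer mapping table ... */

	/* release them in reverse order */
	for (i = NUM_BUFFER_PARTITIONS; --i >= 0;)
		LWLockRelease(BufMappingPartitionLockByIndex(i));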

There are other places that acquire a bunch of locks, and all of them need
to be careful not to exceed MAX_SIMUL_LWLOCKS. For example gist has
GIST_MAX_SPLIT_PAGES.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Thoughts about NUM_BUFFER_PARTITIONS

From
Japin Li
Date:
On Mon, 19 Feb 2024 at 00:56, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> On 2/18/24 03:30, Li Japin wrote:
>>
>> It seems we need to change MAX_SIMUL_LWLOCKS if we enlarge NUM_BUFFER_PARTITIONS,
>> but I didn't find any comments describing the relation between MAX_SIMUL_LWLOCKS
>> and NUM_BUFFER_PARTITIONS. Am I missing something?
>
> IMHO the relationship is pretty simple - MAX_SIMUL_LWLOCKS needs to be
> higher than NUM_BUFFER_PARTITIONS, so that the backend can acquire all
> the partition locks if needed.
>

Thanks for the explanation!  Got it.

> There are other places that acquire a bunch of locks, and all of them need
> to be careful not to exceed MAX_SIMUL_LWLOCKS. For example gist has
> GIST_MAX_SPLIT_PAGES.
>
>
> regards



Re: Thoughts about NUM_BUFFER_PARTITIONS

From
wenhui qiu
Date:
Hi Japin Li
   Thank you for such important information! Got it.


Re: Thoughts about NUM_BUFFER_PARTITIONS

From
wenhui qiu
Date:
Hi Heikki Linnakangas
   Looking through the git log, I found this commit: https://github.com/postgres/postgres/commit/3acc10c997f916f6a741d0b4876126b7b08e3892 , but I don't seem to see an email discussing it. As the commit log tells us, we don't know exactly how large a value is optimal, and I believe it would be more flexible to make it a parameter. Thank you very much, Tomas Vondra, for explaining the relationship; I see that MAX_SIMUL_LWLOCKS was just doubled in that commit. Is there a more appropriate ratio between them?



```````````````````````````````````````````````````````````````````````````
commit 3acc10c997f916f6a741d0b4876126b7b08e3892
Author: Robert Haas <rhaas@postgresql.org>
Date:   Thu Oct 2 13:58:50 2014 -0400

    Increase the number of buffer mapping partitions to 128.

    Testing by Amit Kapila, Andres Freund, and myself, with and without
    other patches that also aim to improve scalability, seems to indicate
    that this change is a significant win over the current value and over
    smaller values such as 64.  It's not clear how high we can push this
    value before it starts to have negative side-effects elsewhere, but
    going this far looks OK.

`````````````````````````````````````````````````````````


Re: Thoughts about NUM_BUFFER_PARTITIONS

From
Tomas Vondra
Date:
Hi,

On 2/20/24 03:16, wenhui qiu wrote:
> Hi Heikki Linnakangas
>    Looking through the git log, I found this commit:
> https://github.com/postgres/postgres/commit/3acc10c997f916f6a741d0b4876126b7b08e3892
> but I don't seem to see an email discussing it. As the commit log tells us,
> we don't know exactly how large a value is optimal, and I believe it would be
> more flexible to make it a parameter. Thank you very much, Tomas Vondra, for
> explaining the relationship; I see that MAX_SIMUL_LWLOCKS was just doubled in
> that commit. Is there a more appropriate ratio between them?
> 

I think the discussion for that commit is in [1] (and especially [2]).

That being said, I don't think MAX_SIMUL_LWLOCKS and NUM_BUFFER_PARTITIONS
need to be in any particular ratio. The only requirement is that there
needs to be enough slack, and a slack of 72 locks (200 - 128) seemed to work
quite fine until now - I don't think we need to change that.

What might be necessary is improving held_lwlocks - we treat it as LIFO,
but more as an expectation than a hard rule. I'm not sure how often we
violate that rule (if at all), but if we do then it's going to get more
expensive as we increase the number of locks. But I'm not sure this is
actually a problem in practice, we usually hold very few LWLocks at the
same time.

As for making this a parameter, I'm rather opposed to the idea. If we
don't have a very clear idea how to set this limit, what's the chance
users with little knowledge of the internals will pick a good value?
Adding yet another knob would just mean users start messing with it in
random ways (typically increasing it to a very high value, because "more
is better"), causing more harm than good.

Adding it as a GUC would also require making some parts dynamic (instead
of just doing static allocation with compile-time constants). That's not
great, but I'm not sure how significant the penalty might be.


IMHO adding a GUC might be acceptable only if we fail to come up with a
good value (which is going to be a trade off), and if someone
demonstrates a clear benefit of increasing the value (which I don't
think has happened in this thread yet).


regards


[1]
https://www.postgresql.org/message-id/flat/CAA4eK1LSTcMwXNO8ovGh7c0UgCHzGbN%3D%2BPjggfzQDukKr3q_DA%40mail.gmail.com

[2]
https://www.postgresql.org/message-id/CA%2BTgmoY58dQi8Z%3DFDAu4ggxHV-HYV03-R9on1LSP9OJU_fy_zA%40mail.gmail.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Thoughts about NUM_BUFFER_PARTITIONS

From
wenhui qiu
Date:
Hi Tomas Vondra
    Thanks for the information! However, I found that Postgres Pro Enterprise has already implemented this as a parameter; it defaults to 16 partitions and maxes out at 128, the same maximum as in PostgreSQL. I kindly hope the developers can explain the purpose of this. Maybe 128 partitions is the optimal value, and the parameter just makes it easier to adjust the number of partitions in the future when it is really not enough. The code comments also say they hope to make it a parameter in the future.


( https://postgrespro.com/docs/enterprise/16/runtime-config-locks )


log2_num_lock_partitions (integer)

This controls how many partitions the shared lock tables are divided into. Number of partitions is calculated by raising 2 to the power of this parameter. The default value is 4, which corresponds to 16 partitions, and the maximum is 8. This parameter can only be set in the postgresql.conf file or on the server command line.
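
(For comparison, stock PostgreSQL hard-codes the equivalent value at compile time; the relevant lines in src/include/storage/lwlock.h look roughly like this. The Postgres Pro GUC appears to expose this same log2 value at runtime - note that it governs the lock-manager partitions rather than NUM_BUFFER_PARTITIONS:)

/* Number of partitions the shared lock tables are divided into */
#define LOG2_NUM_LOCK_PARTITIONS  4
#define NUM_LOCK_PARTITIONS  (1 << LOG2_NUM_LOCK_PARTITIONS)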


Best wishes



Re: Thoughts about NUM_BUFFER_PARTITIONS

From
Tomas Vondra
Date:
On 2/23/24 15:40, wenhui qiu wrote:
> Hi Tomas Vondra
>     Thanks for the information! However, I found that Postgres Pro Enterprise
> has already implemented this as a parameter; it defaults to 16 partitions and
> maxes out at 128, the same maximum as in PostgreSQL. I kindly hope the
> developers can explain the purpose of this. Maybe 128 partitions is the
> optimal value, and the parameter just makes it easier to adjust the number of
> partitions in the future when it is really not enough. The code comments also
> say they hope to make it a parameter in the future.
> 
> 
> ( https://postgrespro.com/docs/enterprise/16/runtime-config-locks )
> 
> 
> log2_num_lock_partitions (integer)
> 
> This controls how many partitions the shared lock tables are divided into.
> Number of partitions is calculated by raising 2 to the power of this
> parameter. The default value is 4, which corresponds to 16 partitions, and
> the maximum is 8. This parameter can only be set in the postgresql.conf file
> or on the server command line.
> 
> Best wishes
> 

Hi,

Well, if Postgres Pro implements this, I don't know what their reasoning
was exactly, but I guess they wanted to make it easier to experiment
with different values (without rebuild), or maybe they actually have
systems where they know higher values help ...

Note: I'd point out that the maximum value 8 translates to 256, so no - it does
not max out at the same value as PostgreSQL.

Anyway, this value is inherently a trade off. If it wasn't, we'd set it
to something super high from the start. But having more partitions of
the lock table has a cost too, because some parts need to acquire all
the partition locks (and that's O(N) where N = number of partitions).

Of course, not having enough lock table partitions has a cost too,
because it increases the chance of conflict between backends (that happen
to need to operate on the same partition). This constant is not
constant, it changes over time - with 16 cores the collisions might have
been rare, with 128 not so much. Also, with partitioning we may need
many more locks per query.

This means it's entirely possible it'd be good to have more than 128
partitions of the lock table, but we only change this kind of stuff if
we have 2 things:

1) clear demonstration of the benefits (e.g. a workload showing an
improvement with higher number of partitions)

2) analysis of how this affects other workloads (e.g. cases that may
need to lock all the partitions etc)

Ultimately it's a trade off - we need to judge if the impact in (2) is
worth the improvement in (1).

None of this was done in this thread. There's no demonstration of the
benefits, no analysis of the impact etc.

As for turning the parameter into a GUC, that has a cost too. Either
direct - a compiler can do far more optimizations with compile-time
constants than with values that may change during execution, for
example. Or indirect - if we can't give users any guidance how/when to
tune the GUC, it can easily lead to misconfiguration (I can't even count
how many times I had to deal with systems where the values were "tuned"
following the logic that more is always better).

Which just leads back to (1) and (2) even for this case.


regards


-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company