Обсуждение: [MASSMAIL]LWlock:LockManager waits

Поиск
Список
Период
Сортировка

[MASSMAIL]LWlock:LockManager waits

От
James Pang
Дата:
   we found sometimes , with many sessions running same query "select ..." at the same time, saw many sessions waiting on "LockManager".  for example, pg_stat_activity show.  It's a production server, so no enable trace_lwlocks flag. could you direct me what's the possible reason and how to reduce this "lockmanager" lock?  all the sql statement are "select " ,no DML.

   time                                             wait_event           count(pid) 
2024-04-08 09:00:06.043996+00 | DataFileRead  |    42
 2024-04-08 09:00:06.043996+00 |               |    15
 2024-04-08 09:00:06.043996+00 | LockManager   |    31
 2024-04-08 09:00:06.043996+00 | BufferMapping |    46
 2024-04-08 09:00:07.114015+00 | LockManager   |    43
 2024-04-08 09:00:07.114015+00 | DataFileRead  |    28
 2024-04-08 09:00:07.114015+00 | ClientRead    |    11
 2024-04-08 09:00:07.114015+00 |               |    11

Thanks,

James

Re: LWlock:LockManager waits

От
Laurenz Albe
Дата:
On Tue, 2024-04-09 at 11:07 +0800, James Pang wrote:
>    we found sometimes , with many sessions running same query "select ..." at the same time, saw many sessions
waitingon "LockManager".  for example, pg_stat_activity show.  It's a production server, so no enable trace_lwlocks
flag.could you direct me what's the possible reason and how to reduce this "lockmanager" lock?  all the sql statement
are"select " ,no DML. 
>
>    time                                             wait_event           count(pid) 
> 2024-04-08 09:00:06.043996+00 | DataFileRead  |    42
>  2024-04-08 09:00:06.043996+00 |               |    15
>  2024-04-08 09:00:06.043996+00 | LockManager   |    31
>  2024-04-08 09:00:06.043996+00 | BufferMapping |    46
>  2024-04-08 09:00:07.114015+00 | LockManager   |    43
>  2024-04-08 09:00:07.114015+00 | DataFileRead  |    28
>  2024-04-08 09:00:07.114015+00 | ClientRead    |    11
>  2024-04-08 09:00:07.114015+00 |               |    11

That's quite obvious: too many connections cause internal contention in the database.

Reduce the number of connections by using a reasonably sized connection pool.

Yours,
Laurenz Albe



Re: LWlock:LockManager waits

От
James Pang
Дата:
you mean too many concurrent sessions trying to acquire lock on same relation , then waiting on  "LockManager" LWlock,right?  this contention occurred on parsing ,planning, or execute step ? 

Thanks,

James

Laurenz Albe <laurenz.albe@cybertec.at> 於 2024年4月9日週二 下午12:31寫道:
On Tue, 2024-04-09 at 11:07 +0800, James Pang wrote:
>    we found sometimes , with many sessions running same query "select ..." at the same time, saw many sessions waiting on "LockManager".  for example, pg_stat_activity show.  It's a production server, so no enable trace_lwlocks flag. could you direct me what's the possible reason and how to reduce this "lockmanager" lock?  all the sql statement are "select " ,no DML.
>
>    time                                             wait_event           count(pid) 
> 2024-04-08 09:00:06.043996+00 | DataFileRead  |    42
>  2024-04-08 09:00:06.043996+00 |               |    15
>  2024-04-08 09:00:06.043996+00 | LockManager   |    31
>  2024-04-08 09:00:06.043996+00 | BufferMapping |    46
>  2024-04-08 09:00:07.114015+00 | LockManager   |    43
>  2024-04-08 09:00:07.114015+00 | DataFileRead  |    28
>  2024-04-08 09:00:07.114015+00 | ClientRead    |    11
>  2024-04-08 09:00:07.114015+00 |               |    11

That's quite obvious: too many connections cause internal contention in the database.

Reduce the number of connections by using a reasonably sized connection pool.

Yours,
Laurenz Albe

Re: LWlock:LockManager waits

От
Frits Hoogland
Дата:
James,

A lock can be obtained in the parse, plan and execute step, depending on cache, state and type of object.

A LWLock is a spinlock, a low level access mechanism that is supposed to be extremely quickly. It is used to serialise access to elementary structures mostly for changes.
A Lock is an higher level lock that is much more sophisticated, contains multiple states and can order multiple requests. It is used to safeguard transaction intention for objects.

The wait LWLock:LockManager is documented to have two common reasons: too many locks being acquired, exceeding the fastpath slots number (16) and/or exceeding CPU capacity.

What is happening if you are waiting for a LWLock is that the number of processes trying to access the structure (the lock manager) is higher than one. Because the LWLock is meant to be held so briefly that there should no waiting, it means that if you are waiting for it, there must be a reason it’s held so long. An obvious reason for holding a LWLock too long is if there are more tasks on the OS than CPU’s, as Laurenz indicates. If such a situation happens, it’s possible a tasks is put off CPU by the operating system whilst holding the LWLock, which will greatly increase the time waiting for it, because the LWLock can only be released if the task manages to get back on CPU.

Regards,

Frits Hoogland




On 9 Apr 2024, at 09:54, James Pang <jamespang886@gmail.com> wrote:

you mean too many concurrent sessions trying to acquire lock on same relation , then waiting on  "LockManager" LWlock,right?  this contention occurred on parsing ,planning, or execute step ? 

Thanks,

James

Laurenz Albe <laurenz.albe@cybertec.at> 於 2024年4月9日週二 下午12:31寫道:
On Tue, 2024-04-09 at 11:07 +0800, James Pang wrote:
>    we found sometimes , with many sessions running same query "select ..." at the same time, saw many sessions waiting on "LockManager".  for example, pg_stat_activity show.  It's a production server, so no enable trace_lwlocks flag. could you direct me what's the possible reason and how to reduce this "lockmanager" lock?  all the sql statement are "select " ,no DML.
>
>    time                                             wait_event           count(pid) 
> 2024-04-08 09:00:06.043996+00 | DataFileRead  |    42
>  2024-04-08 09:00:06.043996+00 |               |    15
>  2024-04-08 09:00:06.043996+00 | LockManager   |    31
>  2024-04-08 09:00:06.043996+00 | BufferMapping |    46
>  2024-04-08 09:00:07.114015+00 | LockManager   |    43
>  2024-04-08 09:00:07.114015+00 | DataFileRead  |    28
>  2024-04-08 09:00:07.114015+00 | ClientRead    |    11
>  2024-04-08 09:00:07.114015+00 |               |    11

That's quite obvious: too many connections cause internal contention in the database.

Reduce the number of connections by using a reasonably sized connection pool.

Yours,
Laurenz Albe

Re: LWlock:LockManager waits

От
Frits Hoogland
Дата:
I need to rectify myself: LWLock is not a spinlock (anymore). 
The documentation in lwlock.c makes it clear that it used to be spinlock, but now is a counter modified by atomic instructions.

Oh, I forgot to answer:

you mean too many concurrent sessions trying to acquire lock on same relation , then waiting on  "LockManager" LWlock,right? 

This is the point: no, it’s not about the same relation. 

The LWLock:LockManager is a wait event that is raised when competing for the LWLock that protects the shared Lock structure, which holds all of the locks of the database.


Frits Hoogland




On 9 Apr 2024, at 09:54, James Pang <jamespang886@gmail.com> wrote:

you mean too many concurrent sessions trying to acquire lock on same relation , then waiting on  "LockManager" LWlock,right?  this contention occurred on parsing ,planning, or execute step ? 

Thanks,

James

Laurenz Albe <laurenz.albe@cybertec.at> 於 2024年4月9日週二 下午12:31寫道:
On Tue, 2024-04-09 at 11:07 +0800, James Pang wrote:
>    we found sometimes , with many sessions running same query "select ..." at the same time, saw many sessions waiting on "LockManager".  for example, pg_stat_activity show.  It's a production server, so no enable trace_lwlocks flag. could you direct me what's the possible reason and how to reduce this "lockmanager" lock?  all the sql statement are "select " ,no DML.
>
>    time                                             wait_event           count(pid) 
> 2024-04-08 09:00:06.043996+00 | DataFileRead  |    42
>  2024-04-08 09:00:06.043996+00 |               |    15
>  2024-04-08 09:00:06.043996+00 | LockManager   |    31
>  2024-04-08 09:00:06.043996+00 | BufferMapping |    46
>  2024-04-08 09:00:07.114015+00 | LockManager   |    43
>  2024-04-08 09:00:07.114015+00 | DataFileRead  |    28
>  2024-04-08 09:00:07.114015+00 | ClientRead    |    11
>  2024-04-08 09:00:07.114015+00 |               |    11

That's quite obvious: too many connections cause internal contention in the database.

Reduce the number of connections by using a reasonably sized connection pool.

Yours,
Laurenz Albe

Re: LWlock:LockManager waits

От
"Luiz Fernando G. Verona"
Дата:
Hi James,

Take a look here, in the links you will find many info and real examples for LockManager issues.


Luiz

On Tue, Apr 9, 2024 at 4:08 AM James Pang <jamespang886@gmail.com> wrote:
   we found sometimes , with many sessions running same query "select ..." at the same time, saw many sessions waiting on "LockManager".  for example, pg_stat_activity show.  It's a production server, so no enable trace_lwlocks flag. could you direct me what's the possible reason and how to reduce this "lockmanager" lock?  all the sql statement are "select " ,no DML.

   time                                             wait_event           count(pid) 
2024-04-08 09:00:06.043996+00 | DataFileRead  |    42
 2024-04-08 09:00:06.043996+00 |               |    15
 2024-04-08 09:00:06.043996+00 | LockManager   |    31
 2024-04-08 09:00:06.043996+00 | BufferMapping |    46
 2024-04-08 09:00:07.114015+00 | LockManager   |    43
 2024-04-08 09:00:07.114015+00 | DataFileRead  |    28
 2024-04-08 09:00:07.114015+00 | ClientRead    |    11
 2024-04-08 09:00:07.114015+00 |               |    11

Thanks,

James