Обсуждение: Database Stalls

Поиск
Список
Период
Сортировка

Database Stalls

От
Mok
Дата:
Hi, 

We've started to observe instances of one of our databases stalling for a few seconds. 

We see a spike in wal write locks then nothing for a few seconds. After which we have spike latency as processes waiting to get to the db can do so. 

There is nothing in the postgres logs that give us any clues to what could be happening, no locks, unusually high/long running transactions, just a pause and resume. 

Could anyone give me any advice as to what to look for when it comes to checking the underlying disk that the db is on? 

Thanks, 

Gurmokh 


Re: Database Stalls

От
Justin Pryzby
Дата:
On Mon, Jan 30, 2023 at 05:47:49PM +0000, Mok wrote:
> Hi,
> 
> We've started to observe instances of one of our databases stalling for a
> few seconds.
> 
> We see a spike in wal write locks then nothing for a few seconds. After
> which we have spike latency as processes waiting to get to the db can do
> so.
> 
> There is nothing in the postgres logs that give us any clues to what could
> be happening, no locks, unusually high/long running transactions, just a
> pause and resume.
> 
> Could anyone give me any advice as to what to look for when it comes to
> checking the underlying disk that the db is on?

What version postgres?  What settings have non-default values ?
                                           
 
What OS/version?  What environment/hardware?  VM/image/provider/...
                                           
 

Have you enabled logging for vacuum/checkpoints/locks ?

https://wiki.postgresql.org/wiki/Slow_Query_Questions



Re: Database Stalls

От
José Arthur Benetasso Villanova
Дата:
On Mon, Jan 30, 2023 at 2:51 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Mon, Jan 30, 2023 at 05:47:49PM +0000, Mok wrote:
> Hi,
>
> We've started to observe instances of one of our databases stalling for a
> few seconds.
>
> We see a spike in wal write locks then nothing for a few seconds. After
> which we have spike latency as processes waiting to get to the db can do
> so.
>
> There is nothing in the postgres logs that give us any clues to what could
> be happening, no locks, unusually high/long running transactions, just a
> pause and resume.
>
> Could anyone give me any advice as to what to look for when it comes to
> checking the underlying disk that the db is on?

What version postgres?  What settings have non-default values ?                                                                                                     
What OS/version?  What environment/hardware?  VM/image/provider/...                                                                                                 

Have you enabled logging for vacuum/checkpoints/locks ?

https://wiki.postgresql.org/wiki/Slow_Query_Questions


In addition to previous questions, if possible, a SELECT * FROM pg_stat_activity at the moment of the stall. The most important information is the wait_event column. My guess is the disk, but just the select at the right moment can answer this.

--
José Arthur Benetasso Villanova

Re: Database Stalls

От
Shiv Iyer
Дата:
Hi Burmokh,

Please take a look at this article copied below and ping me for further guidance. Thanks! 


How expensive SQLs can impact PostgreSQL Performance? - https://minervadb.xyz/how-expensive-sqls-can-impact-postgresql-performance/ 


Best
Shiv 



On 30-Jan-2023, at 11:17 PM, Mok <gurmokh@gmail.com> wrote:

Hi, 

We've started to observe instances of one of our databases stalling for a few seconds. 

We see a spike in wal write locks then nothing for a few seconds. After which we have spike latency as processes waiting to get to the db can do so. 

There is nothing in the postgres logs that give us any clues to what could be happening, no locks, unusually high/long running transactions, just a pause and resume. 

Could anyone give me any advice as to what to look for when it comes to checking the underlying disk that the db is on? 

Thanks, 

Gurmokh 



Re: Database Stalls

От
Mok
Дата:
Hi, 

Unfortunately there is no pg_stat_activity data available as we are unaware of the issue until it has already happened. 

The version we are on is 12.11. 

I don't think it is due to locks as there are none in the logs. Vacuums are logged also and none occur before or after this event. Checkpoint timeout is set to 1 hour and these events do not coincide with checkpoints. 

Gurmokh

On Mon, 30 Jan 2023 at 18:47, Shiv Iyer <shiv@minervadb.com> wrote:
Hi Burmokh,

Please take a look at this article copied below and ping me for further guidance. Thanks! 


How expensive SQLs can impact PostgreSQL Performance? - https://minervadb.xyz/how-expensive-sqls-can-impact-postgresql-performance/ 


Best
Shiv 



On 30-Jan-2023, at 11:17 PM, Mok <gurmokh@gmail.com> wrote:

Hi, 

We've started to observe instances of one of our databases stalling for a few seconds. 

We see a spike in wal write locks then nothing for a few seconds. After which we have spike latency as processes waiting to get to the db can do so. 

There is nothing in the postgres logs that give us any clues to what could be happening, no locks, unusually high/long running transactions, just a pause and resume. 

Could anyone give me any advice as to what to look for when it comes to checking the underlying disk that the db is on? 

Thanks, 

Gurmokh 



Fwd: Database Stalls

От
Rick Otten
Дата:

On Mon, Jan 30, 2023 at 4:32 PM Mok <gurmokh@gmail.com> wrote:
Hi, 

Unfortunately there is no pg_stat_activity data available as we are unaware of the issue until it has already happened. 

The version we are on is 12.11. 

I don't think it is due to locks as there are none in the logs. Vacuums are logged also and none occur before or after this event. Checkpoint timeout is set to 1 hour and these events do not coincide with checkpoints. 

Gurmokh


Have you eliminated network issues?  I have seen what looks like a database stalling to end up actually being the network packets taking a side trip to halfway around the world for a while.  Or DNS lookups suddenly taking a really long time.

The next most likely thing is disk i/o.  Do you have huge corresponding disk i/o spikes or does it drop completely to zero (which is also bad - especially if you are on a SAN and you can't get any packets out on that network).  You'll have to look at your disks via OS tools to see.

Do you have any hardware faults?  Errors on a hardware bus?  Overheating?  I used to have a system that would freeze up entirely due to a problem with a serial port that we had a console attached to - it was sending a low level interrupt.  Sometimes it would recover mysteriously if someone hit the carriage return a couple times.  Ie, is it _really_ the database that is locking up, or is it your hardware?

Re: Database Stalls

От
Craig Jackson
Дата:
Consider creating a pg_stat_activity history table. This would allow you to look back at the time of incident and verify if any unusual activity was occurring in the database. Something like:

CREATE TABLE pg_stat_activity_hist AS
SELECT    now() AS sample_time,    a.*
FROM    pg_stat_activity a WITH NO data;

Then with a cron job or a pg job scheduler insert the pg_stat_activity history at some desired interval (e.g 30s, 1m or 5m):

INSERT INTO pg_stat_activity_hist
SELECT
    now(),
    a.*
FROM
    pg_stat_activity a
WHERE
    state IN ('active', 'idle in transaction’);

Then regularly purge any sample_times older than some desired interval (1 day, 1 week, 1 month). 

Not a perfect solution because the problem (if a db problem) could occur between your pg_stat_activity samples. We keep this kind of history and it is very helpful when trying to find a post-event root cause. 

Craig



On Jan 30, 2023 at 10:47:49 AM, Mok <gurmokh@gmail.com> wrote:
Hi, 

We've started to observe instances of one of our databases stalling for a few seconds. 

We see a spike in wal write locks then nothing for a few seconds. After which we have spike latency as processes waiting to get to the db can do so. 

There is nothing in the postgres logs that give us any clues to what could be happening, no locks, unusually high/long running transactions, just a pause and resume. 

Could anyone give me any advice as to what to look for when it comes to checking the underlying disk that the db is on? 

Thanks, 

Gurmokh 



This electronic communication and the information and any files transmitted with it, or attached to it, are confidential and are intended solely for the use of the individual or entity to whom it is addressed and may contain information that is confidential, legally privileged, protected by privacy laws, or otherwise restricted from disclosure to anyone else. If you are not the intended recipient or the person responsible for delivering the e-mail to the intended recipient, you are hereby notified that any use, copying, distributing, dissemination, forwarding, printing, or copying of this e-mail is strictly prohibited. If you received this e-mail in error, please return the e-mail to the sender, delete it from your computer, and destroy any printed copy of it.
Вложения