Обсуждение: Monitoring for long running transactions

Поиск
Список
Период
Сортировка

Monitoring for long running transactions

От
Samuel Smith
Дата:
We had a customer complaining of random data loss for the last 6 months 
or so. We eventually tracked it down to a combination of bad coding and 
a couple of bugs with the ORM. Basically, 'BEGIN' was being emitted by 
the web app when viewing a certain page and 'COMMIT' was never emitted 
after that. So once the app would get restarted, all data changes would 
be lost. Definitely worst case scenario.

So the question is, what is the best way to monitor for this scenario 
going forward? Are there any plugins or community recommended scripts 
already made?

Regards,





Re: Monitoring for long running transactions

От
Adrian Klaver
Дата:
On 6/4/20 10:00 AM, Samuel Smith wrote:
> We had a customer complaining of random data loss for the last 6 months 
> or so. We eventually tracked it down to a combination of bad coding and 
> a couple of bugs with the ORM. Basically, 'BEGIN' was being emitted by 
> the web app when viewing a certain page and 'COMMIT' was never emitted 
> after that. So once the app would get restarted, all data changes would 
> be lost. Definitely worst case scenario.
> 
> So the question is, what is the best way to monitor for this scenario 
> going forward? Are there any plugins or community recommended scripts 
> already made?

https://www.postgresql.org/docs/12/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW

"
state     text     Current overall state of this backend. Possible values are:

     active: The backend is executing a query.

     idle: The backend is waiting for a new client command.

     idle in transaction: The backend is in a transaction, but is not 
currently executing a query.

     idle in transaction (aborted): This state is similar to idle in 
transaction, except one of the statements in the transaction caused an 
error.

     fastpath function call: The backend is executing a fast-path function.

     disabled: This state is reported if track_activities is disabled in 
this backend.
"

> 
> Regards,
> 
> 
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Monitoring for long running transactions

От
Samuel Smith
Дата:
On 6/4/20 2:29 PM, Adrian Klaver wrote:
> On 6/4/20 10:00 AM, Samuel Smith wrote:
>> We had a customer complaining of random data loss for the last 6 
>> months or so. We eventually tracked it down to a combination of bad 
>> coding and a couple of bugs with the ORM. Basically, 'BEGIN' was being 
>> emitted by the web app when viewing a certain page and 'COMMIT' was 
>> never emitted after that. So once the app would get restarted, all 
>> data changes would be lost. Definitely worst case scenario.
>>
>> So the question is, what is the best way to monitor for this scenario 
>> going forward? Are there any plugins or community recommended scripts 
>> already made?
> 
> https://www.postgresql.org/docs/12/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW 
> 
> 
> "
> state     text     Current overall state of this backend. Possible 
> values are:
> 
>      active: The backend is executing a query.
> 
>      idle: The backend is waiting for a new client command.
> 
>      idle in transaction: The backend is in a transaction, but is not 
> currently executing a query.
> 
>      idle in transaction (aborted): This state is similar to idle in 
> transaction, except one of the statements in the transaction caused an 
> error.
> 
>      fastpath function call: The backend is executing a fast-path function.
> 
>      disabled: This state is reported if track_activities is disabled in 
> this backend.
> "
> 
>>
>> Regards,
>>
>>
>>
>>
> 
> 


Sorry, I should have clarified that I was aware of the pg_stat_activity 
table. That is how we found the problem in the first place. And yes I 
could just write a bash script and run it in cron. I just didn't know if 
there was a more "official" way to go about this since it is probably a 
common monitoring point and/or if something like this was already made.

Regards,



Re: Monitoring for long running transactions

От
Adrian Klaver
Дата:
On 6/4/20 12:59 PM, Samuel Smith wrote:
> On 6/4/20 2:29 PM, Adrian Klaver wrote:

>>
> 
> 
> Sorry, I should have clarified that I was aware of the pg_stat_activity 
> table. That is how we found the problem in the first place. And yes I 
> could just write a bash script and run it in cron. I just didn't know if 
> there was a more "official" way to go about this since it is probably a 
> common monitoring point and/or if something like this was already made.

I guess it comes down to how much info you want. A list of monitoring tools:

https://wiki.postgresql.org/wiki/Monitoring#pgcenter

Haven't used it but this one:

https://pgmetrics.io/

looks interesting. From here:

https://pgmetrics.io/docs/

Active backends: transaction running too long, idling in transaction, 
waiting for locks

Information on scripting above:

https://pgmetrics.io/docs/scripting.html


> 
> Regards,
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Monitoring for long running transactions

От
Christoph Moench-Tegeder
Дата:
## Samuel Smith (pgsql@net153.net):

> Sorry, I should have clarified that I was aware of the pg_stat_activity 
> table. That is how we found the problem in the first place. And yes I 
> could just write a bash script and run it in cron. I just didn't know if 
> there was a more "official" way to go about this since it is probably a 
> common monitoring point and/or if something like this was already made.

This works fine for everything nagios-like:
https://bucardo.org/check_postgres/check_postgres.pl.html#txn_time

Other solutions may be found via the PostgreSQL wiki - as you noted,
this is (should be) monitored, so almost everything which can monitor
PostgreSQL covers this. (which raises the obvious question - did you
monitor your server or did you just use the common "shouting user"
approach? ;))

Regards,
Christoph

-- 
Spare Space



Re: Monitoring for long running transactions

От
Thomas Kellerer
Дата:
Samuel Smith schrieb am 04.06.2020 um 21:59:
> Sorry, I should have clarified that I was aware of the
> pg_stat_activity table. That is how we found the problem in the first
> place. And yes I could just write a bash script and run it in cron. I
> just didn't know if there was a more "official" way to go about this
> since it is probably a common monitoring point and/or if something
> like this was already made.

What about setting idle_in_transaction_session_timeout to some reasonably high value (15 minutes?)

Then you would get an error in your application because the connection was terminated and you would learn early about
theproblem.
 
Even with monitoring enabled, you would probably still kill those sessions manually as the application most probably
can'tcommit them properly any more.
 

Thomas