Обсуждение: Session WAL activity

Поиск
Список
Период
Сортировка

Session WAL activity

От
Konstantin Knizhnik
Дата:
Hi hackers,

One of our customers complains about that some sessions generates "too much WAL records".
Certainly WAL activity doesn't indicate a problem itself: huge workload cause huge WAL activity.
But them are trying to understand which clients produces so much database changes and complain that there is
no way to get such information in Postgres. For example in Oracle this problems can be solved in this way:

http://www.dba-oracle.com/t_find_session_generating_high_redo.htm

Unfortunately there is actually no simple and accurate way to calculate amount of WAL produced by the particular session.
It is possible to parse WAL (for example using pg_waldump), then using XID->pid mapping accumulate size of transactions produced by each backend.
But this is very inconvenient and not DBA friendly approach.

I have implemented small patch which collects such statistic.
I have added walWritten  field to PGPROC and increment it in CopyXLogRecordToWAL.
It is possible to inspect this field using pg_stat_get_wal_activity(pid) function and also I have added
pg_stat_wal_activity which just adds  wal_written to standard pg_activity view:

postgres=# select pid, backend_type, wal_written from pg_stat_wal_activity ;pid  |         backend_type         | wal_written
------+------------------------------+-------------4405 | autovacuum launcher          |           04407 | logical replication launcher |           04750 | client backend               |       861954403 | background writer            |         2044402 | checkpointer                 |         3284404 | walwriter                    |           0
(6 rows)


I wonder if such extra statistic about session WAL activity is considered to be useful?
The only problem with this approach from my point of view is adding 8 bytes to PGPROC.
But there are already so many fields in this structure (sizeof(PGPROC)=816), that adding yet another 8 bytes should not be noticeable.

Comments are welcome.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Вложения

Re: Session WAL activity

От
Kyotaro Horiguchi
Дата:
Hello.

At Tue, 3 Dec 2019 18:01:28 +0300, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote in
> Hi hackers,
>
> One of our customers complains about that some sessions generates "too
> much WAL records".
> Certainly WAL activity doesn't indicate a problem itself: huge
> workload cause huge WAL activity.
> But them are trying to understand which clients produces so much
> database changes and complain that there is
> no way to get such information in Postgres. For example in Oracle this
> problems can be solved in this way:
>
> http://www.dba-oracle.com/t_find_session_generating_high_redo.htm
>
> Unfortunately there is actually no simple and accurate way to
> calculate amount of WAL produced by the particular session.
> It is possible to parse WAL (for example using pg_waldump), then using
> XID->pid mapping accumulate size of transactions produced by each
> backend.
> But this is very inconvenient and not DBA friendly approach.
>
> I have implemented small patch which collects such statistic.
> I have added walWritten  field to PGPROC and increment it in
> CopyXLogRecordToWAL.
> It is possible to inspect this field using
> pg_stat_get_wal_activity(pid) function and also I have added
> pg_stat_wal_activity which just adds  wal_written to standard
> pg_activity view:
>
> postgres=# select pid, backend_type, wal_written from
> pg_stat_wal_activity ;
>  pid  |         backend_type         | wal_written
> ------+------------------------------+-------------
>  4405 | autovacuum launcher          |           0
>  4407 | logical replication launcher |           0
>  4750 | client backend               |       86195
>  4403 | background writer            |         204
>  4402 | checkpointer                 |         328
>  4404 | walwriter                    |           0
> (6 rows)
>
>
>
> I wonder if such extra statistic about session WAL activity is
> considered to be useful?
>
> The only problem with this approach from my point of view is adding 8
> bytes to PGPROC.
> But there are already so many fields in this structure
> (sizeof(PGPROC)=816), that adding yet another 8 bytes should not be
> noticeable.
>
> Comments are welcome.

It seems to be useful to me. We also might want statistics of other
session IOs.  In that case the table name would be
"pg_stat_session/process_activity". We are aleady collecting most
kinds of the IO activity but it loses session information...

Briefly looking the patch, I have some comments on it.

As mentioned above, if we are intending future exantion of the
session-stats table, the name should be changed.

Backend status is more appropriate than PGPROC. See pgstat.c.

Some kind of locking is needed to update the fields on shared segment.
(LWLocks for PGPROC and PGSTAT_BEGIN/END_WRITE_ACTIVITY for
PgBackendStatus)

Knitpickings:

The patch contains a trace of older trial in
pg_stat_get_activity. Proc OID should be >= 8000 in
patches. src/include/catalog/unused_oids offers some OID for you.


regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Session WAL activity

От
Konstantin Knizhnik
Дата:

On 04.12.2019 8:33, Kyotaro Horiguchi wrote:
> It seems to be useful to me. We also might want statistics of other
> session IOs.  In that case the table name would be
> "pg_stat_session/process_activity". We are aleady collecting most
> kinds of the IO activity but it loses session information...

Well, actually monitoring disk activity for the particular 
backend/session can be easily done using some external tools
(just because now in Postgres session=backend=process). So we can 
monitor IO of processes, for example using iotop at Unix
or Performance Monitor at Windows.

Certainly it is more convenient to have such statstic inside Postgres. 
But I am not sure if it is really needed.
Concerning WAL activity situation is more obscure: records can be added 
to the WAL by one process, but written by another.
This is why it is not possible to use some external tools.


>
> Briefly looking the patch, I have some comments on it.
>
> As mentioned above, if we are intending future exantion of the
> session-stats table, the name should be changed.
>
> Backend status is more appropriate than PGPROC. See pgstat.c.
Do you mean pgstat_fetch_stat_beentry?
But why it is better than storing this information directly in PGPROC?
As far as this information  ha to be updated from XLogInsertRecord and 
it seems to be very performance critical function  my intention was to 
minimize
overhead of maintaining this statistic. It is hard to imagine something 
more efficient than just MyProc->walWriten += write_len;

Also pgstat_fetch_stat_beentry is taken backend id, which is not 
reported in pg_stat_activity view and this is why it is more
convenient to pass PID to pg_stat_get_wal_activity. Certainly it is 
possible to map PID to backendid, but... why actually do we need to
perform such mapping if simpler solution exists?

> Some kind of locking is needed to update the fields on shared segment.
> (LWLocks for PGPROC and PGSTAT_BEGIN/END_WRITE_ACTIVITY for
> PgBackendStatus)
This information is updated locally only by backend itself.
Certainly update of 64 bit field is not atomic at 32-but architectures.
But it is just statistic. I do not think that it will be fatal if for a 
moment
we can see some incorrect value of written WAL bytes (and at most 
platforms this
update will be atomic).

As I already wrote above, this information in updated in performance 
critical place and this is why
I want to avoid any expensive operations (such as locking or atomic 
updates) as much as possible.
> Knitpickings:
>
> The patch contains a trace of older trial in
> pg_stat_get_activity. Proc OID should be >= 8000 in
> patches. src/include/catalog/unused_oids offers some OID for you.
>

Will fix it.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Session WAL activity

От
Kyotaro Horiguchi
Дата:
Hi.

At Wed, 4 Dec 2019 16:40:27 +0300, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote in
>
>
> On 04.12.2019 8:33, Kyotaro Horiguchi wrote:
> > It seems to be useful to me. We also might want statistics of other
> > session IOs.  In that case the table name would be
> > "pg_stat_session/process_activity". We are aleady collecting most
> > kinds of the IO activity but it loses session information...
>
> Well, actually monitoring disk activity for the particular
> backend/session can be easily done using some external tools
> (just because now in Postgres session=backend=process). So we can
> monitor IO of processes, for example using iotop at Unix
> or Performance Monitor at Windows.

Operations that completes on shared buffers cannot be monitored that
way. This is the same with WAL writing.

> Certainly it is more convenient to have such statstic inside
> Postgres. But I am not sure if it is really needed.
> Concerning WAL activity situation is more obscure: records can be
> added to the WAL by one process, but written by another.
> This is why it is not possible to use some external tools.

For clarity, I didn't suggest that this patch should include general
session IO statistics. Just the view name looked a bit specific.

> > Briefly looking the patch, I have some comments on it.
> >
> > As mentioned above, if we are intending future exantion of the
> > session-stats table, the name should be changed.
> >
> > Backend status is more appropriate than PGPROC. See pgstat.c.
> Do you mean pgstat_fetch_stat_beentry?
> But why it is better than storing this information directly in PGPROC?

No it cannot be used there for performance reasons as you are
saying. I'm not sure it's acceptable, but we can directly access
backend status the same way if we expose MyBEEntry (and update it
through a macro or a inline function).  If we don't need per record
resolution for the value, we can update a process local variable at
WAL-write time then write it to backend status at commit time or at
the same timing as pgstat reporting.

According to my faint memory, PGPROC is thought that it must be kept
as small as possible for the reasons of CPU caches, that is the reason
for PgBackendStatus.

> As far as this information  ha to be updated from XLogInsertRecord and
> it seems to be very performance critical function  my intention was to
> minimize
> overhead of maintaining this statistic. It is hard to imagine
> something more efficient than just MyProc->walWriten += write_len;
>
> Also pgstat_fetch_stat_beentry is taken backend id, which is not
> reported in pg_stat_activity view and this is why it is more
> convenient to pass PID to pg_stat_get_wal_activity. Certainly it is
> possible to map PID to backendid, but... why actually do we need to
> perform such mapping if simpler solution exists?
>
> > Some kind of locking is needed to update the fields on shared segment.
> > (LWLocks for PGPROC and PGSTAT_BEGIN/END_WRITE_ACTIVITY for
> > PgBackendStatus)
> This information is updated locally only by backend itself.
> Certainly update of 64 bit field is not atomic at 32-but
> architectures.
> But it is just statistic. I do not think that it will be fatal if for
> a moment
> we can see some incorrect value of written WAL bytes (and at most
> platforms this
> update will be atomic).

At least reader needs to take procarray lock to keep PID-WALwrite
consistency, in order to prevent reading WALwrite values for a wrong
process.

> As I already wrote above, this information in updated in performance
> critical place and this is why
> I want to avoid any expensive operations (such as locking or atomic
> updates) as much as possible.

I'm afraid that the reason doesn't justify expanding PGPROC..

> > Knitpickings:
> >
> > The patch contains a trace of older trial in
> > pg_stat_get_activity. Proc OID should be >= 8000 in
> > patches. src/include/catalog/unused_oids offers some OID for you.
> >
>
> Will fix it.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Session WAL activity

От
Konstantin Knizhnik
Дата:

On 05.12.2019 5:37, Kyotaro Horiguchi wrote:
> It seems to be useful to me. We also might want statistics of other
>>> session IOs.  In that case the table name would be
>>> "pg_stat_session/process_activity". We are aleady collecting most
>>> kinds of the IO activity but it loses session information...
>> Well, actually monitoring disk activity for the particular
>> backend/session can be easily done using some external tools
>> (just because now in Postgres session=backend=process). So we can
>> monitor IO of processes, for example using iotop at Unix
>> or Performance Monitor at Windows.
> Operations that completes on shared buffers cannot be monitored that
> way. This is the same with WAL writing.

The questions is what we are going to monitor?
Amount of read/dirtied buffers or amount of disk ops?
>
>> Certainly it is more convenient to have such statstic inside
>> Postgres. But I am not sure if it is really needed.
>> Concerning WAL activity situation is more obscure: records can be
>> added to the WAL by one process, but written by another.
>> This is why it is not possible to use some external tools.
> For clarity, I didn't suggest that this patch should include general
> session IO statistics. Just the view name looked a bit specific.

I am not sure if pg_stat_wal_activity view should be added at all.
We can just add pg_stat_get_wal_activity function and let user specify 
PID of backend himself (for example by performing join with 
pg_stat_activity).
I proposed name pg_stat_wal_activity just for similarity with 
pg_stat_activity but can use any other proposed name.

>
>>> Briefly looking the patch, I have some comments on it.
>>>
>>> As mentioned above, if we are intending future exantion of the
>>> session-stats table, the name should be changed.
>>>
>>> Backend status is more appropriate than PGPROC. See pgstat.c.
>> Do you mean pgstat_fetch_stat_beentry?
>> But why it is better than storing this information directly in PGPROC?
> No it cannot be used there for performance reasons as you are
> saying. I'm not sure it's acceptable, but we can directly access
> backend status the same way if we expose MyBEEntry (and update it
> through a macro or a inline function).  If we don't need per record
> resolution for the value, we can update a process local variable at
> WAL-write time then write it to backend status at commit time or at
> the same timing as pgstat reporting.
>
> According to my faint memory, PGPROC is thought that it must be kept
> as small as possible for the reasons of CPU caches, that is the reason
> for PgBackendStatus.
Why do you think that adding one addition (without any locks and 
function calls) to CopyXLogRecordToWAL is not acceptable.
It is just one instruction added to expensive functions. At least I have 
not noticed any measurable impact on performance.

Concerning keeping PGPROC size as small as possible, I agree that it is 
reasonable argument.
But even now it is very large (816 bytes) and adding extra 8 bytes will 
increase it on less than 1%.

>>
>> This information is updated locally only by backend itself.
>> Certainly update of 64 bit field is not atomic at 32-but
>> architectures.
>> But it is just statistic. I do not think that it will be fatal if for
>> a moment
>> we can see some incorrect value of written WAL bytes (and at most
>> platforms this
>> update will be atomic).
> At least reader needs to take procarray lock to keep PID-WALwrite
> consistency, in order to prevent reading WALwrite values for a wrong
> process.

Sorry, but I still do not understand whats wrong can happen if reader 
will see WAL activity of wrong process.
Yes, correspondent backend may be already terminated and its PGPROC 
entry can be reused for some other process.
In this case we can wrongly attribute WAL traffic generated by 
terminated backend to the new process
or report zero traffic for old process. But this information is mostly 
needed for live (active) backends. So I do not think
that race conditions here are so critical.

Right now pg_stat_activity also accessing PGPROC to obtain wait event 
information and also not taking any locks.
So it can wrongly report backend status. But I never heard that somebody 
complains about it.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Session WAL activity

От
Michael Paquier
Дата:
On Thu, Dec 05, 2019 at 12:23:40PM +0300, Konstantin Knizhnik wrote:
> Concerning keeping PGPROC size as small as possible, I agree that it is
> reasonable argument.
> But even now it is very large (816 bytes) and adding extra 8 bytes will
> increase it on less than 1%.

It does not mean that we should add all kind of things to PGPROC as
that's a structure sensitive enough already.  By the way, why do you
assume that 8-byte reads are always safe and atomic in the patch?

> Right now pg_stat_activity also accessing PGPROC to obtain wait event
> information and also not taking any locks.
> So it can wrongly report backend status. But I never heard that somebody
> complains about it.

Please see pgstat.h, close to pgstat_report_wait_start().
--
Michael

Вложения

Re: Session WAL activity

От
Konstantin Knizhnik
Дата:

On 06.12.2019 4:57, Michael Paquier wrote:
> On Thu, Dec 05, 2019 at 12:23:40PM +0300, Konstantin Knizhnik wrote:
>> Concerning keeping PGPROC size as small as possible, I agree that it is
>> reasonable argument.
>> But even now it is very large (816 bytes) and adding extra 8 bytes will
>> increase it on less than 1%.
> It does not mean that we should add all kind of things to PGPROC as
> that's a structure sensitive enough already.  By the way, why do you
> assume that 8-byte reads are always safe and atomic in the patch?
I never assumed it - in the previous mail I wrote:

Certainly update of 64 bit field is not atomic at 32-but architectures.

>> Right now pg_stat_activity also accessing PGPROC to obtain wait event
>> information and also not taking any locks.
>> So it can wrongly report backend status. But I never heard that somebody
>> complains about it.
> Please see pgstat.h, close to pgstat_report_wait_start().

Sorry, I do not understand what should I look for?
Do you mean this comment:
     /*
      * Since this is a four-byte field which is always read and written as
      * four-bytes, updates are atomic.
      */

Yes, I already  have noticed that as far as walWritten is 64-bit, its 
update is not atomic at 32-bit platforms and so it is possible to see 
sometimes incorrect values.
So monotone observe of walWritten can be violated. From my point of view 
it is not so critical to enforce update of this fields under lock or 
accumulating result in local variable with later write it to backend 
status at commit time as Kyotaro proposed. Monitoring of WAL activity is 
especially interested for long living transactions and from my point of 
view it is much more
important to be able to see up-to-date but may be not always correct 
information then do not see any information at all before commit.
Please also take in account the percent of 32-bit Postgres installations 
and probability of observing non-atomic update of 64-bit walWritten 
field (I think that you will have no chances to see it even if you will 
run Postgres for a years).

But what I mean by "wrongly report backend wait event status" is that  
pg_stat_activity may report wait event status for wrong backend.
I.e. if backend is already terminated and its PGPROC entry is reused by 
some other backend, than you can see incorrect wait event information:
backend with such PID actually never sleep on this event.


In my reply

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Session WAL activity

От
Kyotaro Horiguchi
Дата:
At Fri, 6 Dec 2019 11:22:14 +0300, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote in
>
>
> On 06.12.2019 4:57, Michael Paquier wrote:
> > On Thu, Dec 05, 2019 at 12:23:40PM +0300, Konstantin Knizhnik wrote:
> > Please see pgstat.h, close to pgstat_report_wait_start().
>
> Sorry, I do not understand what should I look for?
> Do you mean this comment:
>     /*
>      * Since this is a four-byte field which is always read and
> written as
>      * four-bytes, updates are atomic.
>      */
>
> Yes, I already  have noticed that as far as walWritten is 64-bit, its
> update is not atomic at 32-bit platforms and so it is possible to see
> sometimes incorrect values.
> So monotone observe of walWritten can be violated. From my point of
> view it is not so critical to enforce update of this fields under lock
> or accumulating result in local variable with later write it to
> backend status at commit time as Kyotaro proposed. Monitoring of WAL
> activity is especially interested for long living transactions and
> from my point of view it is much more
> important to be able to see up-to-date but may be not always correct
> information then do not see any information at all before commit.
> Please also take in account the percent of 32-bit Postgres
> installations and probability of observing non-atomic update of 64-bit
> walWritten field (I think that you will have no chances to see it even
> if you will run Postgres for a years).

Still I'm not sure non-atomic write is acceptable, but I agree on the
necessity of updating it during a transaction. Couldn't we update
shared stats every n bytes (XLOG_BLCKSZ or such) or every command end?

I think we should refrain from inserting an instruction within the
WALInsertLock section, but I'm not sure which is better between "var
+= var" within the section and "if (inserted) var += var;" outside. If
we can ignore the possitbility of the case where xlogswitch is
omitted, the "if (inserted)" is not needed.

> But what I mean by "wrongly report backend wait event status" is that 
> pg_stat_activity may report wait event status for wrong backend.
> I.e. if backend is already terminated and its PGPROC entry is reused
> by some other backend, than you can see incorrect wait event
> information:
> backend with such PID actually never sleep on this event.

I saw a case where an entry with very old xact_start_timestamp
suddenly popped up in pg_stat_activity but I haven't found the cause..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Session WAL activity

От
Konstantin Knizhnik
Дата:

On 11.12.2019 7:26, Kyotaro Horiguchi wrote:
>
> Still I'm not sure non-atomic write is acceptable, but I agree on the
> necessity of updating it during a transaction. Couldn't we update
> shared stats every n bytes (XLOG_BLCKSZ or such) or every command end?
>
> I think we should refrain from inserting an instruction within the
> WALInsertLock section, but I'm not sure which is better between "var
> += var" within the section and "if (inserted) var += var;" outside. If
> we can ignore the possitbility of the case where xlogswitch is
> omitted, the "if (inserted)" is not needed.

I think that 32-bit Postgres installations are really exotic, but I 
agree that showing incorrect result (even with very small probability)
is not acceptable behavior in this case. I attached new versoin of the 
patch with use pg_atomic_write_u64 for updating walWritten field.
As far as at 64-bit systems, pg_atomic_write_u64and pg_atomic_read_u64 
are translated to ordinary memory access, them should not have some 
negative
impact on performance.


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Вложения

Re: Session WAL activity

От
Craig Ringer
Дата:
On Fri, 6 Dec 2019 at 09:57, Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Dec 05, 2019 at 12:23:40PM +0300, Konstantin Knizhnik wrote:
> Concerning keeping PGPROC size as small as possible, I agree that it is
> reasonable argument.
> But even now it is very large (816 bytes) and adding extra 8 bytes will
> increase it on less than 1%.

It does not mean that we should add all kind of things to PGPROC as
that's a structure sensitive enough already.

Right. It's not as critical as PGXACT, but PGPROC is still significant for scalability and connection count limits.

It's a shame we can't really keep most of it in backend-private memory and copy it to requestors when they want it, say into a temporary DSA or over a shm_mq. But our single threaded model means we just cannot rely on backends being responsive in a timely enough manner to supply data on-demand. That doesn't mean we have to push it to PGPROC though: we could be sending the parts that don't have to be super-fresh to the stats collector or a new related process for active session stats and letting it aggregate them.

That's way beyond the scope of this patch though. So personally I'm OK with the new PGPROC field. Visibility into Pg's activity is woefully limited and something we need to prioritize more.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise

Re: Session WAL activity

От
Bruce Momjian
Дата:
On Thu, Dec 12, 2019 at 09:31:22AM +0800, Craig Ringer wrote:
> On Fri, 6 Dec 2019 at 09:57, Michael Paquier <michael@paquier.xyz> wrote:
> 
>     On Thu, Dec 05, 2019 at 12:23:40PM +0300, Konstantin Knizhnik wrote:
>     > Concerning keeping PGPROC size as small as possible, I agree that it is
>     > reasonable argument.
>     > But even now it is very large (816 bytes) and adding extra 8 bytes will
>     > increase it on less than 1%.
> 
>     It does not mean that we should add all kind of things to PGPROC as
>     that's a structure sensitive enough already.
> 
> 
> Right. It's not as critical as PGXACT, but PGPROC is still significant for
> scalability and connection count limits.
> 
> It's a shame we can't really keep most of it in backend-private memory and copy
> it to requestors when they want it, say into a temporary DSA or over a shm_mq.
> But our single threaded model means we just cannot rely on backends being
> responsive in a timely enough manner to supply data on-demand. That doesn't
> mean we have to push it to PGPROC though: we could be sending the parts that
> don't have to be super-fresh to the stats collector or a new related process
> for active session stats and letting it aggregate them.
> 
> That's way beyond the scope of this patch though. So personally I'm OK with the
> new PGPROC field. Visibility into Pg's activity is woefully limited and
> something we need to prioritize more.

Uh, how much does the new field get us near the CPU cache line max size
for a single PGPROC entry?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Session WAL activity

От
Andres Freund
Дата:
Hi,

On 2019-12-20 16:38:32 -0500, Bruce Momjian wrote:
> On Thu, Dec 12, 2019 at 09:31:22AM +0800, Craig Ringer wrote:
> > On Fri, 6 Dec 2019 at 09:57, Michael Paquier <michael@paquier.xyz> wrote:
> > 
> >     On Thu, Dec 05, 2019 at 12:23:40PM +0300, Konstantin Knizhnik wrote:
> >     > Concerning keeping PGPROC size as small as possible, I agree that it is
> >     > reasonable argument.
> >     > But even now it is very large (816 bytes) and adding extra 8 bytes will
> >     > increase it on less than 1%.
> > 
> >     It does not mean that we should add all kind of things to PGPROC as
> >     that's a structure sensitive enough already.

Well, but we don't keep other stats in PGPROC, even when we have them in
shared memory? It seems like PgBackendStatus or such might be a better
place?


> > Right. It's not as critical as PGXACT, but PGPROC is still significant for
> > scalability and connection count limits.
> > 
> > It's a shame we can't really keep most of it in backend-private memory and copy
> > it to requestors when they want it, say into a temporary DSA or over
> > a shm_mq.

I don't understand what that would buy? Commonly accessed field are just
going to be in L2 or such, with the cacheline being in
modified/exclusive state.  The problem isn't that fields / cachelines
*can* be accessed by other backends, it's only a problem *if* they're
frequently accessed. And even if accessed by multiple backends, it's
only really a problem if there are multiple fields *and* they're also
modified (otherwise they can just stay in shared stated across
cpus/sockets).

There *is* an argument for grouping fields in PGPROC by their access
patterns. E.g. something like ->procArrayGroup* is a lot more commonly
accessed by different backends than e.g. this proposed field.


> > But our single threaded model means we just cannot rely on backends being
> > responsive in a timely enough manner to supply data on-demand. That doesn't
> > mean we have to push it to PGPROC though: we could be sending the parts that
> > don't have to be super-fresh to the stats collector or a new related process
> > for active session stats and letting it aggregate them.

We should definitely *NOT* do that. Ferrying things through the stats
collector is really expensive, and everyone pays the price for an
increase in size, not just code accessing the field.  In fact, no
reasonable quantity that's known at server start should ever go through
a mechanism as expensive as pgstat - the only reason it exists is that
the number of tables obviously can grow over time.

There's a thread somewhere about a patchset to move all of pgstat into
dynamic shared memory, actually. Because the writes / reads needed by
pgstat are really bad on some systems.


> > That's way beyond the scope of this patch though. So personally I'm OK with the
> > new PGPROC field. Visibility into Pg's activity is woefully limited and
> > something we need to prioritize more.
> 
> Uh, how much does the new field get us near the CPU cache line max size
> for a single PGPROC entry?

It's like ~13x the common size of a cache line (64bytes).

Greetings,

Andres Freund