Thread: Getting pg_stat_database data takes significant time

Getting pg_stat_database data takes significant time

From:
hubert depesz lubaczewski
Date:
Hi,
We have servers where there is a single app db, but one that contains MANY
schemas/tables.

This is on Pg 12.6.

Simple query like: select * from pg_stat_database where datname = 'app_name' can take up to 800ms!

#v+
                                                                  QUERY PLAN
══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
 Subquery Scan on d  (cost=0.00..2.52 rows=2 width=216) (actual time=883.623..883.631 rows=1 loops=1)
   ->  Append  (cost=0.00..2.39 rows=2 width=68) (actual time=0.019..0.025 rows=1 loops=1)
         ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..0.02 rows=1 width=68) (actual time=0.003..0.003 rows=0 loops=1)
               ->  Result  (cost=0.00..0.01 rows=1 width=68) (actual time=0.002..0.003 rows=0 loops=1)
                     One-Time Filter: NULL::boolean
         ->  Bitmap Heap Scan on pg_database  (cost=1.24..2.35 rows=1 width=68) (actual time=0.016..0.020 rows=1 loops=1)
               Recheck Cond: (datname = 'app_name'::name)
               Heap Blocks: exact=1
               ->  Bitmap Index Scan on pg_database_datname_index  (cost=0.00..1.24 rows=1 width=0) (actual time=0.009..0.010 rows=1 loops=1)
                     Index Cond: (datname = 'app_name'::name)
 Planning Time: 0.294 ms
 Execution Time: 883.684 ms
(12 rows)
#v-

I checked and it looks like the problem is with the pg_stat_get_db_* functions used in this view. For example:

#v+
=# explain (analyze on, buffers on) SELECT pg_stat_get_db_temp_bytes(7436115) AS temp_bytes;
                                       QUERY PLAN                                       
════════════════════════════════════════════════════════════════════════════════════════
 Result  (cost=0.00..0.01 rows=1 width=8) (actual time=465.152..465.153 rows=1 loops=1)
 Planning Time: 0.017 ms
 Execution Time: 465.175 ms
(3 rows)
#v-
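
(For reference, the full set of pg_stat_get_db_* calls behind the view can be
listed with pg_get_viewdef, and the oid above is just our database's oid from
pg_database:)

#v+
=# select pg_get_viewdef('pg_stat_database'::regclass, true);
=# select oid from pg_database where datname = 'app_name';
#v-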

Is there anything we could do to make it faster?

The problem is that on certain servers this query takes up to 10% of
total query time (as reported by pg_stat_statements).

This query is being called quite a lot by monitoring software, and
disabling it is not really an option. It is called every 15 seconds, so
not extremely often, but the total_time adds up "nicely".
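
(For the record, this is roughly the query I use to see the impact in
pg_stat_statements - column names as in Pg 12:)

#v+
=# select calls, round(total_time) as total_ms, query
     from pg_stat_statements
    where query like '%pg_stat_database%'
    order by total_time desc
    limit 5;
#v-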

Best regards,

depesz




Re: Getting pg_stat_database data takes significant time

From:
Vijaykumar Jain
Date:
Just taking a shot, as I have seen this in some previous issues; ignore if not relevant.

Can you run VACUUM on pg_class and check the query again, or do you see pg_class bloated?

The other option would be a gdb backtrace; I think that would help.

Re: Getting pg_stat_database data takes significant time

From:
hubert depesz lubaczewski
Date:
On Wed, Aug 11, 2021 at 06:52:15PM +0530, Vijaykumar Jain wrote:
> Just taking a shot, as I have seen this in some previous issues; ignore if
> not relevant.
> 
> Can you run vacuum on pg_class and  check the query again , or do you see
> pg_class bloated ?

pg_class is large, but vacuuming it didn't help with the time of the query on
pg_stat_database.

vacuum output:
#v+
=# vacuum verbose analyze pg_class ;                                                        
INFO:  vacuuming "pg_catalog.pg_class"                           
INFO:  scanned index "pg_class_oid_index" to remove 3632 row versions
DETAIL:  CPU: user: 0.06 s, system: 0.00 s, elapsed: 0.06 s
INFO:  scanned index "pg_class_relname_nsp_index" to remove 3632 row versions
DETAIL:  CPU: user: 0.16 s, system: 0.17 s, elapsed: 0.46 s                               
INFO:  scanned index "pg_class_tblspc_relfilenode_index" to remove 3632 row versions
DETAIL:  CPU: user: 0.08 s, system: 0.01 s, elapsed: 0.10 s
INFO:  "pg_class": removed 3632 row versions in 662 pages
DETAIL:  CPU: user: 0.09 s, system: 0.00 s, elapsed: 0.09 s
INFO:  index "pg_class_oid_index" now contains 1596845 row versions in 11879 pages                               
DETAIL:  3632 index row versions were removed.   
852 index pages have been deleted, 835 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "pg_class_relname_nsp_index" now contains 1596845 row versions in 64591 pages
DETAIL:  3627 index row versions were removed.
588 index pages have been deleted, 574 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "pg_class_tblspc_relfilenode_index" now contains 1596845 row versions in 12389 pages
DETAIL:  3632 index row versions were removed.                                 
941 index pages have been deleted, 918 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "pg_class": found 1226 removable, 59179 nonremovable row versions in 1731 out of 56171 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 1556677295
There were 42246 unused item identifiers.
Skipped 0 pages due to buffer pins, 13921 frozen pages.
0 pages are entirely empty.
CPU: user: 0.62 s, system: 0.19 s, elapsed: 0.94 s.
INFO:  analyzing "pg_catalog.pg_class"
INFO:  "pg_class": scanned 30000 of 56171 pages, containing 853331 live rows and 0 dead rows; 30000 rows in sample,
1597749estimated total rows
 
VACUUM
Time: 2687.170 ms (00:02.687)
#v-

> The other option would be gdb backtrace I think that would help.

Backtrace of what? It doesn't *break*, it just takes a strangely long time.

I could envision attaching gdb to pg process and getting backtrace, but when?
before running the query? after?

depesz



Re: Getting pg_stat_database data takes significant time

From:
Vijaykumar Jain
Date:
On Wed, 11 Aug 2021 at 18:59, hubert depesz lubaczewski <depesz@depesz.com> wrote:
On Wed, Aug 11, 2021 at 06:52:15PM +0530, Vijaykumar Jain wrote:
> Just taking a shot, as I have seen this in some previous issues; ignore if
> not relevant.
>
> Can you run vacuum on pg_class and  check the query again , or do you see
> pg_class bloated ?

pg_class is large, but vacuuming it didn't help for time of query on
pg_stat_database.

OK, my guess here was: since pg_class is updated every now and then with stats, it might require some lock while adding the data.
So if it were bloated, that would block the planner from getting the estimates, and hence delay the query as a whole.
But that was a wild guess.
--pg_class not locked
postgres@db:~$ time psql -c 'select * from pg_stat_database;' >/dev/null

real    0m0.016s
user    0m0.003s
sys     0m0.000s

-- pg_class locked; the query completed once the tx unlocked it
postgres@db:~$ time psql -c 'select * from pg_stat_database;' >/dev/null

real    0m7.269s
user    0m0.003s
sys     0m0.000s
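
(for reference, the "locked" case was simulated with something like this in a
second, superuser session, held open while the timed psql ran:)

postgres=# begin;
postgres=# lock table pg_catalog.pg_class in access exclusive mode;
-- run the timed psql from another terminal here, then:
postgres=# rollback;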
 
> The other option would be gdb backtrace I think that would help.

backtrace from what? It doesn't *break*, it just takes strangely long time.

I could envision attaching gdb to pg process and getting backtrace, but when?
before running the query? after?


A backtrace of the running query. You might have to run this multiple times against the raw query directly via psql, since you get this delay occasionally.
Why I say this: the backtrace would probably show whether it is hanging in normal plan execution or in something else.
Or maybe perf/strace to trace syscall timings.
But I know you know more than me :) - just asking if the backtrace helps expose something helpful.
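
Something along these lines, with <pid> being the backend running the query
(from pg_stat_activity):

# one-shot backtrace of the backend while the query is running
gdb -p <pid> -batch -ex 'bt'

# or sample for a few seconds to see where the time goes
perf record -g -p <pid> -- sleep 5 && perf report

# or just a syscall timing summary (stop with ctrl-c)
strace -c -p <pid>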

--
Thanks,
Vijay
Mumbai, India

Re: Getting pg_stat_database data takes significant time

From:
Vijaykumar Jain
Date:
On Wed, 11 Aug 2021 at 19:12, Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:

ok my guess here was, since pg_class is updated every now and then with stats, it might require some lock while adding the data.
so if it were bloated, that would block the planner to get the estimates , and hence delay the query at whole.
but that was a wild guess.
 
--pg_class not locked
postgres@db:~$ time psql -c 'select * from pg_stat_database;' >/dev/null

real    0m0.016s
user    0m0.003s
sys     0m0.000s

-- pg_class locked and the query completed when tx it unlocks
postgres@db:~$ time psql -c 'select * from pg_stat_database;' >/dev/null

real    0m7.269s
user    0m0.003s
sys     0m0.000s


OK, please ignore, I think this was wrong.
The delay while pg_class was locked was in the psql connection startup, not in the pg_stat_database query itself.

strace -r psql -c 'select * from pg_stat_database;' >/dev/null
....
     0.000612 poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
     4.819832 recvfrom(3, "R\0\0\0\10\0\0\0\0S\0\0\0\32application_name\0p"..., 16384, 0, NULL, NULL) = 396 -- the time till table was locked

I think I'll back off here and do some gdb tracing myself before I make any more assumptions.

 

Re: Getting pg_stat_database data takes significant time

From:
Alvaro Herrera
Date:
Two things,

1. this depends on reading the stats file; that's done once per
transaction.  So if you run the query twice in a transaction, the second
time will take less time.  You can know how much time is spent reading
that file by subtracting both times.

2. EXPLAIN (VERBOSE) will tell you which functions are being called by
the query.  One of those loops across all live backends.  Is that
significant?  You could measure by creating an identical view but
omitting pg_stat_db_numbackends.  Does it take the same time as the
view?  If not, then you know that looping around all live backends is
slow.

If the problem is (1) then you could have fewer tables, so that the file
is smaller and thus faster to read, but I don't think you'll like that
answer; and if the problem is (2) then you could reduce max_connections,
but I don't think you'll like that either.

I suspect there's not much you can do, other than patch the monitoring
system to not read that view as often.

-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/
"Use it up, wear it out, make it do, or do without"



Re: Getting pg_stat_database data takes significant time

From:
hubert depesz lubaczewski
Date:
On Wed, Aug 11, 2021 at 10:16:13AM -0400, Alvaro Herrera wrote:
> 1. this depends on reading the stats file; that's done once per
> transaction.  So if you run the query twice in a transaction, the second
> time will take less time.  You can know how much time is spent reading
> that file by subtracting both times.

Yeah. I noticed. Looks like loading the stats file is the culprit. But
does that mean that the whole stats file has to be read at once? I just
need stats on db, not on relations?

> 2. EXPLAIN (VERBOSE) will tell you which functions are being called by
> the query.  One of those loops across all live backends.  Is that
> significant?  You could measure by creating an identical view but
> omitting pg_stat_db_numbackends.  Does it take the same time as the
> view?  If not, then you know that looping around all live backends is
> slow.

Even `select datid, datname, xact_commit, xact_rollback from pg_stat_database;`
takes (now) about a second. Second call in the same connection, different txn: 0.8s.

Second call in the same transaction as first - 0.053ms.

So it definitely suggests that loading the stats file is the problem.
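
(measured roughly like this - the second call inside the transaction reuses
the stats snapshot:)

#v+
=# \timing on
=# begin;
=# select datid, datname, xact_commit, xact_rollback from pg_stat_database;
-- first call in the txn: ~1s, pays the stats file read
=# select datid, datname, xact_commit, xact_rollback from pg_stat_database;
-- second call in the same txn: ~0.05ms
=# commit;
#v-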

> If the problem is (1) then you could have less tables, so that the file
> is smaller and thus faster to read, but I don't think you'll like that
> answer; and if the problem is (2) then you could reduce max_connections,
> but I don't think you'll like that either.

max_connections would be relevant, if I understand correctly, only if the query
got faster without pg_stat_db_numbackends. And it doesn't - it takes
basically the same time.

> I suspect there's not much you can do, other than patch the monitoring
> system to not read that view as often.

Once every 15 seconds doesn't seem to be too often, but perhaps I can do
something about it...

Best regards,

depesz




Re: Getting pg_stat_database data takes significant time

From:
Magnus Hagander
Date:
On Wed, Aug 11, 2021 at 6:34 PM hubert depesz lubaczewski
<depesz@depesz.com> wrote:
>
> On Wed, Aug 11, 2021 at 10:16:13AM -0400, Alvaro Herrera wrote:
> > 1. this depends on reading the stats file; that's done once per
> > transaction.  So if you run the query twice in a transaction, the second
> > time will take less time.  You can know how much time is spent reading
> > that file by subtracting both times.
>
> Yeah. I noticed. Looks like loading the stats file is the culprit. But
> does that mean that the whole stats file has to be read at once? I just
> need stats on db, not on relations?

The backend will read and cache the per-database file for the current
database on all those calls, and it is read and cached as a whole,
along with the global and shared stats.

Which database are you connected to? If you just want to look at the
global stats, it might help to be connected to a database that is
*not* the one with all the tables in -- e.g. connect to "postgres" and
query pg_stat_database looking for values on a different database? In
this case it would open files for "global", for "database postgres"
and "shared relations" only and skip the file for your db with many
objects. I think.
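
If memory serves, you can see that split on disk, too: under
stats_temp_directory (pg_stat_tmp by default) there is a global.stat, a
db_0.stat for shared relations, and one db_<oid>.stat per database, and the
big one should be the file for your app db. Roughly:

$ psql -AtXc 'show stats_temp_directory'
$ psql -AtXc 'select oid, datname from pg_database'
$ ls -lh "$PGDATA/pg_stat_tmp"    # or wherever stats_temp_directory points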


> > 2. EXPLAIN (VERBOSE) will tell you which functions are being called by
> > the query.  One of those loops across all live backends.  Is that
> > significant?  You could measure by creating an identical view but
> > omitting pg_stat_db_numbackends.  Does it take the same time as the
> > view?  If not, then you know that looping around all live backends is
> > slow.
>
> Even `select datid, datname, xact_commit, xact_rollback from pg_stat_database;`
> takes (now), a second. Second call in the same connection, different txn, 0.8s.
>
> Second call in the same transaction as first - 0.053ms.
>
> So it definitely suggests that loading the stats file is the problem.

Yes, definitely.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: Getting pg_stat_database data takes significant time

From:
Alvaro Herrera
Date:
On 2021-Aug-11, hubert depesz lubaczewski wrote:

> On Wed, Aug 11, 2021 at 10:16:13AM -0400, Alvaro Herrera wrote:
> > 1. this depends on reading the stats file; that's done once per
> > transaction.  So if you run the query twice in a transaction, the second
> > time will take less time.  You can know how much time is spent reading
> > that file by subtracting both times.
> 
> Yeah. I noticed. Looks like loading the stats file is the culprit. But
> does that mean that the whole stats file has to be read at once? I just
> need stats on db, not on relations?

As I recall there is one file per database containing everything
pertaining to that database, and you cannot read it partially.

Maybe you can use stats_temp_directory to put these files in faster
or less busy storage -- a RAM disk perhaps?
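
Something along these lines (paths are just an example; the setting can be
changed with a reload):

# mount a small tmpfs and make it writable by the postgres user, e.g.:
#   mount -t tmpfs -o size=512M tmpfs /run/postgresql/pg_stat_tmp
#   chown postgres:postgres /run/postgresql/pg_stat_tmp
# then in postgresql.conf:
stats_temp_directory = '/run/postgresql/pg_stat_tmp'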

-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/
"Those who use electric razors are infidels destined to burn in hell while
we drink from rivers of beer, download free vids and mingle with naked
well shaved babes." (http://slashdot.org/comments.pl?sid=44793&cid=4647152)



Re: Getting pg_stat_database data takes significant time

From:
hubert depesz lubaczewski
Date:
On Thu, Aug 12, 2021 at 09:08:27AM -0400, Alvaro Herrera wrote:
> On 2021-Aug-11, hubert depesz lubaczewski wrote:
> 
> > On Wed, Aug 11, 2021 at 10:16:13AM -0400, Alvaro Herrera wrote:
> > > 1. this depends on reading the stats file; that's done once per
> > > transaction.  So if you run the query twice in a transaction, the second
> > > time will take less time.  You can know how much time is spent reading
> > > that file by subtracting both times.
> > 
> > Yeah. I noticed. Looks like loading the stats file is the culprit. But
> > does that mean that the whole stats file has to be read at once? I just
> > need stats on db, not on relations?
> 
> As I recall there is one file per database containing everything
> pertaining to that database, and you cannot read it partially.
> 
> Maybe you can use stats_temp_directory to put these files in faster
> or less busy storage -- a RAM disk perhaps?

The file is 120MB, and is stored in tmpfs, which, I assume, on Linux is a
RAM disk.

depesz



Re: Getting pg_stat_database data takes significant time

From:
hubert depesz lubaczewski
Date:
On Thu, Aug 12, 2021 at 11:32:15AM +0200, Magnus Hagander wrote:
> Which database are you connected to? If you just want to look at the
> global stats, it might help to be connected to a database that is
> *not* the one with all the tables in -- e.g. connect to "postgres" and
> query pg_stat_database looking for values on a different database? In
> this case it would open files for "global", for "database postgres"
> and "shared relations" only and skip the file for your db with many
> objects. I think.

I'm connected to the db I need data about, and I need data from virtually
all columns of pg_stat_database.

depesz



Re: Getting pg_stat_database data takes significant time

From:
Magnus Hagander
Date:
On Thu, Aug 12, 2021 at 4:38 PM hubert depesz lubaczewski
<depesz@depesz.com> wrote:
>
> On Thu, Aug 12, 2021 at 11:32:15AM +0200, Magnus Hagander wrote:
> > Which database are you connected to? If you just want to look at the
> > global stats, it might help to be connected to a database that is
> > *not* the one with all the tables in -- e.g. connect to "postgres" and
> > query pg_stat_database looking for values on a different database? In
> > this case it would open files for "global", for "database postgres"
> > and "shared relations" only and skip the file for your db with many
> > objects. I think.
>
> I'm connected to the db I need data about, and I need data from virtualy
> all columns of pg_stat_database.

Try connecting to a different database, while still querying all columns.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: Getting pg_stat_database data takes significant time

From:
hubert depesz lubaczewski
Date:
On Thu, Aug 12, 2021 at 06:20:23PM +0200, Magnus Hagander wrote:
> On Thu, Aug 12, 2021 at 4:38 PM hubert depesz lubaczewski
> <depesz@depesz.com> wrote:
> >
> > On Thu, Aug 12, 2021 at 11:32:15AM +0200, Magnus Hagander wrote:
> > > Which database are you connected to? If you just want to look at the
> > > global stats, it might help to be connected to a database that is
> > > *not* the one with all the tables in -- e.g. connect to "postgres" and
> > > query pg_stat_database looking for values on a different database? In
> > > this case it would open files for "global", for "database postgres"
> > > and "shared relations" only and skip the file for your db with many
> > > objects. I think.
> >
> > I'm connected to the db I need data about, and I need data from virtualy
> > all columns of pg_stat_database.
> Try connecting to a different database, while still querying all columns.

Damn, this is a huge difference:
=$ time psql -d postgres -c "select * from pg_stat_database where datname = 'appdb'" -qAtX > /dev/null

real    0m0.058s
user    0m0.026s
sys     0m0.013s

=$ time psql -d appdb -c "select * from pg_stat_database where datname = 'appdb'" -qAtX > /dev/null

real    0m0.466s
user    0m0.031s
sys     0m0.008s

Not sure if I can make the monitoring software use a different connection for this
one particular query, though. Will check with them. Thanks.

Best regards,

depesz