Обсуждение: select count(id) on RDS replica causing high CPU load on RDS master

Поиск
Список
Период
Сортировка

select count(id) on RDS replica causing high CPU load on RDS master

От
Azul
Дата:
Hi sweeties,

I'm a bit confused about this one, and could use some help from you PG overlords,

so I have a RDS 9.5 master 16vcpu/64GB RAM/5000 IOPS which is RDS replicating to a replica with the same specs.
The load on the master is roughly 20/30% cpu and there's only some batch queries running on the replica usually in the morning.

Now this master DB was neglected for a long time, and we had tables with 1.3 billion records. I ran an outbound daemon that was deleting chunks of this table over a period of weeks, and it is now cleaned up. Autovacuum on these amounts of deletes is a beast on its own, and right now is running what I hope is the last long running one.

So I had been trying for a couple of days to get an accurate count of the number of records remaining on that large table that still needed to be deleted,

select count(id) from large_table where id < 99999;

I am running the above on the replica to avoid causing an extra load on the master, that query takes a long time (lets ignore the fact that it badly needs an analyse to finish), roughly an hour or so.
Now what is baffling me is the CPU load on the master goes  up steadily  all the way to 100% while this select count is running on the slave. Worth mentioning that CPU on the slave increases by about 10% of so.

I have google, read the postgres parameters, dug into AWS forums, stack overflows and I'm none the wiser about this one.

How can a select count like this one on the replica cause CPU to spike on the master?

thanks for your help postgres overlords
your humble pebble 
- azul


Re: select count(id) on RDS replica causing high CPU load on RDSmaster

От
Jeremy Schneider
Дата:
On 6/3/20 04:04, Azul wrote:
> so I have a RDS 9.5 master 16vcpu/64GB RAM/5000 IOPS which is RDS
> replicating to a replica with the same specs.

If you're not already making plans to upgrade this, many people would
strongly urge you start now. The final release for 9.5 is less than a
year away, after which you will no longer be able to get security
updates or bug fixes.

https://www.postgresql.org/support/versioning/

Please consider moving in the direction of v12.

Just for posterity in case someone googles this email thread in the
future, an RDS "read replica" is a "hot standby" in PostgreSQL terminology.

https://www.postgresql.org/docs/9.5/high-availability.html


> I am running the above on the replica to avoid causing an extra load on
> the master, that query takes a long time (lets ignore the fact that it
> badly needs an analyse to finish), roughly an hour or so.

Just curious, did you happen to enable the hot_standby_feedback
parameter? Looks to me like it's off by default in 9.4.


> Now what is baffling me is the CPU load on the master goes  up steadily 
> all the way to 100% while this select count is running on the slave.
> Worth mentioning that CPU on the slave increases by about 10% of so.


Which PIDs/processes are using the CPU? (Vacuum? App connections running
queries?) For people who manage PostgreSQL themselves, they'd use normal
unix utilities like top, ps, etc. On RDS you want to enabled "Enhanced
Monitoring" and check the "process list". (In the web console, as of
last time I checked: go to the database, choose the "Monitoring" tab,
click the dropdown box at the top right that says "Monitoring" and
choose "OS Process List".)

If you see that it's user sessions, then you can connect with psql at
the same time as the high activity and query the contents of
pg_stat_activity to get a little more information about what the
particular process is doing.


-- 
Jeremy Schneider
Database Engineer
Amazon Web Services



Re: select count(id) on RDS replica causing high CPU load on RDS master

От
Azul
Дата:

Thanks Jeremy,

Plan is on the cards to upgrade to 10 this or next week, and then later on towards v12.
hot_standby_feedback is enabled, yes.
So the process list only shows high cpu usage for the postgres process itself, and all other ones are < 1.5%.
it's a bit baffling as I don't have anything heavy running on the master and all the heavy reads have been directed to the standby.
As soon as a large query hits the standby the load goes up in the master, but there's no correlation with IO memory or anything on the master, its just cpu load.

- azul

On Wed, 3 Jun 2020 at 14:59, Jeremy Schneider <schnjere@amazon.com> wrote:
On 6/3/20 04:04, Azul wrote:
> so I have a RDS 9.5 master 16vcpu/64GB RAM/5000 IOPS which is RDS
> replicating to a replica with the same specs.

If you're not already making plans to upgrade this, many people would
strongly urge you start now. The final release for 9.5 is less than a
year away, after which you will no longer be able to get security
updates or bug fixes.

https://www.postgresql.org/support/versioning/

Please consider moving in the direction of v12.

Just for posterity in case someone googles this email thread in the
future, an RDS "read replica" is a "hot standby" in PostgreSQL terminology.

https://www.postgresql.org/docs/9.5/high-availability.html


> I am running the above on the replica to avoid causing an extra load on
> the master, that query takes a long time (lets ignore the fact that it
> badly needs an analyse to finish), roughly an hour or so.

Just curious, did you happen to enable the hot_standby_feedback
parameter? Looks to me like it's off by default in 9.4.


> Now what is baffling me is the CPU load on the master goes  up steadily 
> all the way to 100% while this select count is running on the slave.
> Worth mentioning that CPU on the slave increases by about 10% of so.


Which PIDs/processes are using the CPU? (Vacuum? App connections running
queries?) For people who manage PostgreSQL themselves, they'd use normal
unix utilities like top, ps, etc. On RDS you want to enabled "Enhanced
Monitoring" and check the "process list". (In the web console, as of
last time I checked: go to the database, choose the "Monitoring" tab,
click the dropdown box at the top right that says "Monitoring" and
choose "OS Process List".)

If you see that it's user sessions, then you can connect with psql at
the same time as the high activity and query the contents of
pg_stat_activity to get a little more information about what the
particular process is doing.


--
Jeremy Schneider
Database Engineer
Amazon Web Services