Thanks for the feedva k,
It turns out we were not looking at master & skave. The slave has no replication slot. We are seeing lots of row versions not getting cleaned up, trying toinvestigate all the possible causes. Is there a way to determine which connections (on the master or the slave) are the cause of the row versions not getting cleaned up?
Thanks
On Wed, Jul 17, 2019, at 4:32 PM, Andres Freund wrote:
Hi,
On 2019-07-17 12:31:29 -0600, S. Bob wrote:
> We have a master and a standby (Streaming replication).
>
> This query produces a value of approx 1.5million on the master and approx
> 80million on the standby:
>
>
> select slot_name, slot_type, database, xmin from pg_replication_slots order
> by age(xmin);
I'd suggest using age(xmin) in the output as well, not just in the ORDER
BY, otherwise it's hard to make sense of the value.
When you say a value of '1.5 million' on the master, how did you
determine that? Because the above doesn't return anything for the
master. The simplest way to determine that is probably:
SELECT max(age(backend_xmin)) FROM pg_stat_activity;
When you say '80 million', do you mean that pg_replication_slots.xmin
was 80 million, or age(pg_replication_slots.xmin)? If it's the former,
then that would indicate that the standby is *not* a problem, but that
you have a long-running transaction or such on the primary.
Basically, age(xmin) has to be large, not xmin itself.
> We are running replication in async mode.
>
>
> A couple of questions:
>
>
> - Is it unusual for the master & standby to differ dramatically like this?
Do you have very longrunning queries on the standby or primary? That's
typically the reason why xmin is old.
> - could the replication slot (i.e. the xmin on the standby) be preventing
> vacuum from cleaning up row versions on the master?
Yes.
Greetings,
Andres Freund