Обсуждение: MultiXactId wraparound and last aggressive vacuum time

Поиск
Список
Период
Сортировка

MultiXactId wraparound and last aggressive vacuum time

От
Michael Schanne
Дата:
Hi,

My application is getting the following exception:

InternalError: (psycopg2.InternalError) MultiXactId 808263738 has not been created yet -- apparent wraparound

I read over https://www.postgresql.org/docs/9.6/routine-vacuuming.html#VACUUM-FOR-MULTIXACT-WRAPAROUND and I suspect the issue was caused by not vacuuming often enough.  I can query the last_autovacuum time from the pg_stat_all_tables view.  However, I found this in the documentation:

VACUUM uses the visibility map to determine which pages of a table must be scanned. Normally, it will skip pages that don't have any dead row versions even if those pages might still have row versions with old XID values. Therefore, normal VACUUMs won't always freeze every old row version in the table. Periodically, VACUUM will perform an aggressive vacuum, skipping only those pages which contain neither dead rows nor any unfrozen XID or MXID values. vacuum_freeze_table_age controls when VACUUM does that: all-visible but not all-frozen pages are scanned if the number of transactions that have passed since the last such scan is greater than vacuum_freeze_table_age minus vacuum_freeze_min_age. Setting vacuum_freeze_table_age to 0 forces VACUUM to use this more aggressive strategy for all scans.

How can I tell when the last "aggressive" auto-vacuum was performed?  The vacuum_freeze_table_age parameter is at the default value (200000000).

I am using postgresql 9.6.

Thanks,
Mike

Re: MultiXactId wraparound and last aggressive vacuum time

От
Michael Lewis
Дата:
I don't believe you can determine a date/time that it happened, but querying the age of the table is simple to do and then you can compare that with the freeze age parameters. A periodic manual vacuum freeze may preclude the system ever needing to perform the emergency autovacuum freeze, and as an added benefit, index only scans would be more likely to be chosen since the visibility map would be updated and the regular autovacuum runs would be faster since they can skip frozen pages. I think that applies back to 9.6 at least. It's possible it got implemented in 10.

My caffeine hasn't fully kicked in yet and I am still a bit junior on this list, but I would hope and expect senior people to correct me if I have misunderstood or misrepresented things.

Re: MultiXactId wraparound and last aggressive vacuum time

От
Michael Schanne
Дата:
After a closer reading of the documentation, it appears the database should stop accepting new transactions before a wraparound would ever occur.  If so, then the only possible explanations for this multixactid wraparound error would be data corruption, or a bug in postgresql.  The exact version I'm using is 9.6.10, which is quite a few versions behind the latest (9.6.21), but I skimmed through the release notes of the later versions and did not see any bugfixes in this area.  That would leave data corruption as the only explanation.  Is my reasoning correct here?  I'm willing to upgrade but I would need to justify it somehow, so if I am missing something please let me know.

On Tue, Mar 23, 2021 at 11:54 AM Michael Lewis <mlewis@entrata.com> wrote:
I don't believe you can determine a date/time that it happened, but querying the age of the table is simple to do and then you can compare that with the freeze age parameters. A periodic manual vacuum freeze may preclude the system ever needing to perform the emergency autovacuum freeze, and as an added benefit, index only scans would be more likely to be chosen since the visibility map would be updated and the regular autovacuum runs would be faster since they can skip frozen pages. I think that applies back to 9.6 at least. It's possible it got implemented in 10.

My caffeine hasn't fully kicked in yet and I am still a bit junior on this list, but I would hope and expect senior people to correct me if I have misunderstood or misrepresented things.

Re: MultiXactId wraparound and last aggressive vacuum time

От
Jehan-Guillaume de Rorthais
Дата:
On Mon, 5 Apr 2021 17:28:06 -0400
Michael Schanne <michael.schanne@gmail.com> wrote:

> After a closer reading of the documentation, it appears the database should
> stop accepting new transactions before a wraparound would ever occur.  If
> so, then the only possible explanations for this multixactid wraparound
> error would be data corruption, or a bug in postgresql.  The exact version
> I'm using is 9.6.10, which is quite a few versions behind the latest
> (9.6.21), but I skimmed through the release notes of the later versions and
> did not see any bugfixes in this area.  That would leave data corruption as
> the only explanation.  Is my reasoning correct here?

I didn't checked the changelog, and you should definitely run 9.6.21, but I
believe your reasoning is correct anyway. A bug might be possible, but I would
bet a coin on the corruption.

You might want to compare this number with the value reported by: 

  pg_controldata $PGDATA|grep NextMultiXactId

Backup your cluster, then, try to isolate the table(s) and block(s) where the
corruption occurs and check at them using eg. pageinspect.

> I'm willing to upgrade but I would need to justify it somehow, so if I am
> missing something please let me know.

you can justify the upgrade using this load of reasons:
https://why-upgrade.depesz.com/show?from=9.6.10&to=9.6.21

Regards,