Discussion: MultiXactId error after upgrade to 9.3.4
Greetings,

Looks like we might not be entirely out of the woods yet regarding MultiXactId's. After doing an upgrade from 9.2.6 to 9.3.4, we saw the following:

ERROR: MultiXactId 6849409 has not been created yet -- apparent wraparound

The table contents can be select'd out and match the pre-upgrade backup, but any attempt to VACUUM / VACUUM FULL / CLUSTER fails, unsurprisingly. I've just started looking into this, but here's a bit more data-

The *NEW* (9.3.4) cluster's pg_multixact files:

pg_multixact/members/
-rw------- 1 postgres postgres   8192 Mar 30 02:33 0000
pg_multixact/offsets/
-rw------- 1 postgres postgres   8192 Mar 28 01:11 0000
-rw------- 1 postgres postgres 122880 Mar 30 02:33 0018

The *OLD* (9.2.6) cluster's pg_multixact files:

pg_multixact/members/
-rw-rw-r-- 1 postgres postgres 188416 Mar 30 03:07 0044
pg_multixact/offsets/
-rw-rw-r-- 1 postgres postgres 114688 Mar 30 03:07 0018

txid_current - 40297693
datfrozenxid - 654

autovacuum_freeze_max_age, vacuum_freeze_min_age, vacuum_freeze_table_age, multixact GUCs are commented out / default values.

The *NEW* (9.3.4) control data shows:

pg_control version number:            937
Catalog version number:               201306121
Latest checkpoint's NextXID:          0/40297704
Latest checkpoint's NextOID:          53773598
Latest checkpoint's NextMultiXactId:  1601771
Latest checkpoint's NextMultiOffset:  1105
Latest checkpoint's oldestXID:        654
Latest checkpoint's oldestXID's DB:   12036
Latest checkpoint's oldestActiveXID:  40297704
Latest checkpoint's oldestMultiXid:   1601462
Latest checkpoint's oldestMulti's DB: 0

The *OLD* (9.2.6) control data had:

pg_control version number:            922
Catalog version number:               201204301
Latest checkpoint's NextXID:          0/40195831
Latest checkpoint's NextOID:          53757891
Latest checkpoint's NextMultiXactId:  1601462
Latest checkpoint's NextMultiOffset:  4503112
Latest checkpoint's oldestXID:        654
Latest checkpoint's oldestXID's DB:   12870
Latest checkpoint's oldestActiveXID:  0

(It doesn't report the oldestMulti info under 9.2.6.)
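A quick sanity check on the numbers quoted above (plain arithmetic on the two pg_controldata dumps; the pg_upgrade-seeding relationship is inferred from the values, not from the code):

```python
# Values copied from the pg_controldata output quoted above.
old = {"NextMultiXactId": 1601462, "NextXID": 40195831}
new = {"NextMultiXactId": 1601771, "oldestMultiXid": 1601462, "NextXID": 40297704}

# The new cluster's oldestMultiXid matches the old cluster's NextMultiXactId,
# consistent with pg_upgrade's "Setting oldest multixact ID on new cluster".
assert new["oldestMultiXid"] == old["NextMultiXactId"]

# The tuple's xmax, read as a MultiXactId, lies outside the valid range
# [oldestMultiXid, NextMultiXactId) -- hence "has not been created yet".
xmax = 6849409
assert not (new["oldestMultiXid"] <= xmax < new["NextMultiXactId"])

# Read as a plain xid, though, it is entirely plausible: well below NextXID.
assert xmax < new["NextXID"]
```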
The pg_upgrade reported:

Setting oldest multixact ID on new cluster

While testing, I discovered that this didn't appear to happen with an (earlier) upgrade to 9.3.2. Don't know if it's indicative of anything.

Here is what pageinspect shows for the record:

-[ RECORD 1 ]--------
lp          | 45
lp_off      | 5896
lp_flags    | 1
lp_len      | 50
t_xmin      | 9663920
t_xmax      | 6849409
t_field3    | 26930
t_ctid      | (0,45)
t_infomask2 | 5
t_infomask  | 6528
t_hoff      | 24
t_bits      |
t_oid       |

Which shows xmax as 6849409 and HEAP_XMAX_IS_MULTI is set. This is the only tuple in the table which has HEAP_XMAX_IS_MULTI, and the xmax for all of the other tuples is quite a bit higher (though all are visible currently). Thoughts?

Thanks,

Stephen
All, * Stephen Frost (sfrost@snowman.net) wrote: > Looks like we might not be entirely out of the woods yet regarding > MultiXactId's. After doing an upgrade from 9.2.6 to 9.3.4, we saw the > following: > > ERROR: MultiXactId 6849409 has not been created yet -- apparent wraparound While trying to get the production system back in order, I was able to simply do: select * from table for update; Which happily updated the xmax for all of the rows- evidently without any care that the MultiXactId in one of the tuples was considered invalid (by at least some parts of the code). I have the pre-upgrade database and can upgrade/rollback/etc that pretty easily. Note that the table contents weren't changed during the upgrade, of course, and so the 9.2.6 instance has HEAP_XMAX_IS_MULTI set while t_xmax is 6849409 for the tuple in question- even though pg_controldata reports NextMultiXactId as 1601462 (and it seems very unlikely that there's been a wraparound on that in this database..). Perhaps something screwed up xmax/HEAP_XMAX_IS_MULTI under 9.2 and the 9.3 instance now detects that something is wrong? Or is this a case which was previously allowed and it's just in 9.3 that we don't like it? Hard for me to see why that would be the case, but this really feels like HEAP_XMAX_IS_MULTI was incorrectly set on the old cluster and the xmax in the table was actually a regular xid.. That would have come from 9.2 though. Thanks, Stephen
* Stephen Frost (sfrost@snowman.net) wrote:
> I have the pre-upgrade database and can upgrade/rollback/etc that pretty
> easily. Note that the table contents weren't changed during the
> upgrade, of course, and so the 9.2.6 instance has HEAP_XMAX_IS_MULTI set
> while t_xmax is 6849409 for the tuple in question- even though
> pg_controldata reports NextMultiXactId as 1601462 (and it seems very
> unlikely that there's been a wraparound on that in this database..).

Further review leads me to notice that both HEAP_XMAX_IS_MULTI and HEAP_XMAX_INVALID are set:

t_infomask | 6528

6528 decimal -> 0x1980

0001 1001 1000 0000

Which gives us:

0000 0000 1000 0000 - HEAP_XMAX_LOCK_ONLY
0000 0001 0000 0000 - HEAP_XMIN_COMMITTED
0000 1000 0000 0000 - HEAP_XMAX_INVALID
0001 0000 0000 0000 - HEAP_XMAX_IS_MULTI

Which shows that both HEAP_XMAX_INVALID and HEAP_XMAX_IS_MULTI are set. Of some interest is that HEAP_XMAX_LOCK_ONLY is also set..

> Perhaps something screwed up xmax/HEAP_XMAX_IS_MULTI under 9.2 and the
> 9.3 instance now detects that something is wrong? Or is this a case
> which was previously allowed and it's just in 9.3 that we don't like it?

The 'improve concurrency of FK locking' patch included a change to UpdateXmaxHintBits():

- * [...] Hence callers should look
- * only at XMAX_INVALID.
...
+ * Hence callers should look only at XMAX_INVALID.
+ *
+ * Note this is not allowed for tuples whose xmax is a multixact.
[...]
+ Assert(!(tuple->t_infomask & HEAP_XMAX_IS_MULTI));

What isn't clear to me is if this restriction was supposed to always be there and something pre-9.3 screwed this up, or if this is a *new* restriction on what's allowed, in which case it's an on-disk format change that needs to be accounted for.
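The bit arithmetic above can be checked mechanically. A small sketch using the infomask bit values from src/include/access/htup_details.h as of 9.3 (treat the constants as an assumption if reading against another branch):

```python
# t_infomask bits per PostgreSQL 9.3's htup_details.h (only the four
# relevant to this report).
FLAGS = {
    "HEAP_XMAX_LOCK_ONLY": 0x0080,
    "HEAP_XMIN_COMMITTED": 0x0100,
    "HEAP_XMAX_INVALID":   0x0800,
    "HEAP_XMAX_IS_MULTI":  0x1000,
}

def decode(infomask):
    """Return the names of the set flags, alphabetically."""
    return sorted(name for name, bit in FLAGS.items() if infomask & bit)

assert 6528 == 0x1980
assert decode(6528) == ["HEAP_XMAX_INVALID", "HEAP_XMAX_IS_MULTI",
                        "HEAP_XMAX_LOCK_ONLY", "HEAP_XMIN_COMMITTED"]
```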
One other thing to mention is that this system was originally a 9.0 system and the last update to this tuple that we believe happened was when it was on 9.0, prior to the 9.2 upgrade (which happened about a year ago), so it's possible the issue is from the 9.0 era. Thanks, Stephen
Hi, On 2014-03-30 00:00:30 -0400, Stephen Frost wrote: > Greetings, > > Looks like we might not be entirely out of the woods yet regarding > MultiXactId's. After doing an upgrade from 9.2.6 to 9.3.4, we saw the > following: > > ERROR: MultiXactId 6849409 has not been created yet -- apparent wraparound > > The table contents can be select'd out and match the pre-upgrade > backup, but any attempt to VACUUM / VACUUM FULL / CLUSTER fails, > unsurprisingly. Without having looked at the code, IIRC this looks like some place misses passing allow_old=true where it's actually required. Any chance you can get a backtrace for the error message? I know you said somewhere below that you'd worked around the problem, but maybe you have a copy of the database somewhere? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Stephen Frost wrote: > * Stephen Frost (sfrost@snowman.net) wrote: > > I have the pre-upgrade database and can upgrade/rollback/etc that pretty > > easily. Note that the table contents weren't changed during the > > upgrade, of course, and so the 9.2.6 instance has HEAP_XMAX_IS_MULTI set > > while t_xmax is 6849409 for the tuple in question- even though > > pg_controldata reports NextMultiXactId as 1601462 (and it seems very > > unlikely that there's been a wraparound on that in this database..). > > Further review leads me to notice that both HEAP_XMAX_IS_MULTI and > HEAP_XMAX_INVALID are set: > > t_infomask | 6528 > > 6528 decimal -> 0x1980 > > 0001 1001 1000 0000 > > Which gives us: > > 0000 0000 1000 0000 - HEAP_XMAX_LOCK_ONLY > 0000 0001 0000 0000 - HEAP_XMIN_COMMITTED > 0000 1000 0000 0000 - HEAP_XMAX_INVALID > 0001 0000 0000 0000 - HEAP_XMAX_IS_MULTI > > Which shows that both HEAP_XMAX_INVALID and HEAP_XMAX_IS_MULTI are set. My conclusion here is that some part of the code is failing to examine XMAX_INVALID before looking at the value stored in xmax itself. There ought to be a short-circuit. Fortunately, this bug should be pretty harmless. .. and after looking, I'm fairly sure the bug is in heap_tuple_needs_freeze. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2014-03-31 08:54:53 -0300, Alvaro Herrera wrote: > My conclusion here is that some part of the code is failing to examine > XMAX_INVALID before looking at the value stored in xmax itself. There > ought to be a short-circuit. Fortunately, this bug should be pretty > harmless. > > .. and after looking, I'm fairly sure the bug is in > heap_tuple_needs_freeze. heap_tuple_needs_freeze() isn't *allowed* to look at XMAX_INVALID. Otherwise it could miss freezing something still visible on a standby or after an eventual crash. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund wrote: > On 2014-03-31 08:54:53 -0300, Alvaro Herrera wrote: > > My conclusion here is that some part of the code is failing to examine > > XMAX_INVALID before looking at the value stored in xmax itself. There > > ought to be a short-circuit. Fortunately, this bug should be pretty > > harmless. > > > > .. and after looking, I'm fairly sure the bug is in > > heap_tuple_needs_freeze. > > heap_tuple_needs_freeze() isn't *allowed* to look at > XMAX_INVALID. Otherwise it could miss freezing something still visible > on a standby or after an eventual crash. Ah, you're right. It even says so on the comment at the top (no caffeine yet.) But what it's doing is still buggy, per this report, so we need to do *something* ... -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2014-03-31 09:19:12 -0300, Alvaro Herrera wrote: > Andres Freund wrote: > > On 2014-03-31 08:54:53 -0300, Alvaro Herrera wrote: > > > My conclusion here is that some part of the code is failing to examine > > > XMAX_INVALID before looking at the value stored in xmax itself. There > > > ought to be a short-circuit. Fortunately, this bug should be pretty > > > harmless. > > > > > > .. and after looking, I'm fairly sure the bug is in > > > heap_tuple_needs_freeze. > > > > heap_tuple_needs_freeze() isn't *allowed* to look at > > XMAX_INVALID. Otherwise it could miss freezing something still visible > > on a standby or after an eventual crash. > > Ah, you're right. It even says so on the comment at the top (no > caffeine yet.) But what it's doing is still buggy, per this report, so > we need to do *something* ... Are you sure needs_freeze() is the problem here? IIRC it already does some checks for allow_old? Why is the check for that not sufficient? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund wrote:
> On 2014-03-31 09:19:12 -0300, Alvaro Herrera wrote:
> > Andres Freund wrote:
> > > On 2014-03-31 08:54:53 -0300, Alvaro Herrera wrote:
> > > > My conclusion here is that some part of the code is failing to examine
> > > > XMAX_INVALID before looking at the value stored in xmax itself. There
> > > > ought to be a short-circuit. Fortunately, this bug should be pretty
> > > > harmless.
> > > >
> > > > .. and after looking, I'm fairly sure the bug is in
> > > > heap_tuple_needs_freeze.
> > >
> > > heap_tuple_needs_freeze() isn't *allowed* to look at
> > > XMAX_INVALID. Otherwise it could miss freezing something still visible
> > > on a standby or after an eventual crash.
> >
> > Ah, you're right. It even says so on the comment at the top (no
> > caffeine yet.) But what it's doing is still buggy, per this report, so
> > we need to do *something* ...
>
> Are you sure needs_freeze() is the problem here?
>
> IIRC it already does some checks for allow_old? Why is the check for
> that not sufficient?

GetMultiXactIdMembers has this:

    if (MultiXactIdPrecedes(multi, oldestMXact))
    {
        ereport(allow_old ? DEBUG1 : ERROR,
                (errcode(ERRCODE_INTERNAL_ERROR),
                 errmsg("MultiXactId %u does no longer exist -- apparent wraparound",
                        multi)));
        return -1;
    }

    if (!MultiXactIdPrecedes(multi, nextMXact))
        ereport(ERROR,
                (errcode(ERRCODE_INTERNAL_ERROR),
                 errmsg("MultiXactId %u has not been created yet -- apparent wraparound",
                        multi)));

I guess I wasn't expecting that too-old values would last longer than a full wraparound cycle. Maybe the right fix is just to have the second check also conditional on allow_old.

Anyway, it's not clear to me why this database has a multixact value of 6 million when the next multixact value is barely above one million. Stephen said a wraparound here is not likely.

-- 
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
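MultiXactIdPrecedes() does a wraparound-aware signed comparison of the 32-bit difference, which makes it easy to see which of the two checks fires for the values in this report. A sketch (constants taken from the pg_controldata output earlier in the thread; the Python helper mirrors, but is not, the C implementation):

```python
def multixactid_precedes(a, b):
    # Wraparound-aware comparison, mirroring MultiXactIdPrecedes():
    # interpret the 32-bit difference (a - b) as signed.
    diff = (a - b) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return diff < 0

oldest_mxact = 1601462   # Latest checkpoint's oldestMultiXid
next_mxact   = 1601771   # Latest checkpoint's NextMultiXactId
multi        = 6849409   # the tuple's xmax

# Demonstrate wraparound awareness: a value just past the 32-bit boundary
# counts as "after" one just before it.
assert multixactid_precedes(0xFFFFFFF0, 5)

# First check: multi is not older than oldestMXact, so the allow_old
# branch ("does no longer exist") never applies here.
assert not multixactid_precedes(multi, oldest_mxact)

# Second check: multi is not before nextMXact either, so the unconditional
# "MultiXactId 6849409 has not been created yet" error is raised.
assert not multixactid_precedes(multi, next_mxact)
```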
* Alvaro Herrera (alvherre@2ndquadrant.com) wrote: > I guess I wasn't expecting that too-old values would last longer than a > full wraparound cycle. Maybe the right fix is just to have the second > check also conditional on allow_old. I don't believe this was a wraparound case. > Anyway, it's not clear to me why this database has a multixact value of > 6 million when the next multixact value is barely above one million. > Stephen said a wraparound here is not likely. I don't think the xmax value is a multixact at all- I think it's actually a regular xid, but everyone is expected to ignore it because XMAX_IS_INVALID, yet somehow the MULTIXACT bit was also set and the new code expects to be able to look at the xmax field even though it's marked as invalid.. I'm going through the upgrade process again from 9.2 and will get a stack trace. Thanks, Stephen
On 2014-03-31 09:09:08 -0400, Stephen Frost wrote: > * Alvaro Herrera (alvherre@2ndquadrant.com) wrote: > > I guess I wasn't expecting that too-old values would last longer than a > > full wraparound cycle. Maybe the right fix is just to have the second > > check also conditional on allow_old. > > I don't believe this was a wraparound case. Could you show pg_controldata from the old cluster? > > Anyway, it's not clear to me why this database has a multixact value of > > 6 million when the next multixact value is barely above one million. > > Stephen said a wraparound here is not likely. > > I don't think the xmax value is a multixact at all- I think it's > actually a regular xid, but everyone is expected to ignore it because > XMAX_IS_INVALID, yet somehow the MULTIXACT bit was also set and the new > code expects to be able to look at the xmax field even though it's > marked as invalid.. XMAX_IS_INVALID was never allowed to be ignored for vacuum AFAICS. So I don't think this is a valid explanation. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres,

* Andres Freund (andres@2ndquadrant.com) wrote:
> Without having looked at the code, IIRC this looks like some place
> misses passing allow_old=true where it's actually required. Any chance
> you can get a backtrace for the error message? I know you said somewhere
> below that you'd worked around the problem, but maybe you have a copy of
> the database somewhere?

#0  GetMultiXactIdMembers (allow_old=<optimized out>, members=0x7fff78200ca0, multi=6849409) at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/access/transam/multixact.c:1130
#1  GetMultiXactIdMembers (multi=6849409, members=0x7fff78200ca0, allow_old=<optimized out>) at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/access/transam/multixact.c:1056
#2  0x00007f46e9e707f8 in FreezeMultiXactId (flags=<synthetic pointer>, cutoff_multi=<optimized out>, cutoff_xid=4285178915, t_infomask=<optimized out>, multi=6849409) at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/access/heap/heapam.c:5355
#3  heap_prepare_freeze_tuple (tuple=0x7f46e2d9a328, cutoff_xid=4285178915, cutoff_multi=<optimized out>, frz=0x7f46ec4d31b0) at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/access/heap/heapam.c:5593
#4  0x00007f46e9f72aca in lazy_scan_heap (scan_all=0 '\000', nindexes=2, Irel=<optimized out>, vacrelstats=<optimized out>, onerel=0x7f46e9d65ca0) at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/commands/vacuumlazy.c:899
#5  lazy_vacuum_rel (onerel=0x7f46e9d65ca0, vacstmt=<optimized out>, bstrategy=<optimized out>) at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/commands/vacuumlazy.c:236
#6  0x00007f46e9f70755 in vacuum_rel (relid=288284, vacstmt=0x7f46ec59f780, do_toast=1 '\001', for_wraparound=0 '\000') at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/commands/vacuum.c:1205
#7  0x00007f46e9f71445 in vacuum (vacstmt=0x7f46ec59f780, relid=<optimized out>, do_toast=1 '\001', bstrategy=<optimized out>, for_wraparound=0 '\000', isTopLevel=<optimized out>) at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/commands/vacuum.c:234
#8  0x00007f46ea05e773 in standard_ProcessUtility (parsetree=0x7f46ec59f780, queryString=0x7f46ec59ec90 "vacuum common.wave_supplemental;", context=<optimized out>, params=0x0, dest=<optimized out>, completionTag=0x7fff782016e0 "") at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/tcop/utility.c:639
#9  0x00007f46ea05b5a3 in PortalRunUtility (portal=0x7f46ec4f82e0, utilityStmt=0x7f46ec59f780, isTopLevel=1 '\001', dest=0x7f46ec59fae0, completionTag=0x7fff782016e0 "") at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/tcop/pquery.c:1187
#10 0x00007f46ea05c1d6 in PortalRunMulti (portal=0x7f46ec4f82e0, isTopLevel=1 '\001', dest=0x7f46ec59fae0, altdest=0x7f46ec59fae0, completionTag=0x7fff782016e0 "") at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/tcop/pquery.c:1318
#11 0x00007f46ea05ce6d in PortalRun (portal=0x7f46ec4f82e0, count=9223372036854775807, isTopLevel=1 '\001', dest=0x7f46ec59fae0, altdest=0x7f46ec59fae0, completionTag=0x7fff782016e0 "") at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/tcop/pquery.c:816
#12 0x00007f46ea05908c in exec_simple_query (query_string=0x7f46ec59ec90 "vacuum common.wave_supplemental;") at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/tcop/postgres.c:1048
#13 PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=0x7f46ec4b9010 "mine", username=<optimized out>) at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/tcop/postgres.c:3997
#14 0x00007f46ea01487d in BackendRun (port=0x7f46ec4fc2c0) at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/postmaster/postmaster.c:3996
#15 BackendStartup (port=0x7f46ec4fc2c0) at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/postmaster/postmaster.c:3685
#16 ServerLoop () at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/postmaster/postmaster.c:1586
#17 PostmasterMain (argc=<optimized out>, argv=<optimized out>) at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/postmaster/postmaster.c:1253
#18 0x00007f46e9e4950c in main (argc=5, argv=0x7f46ec4b81f0) at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/main/main.c:206

Looks like your idea that it has to do with freezing is accurate...

Thanks,

Stephen
* Andres Freund (andres@2ndquadrant.com) wrote:
> On 2014-03-31 09:09:08 -0400, Stephen Frost wrote:
> > * Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
> > > I guess I wasn't expecting that too-old values would last longer than a
> > > full wraparound cycle. Maybe the right fix is just to have the second
> > > check also conditional on allow_old.
> >
> > I don't believe this was a wraparound case.
>
> Could you show pg_controldata from the old cluster?

Per the original email-

The *OLD* (9.2.6) control data had:

pg_control version number:            922
Catalog version number:               201204301
Latest checkpoint's NextXID:          0/40195831
Latest checkpoint's NextOID:          53757891
Latest checkpoint's NextMultiXactId:  1601462
Latest checkpoint's NextMultiOffset:  4503112
Latest checkpoint's oldestXID:        654
Latest checkpoint's oldestXID's DB:   12870
Latest checkpoint's oldestActiveXID:  0

(It doesn't report the oldestMulti info under 9.2.6.)

Was there something else you were looking for?

> > I don't think the xmax value is a multixact at all- I think it's
> > actually a regular xid, but everyone is expected to ignore it because
> > XMAX_IS_INVALID, yet somehow the MULTIXACT bit was also set and the new
> > code expects to be able to look at the xmax field even though it's
> > marked as invalid..
>
> XMAX_IS_INVALID was never allowed to be ignored for vacuum AFAICS. So I
> don't think this is a valid explanation.

The old 9.2 cluster certainly had no issue w/ vacuum'ing this table/tuple. Unfortunately, I can't have the 9.2 debug packages installed concurrently w/ the 9.3 debug packages, so it's a bit awkward to go back and forth between the two. Anything else of interest while I have the 9.3 instance running under gdb? Sent the requested backtrace in another email.

Thanks,

Stephen
Stephen Frost wrote:
> Further review leads me to notice that both HEAP_XMAX_IS_MULTI and
> HEAP_XMAX_INVALID are set:
>
> t_infomask | 6528
>
> 6528 decimal -> 0x1980
>
> 0001 1001 1000 0000
>
> Which gives us:
>
> 0000 0000 1000 0000 - HEAP_XMAX_LOCK_ONLY
> 0000 0001 0000 0000 - HEAP_XMIN_COMMITTED
> 0000 1000 0000 0000 - HEAP_XMAX_INVALID
> 0001 0000 0000 0000 - HEAP_XMAX_IS_MULTI
>
> Which shows that both HEAP_XMAX_INVALID and HEAP_XMAX_IS_MULTI are set.
> Of some interest is that HEAP_XMAX_LOCK_ONLY is also set..

This combination seems reasonable. This tuple had two FOR SHARE lockers, so it was marked HEAP_XMAX_SHARED_LOCK|HEAP_XMAX_IS_MULTI (0x1080). Then those lockers finished, and somebody else checked the tuple with a tqual.c routine (say HeapTupleSatisfiesUpdate), which saw the lockers were gone and marked it as HEAP_XMAX_INVALID (0x800), without removing the Xmax value and without removing the other bits. This is all per spec, so we must cope.

-- 
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
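The sequence described here can be replayed directly on the infomask. A minimal sketch, using the bit values from Stephen's earlier breakdown (the point at which XMIN_COMMITTED was set is an assumption; it could have been hinted at any time):

```python
# Pre-9.3 name for the 0x0080 bit; 9.3 repurposes it as HEAP_XMAX_LOCK_ONLY.
HEAP_XMAX_SHARED_LOCK = 0x0080
HEAP_XMIN_COMMITTED   = 0x0100
HEAP_XMAX_INVALID     = 0x0800
HEAP_XMAX_IS_MULTI    = 0x1000

# Two FOR SHARE lockers: xmax becomes a multixact, shared-lock bit set.
infomask = HEAP_XMAX_SHARED_LOCK | HEAP_XMAX_IS_MULTI
assert infomask == 0x1080

# Some reader hints the inserting transaction as committed along the way.
infomask |= HEAP_XMIN_COMMITTED

# Lockers gone; a tqual.c routine hints the xmax as invalid,
# without clearing xmax itself or the other bits.
infomask |= HEAP_XMAX_INVALID

assert infomask == 0x1980 == 6528   # exactly the t_infomask in the report
```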
Andres Freund wrote: > On 2014-03-31 08:54:53 -0300, Alvaro Herrera wrote: > > My conclusion here is that some part of the code is failing to examine > > XMAX_INVALID before looking at the value stored in xmax itself. There > > ought to be a short-circuit. Fortunately, this bug should be pretty > > harmless. > > > > .. and after looking, I'm fairly sure the bug is in > > heap_tuple_needs_freeze. > > heap_tuple_needs_freeze() isn't *allowed* to look at > XMAX_INVALID. Otherwise it could miss freezing something still visible > on a standby or after an eventual crash. I think this rule is wrong. I think the rule ought to be something like "if the XMAX_INVALID bit is set, then reset whatever is there if there is something; if the bit is not set, proceed as today". Otherwise we risk reading garbage, which is what is happening in this case. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera wrote: > Andres Freund wrote: > > On 2014-03-31 08:54:53 -0300, Alvaro Herrera wrote: > > > My conclusion here is that some part of the code is failing to examine > > > XMAX_INVALID before looking at the value stored in xmax itself. There > > > ought to be a short-circuit. Fortunately, this bug should be pretty > > > harmless. > > > > > > .. and after looking, I'm fairly sure the bug is in > > > heap_tuple_needs_freeze. > > > > heap_tuple_needs_freeze() isn't *allowed* to look at > > XMAX_INVALID. Otherwise it could miss freezing something still visible > > on a standby or after an eventual crash. > > I think this rule is wrong. I think the rule ought to be something like > "if the XMAX_INVALID bit is set, then reset whatever is there if there > is something; if the bit is not set, proceed as today". Otherwise we > risk reading garbage, which is what is happening in this case. Andres asks on IM: How come there is garbage there in the first place? I have to admit I have no idea. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
* Alvaro Herrera (alvherre@2ndquadrant.com) wrote: > > I think this rule is wrong. I think the rule ought to be something like > > "if the XMAX_INVALID bit is set, then reset whatever is there if there > > is something; if the bit is not set, proceed as today". Otherwise we > > risk reading garbage, which is what is happening in this case. > > Andres asks on IM: How come there is garbage there in the first place? > I have to admit I have no idea. I haven't got any great explanation for that either. I continue to feel that it's much more likely that it's an xid than a multixid. Is it possible that it was stamped with a real xmax through some code path which ignored the IS_MULTI flag? This could have been from as far back as 9.0-era. On this over-7TB database, only this one tuple had the issue. I have another set of databases which add up to ~20TB that I'm currently testing an upgrade from 9.2 to 9.3 on and will certainly let everyone know if I run into a similar situation there. On our smallest DB (which we upgraded first) which is much more OLTP instead of OLAP, we didn't run into this. This is all on physical gear and we've seen no indications that there has been any corruption. Hard to rule it out completely, but it seems pretty unlikely. Thanks, Stephen
On Mon, Mar 31, 2014 at 09:36:03AM -0400, Stephen Frost wrote:
> Andres,
>
> * Andres Freund (andres@2ndquadrant.com) wrote:
> > Without having looked at the code, IIRC this looks like some place
> > misses passing allow_old=true where it's actually required. Any chance
> > you can get a backtrace for the error message? I know you said somewhere
> > below that you'd worked around the problem, but maybe you have a copy of
> > the database somewhere?
>
> Looks like your idea that it has to do with freezing is accurate...

Where are we on this?

-- 
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +
Andres Freund wrote: > On 2014-03-31 08:54:53 -0300, Alvaro Herrera wrote: > > My conclusion here is that some part of the code is failing to examine > > XMAX_INVALID before looking at the value stored in xmax itself. There > > ought to be a short-circuit. Fortunately, this bug should be pretty > > harmless. > > > > .. and after looking, I'm fairly sure the bug is in > > heap_tuple_needs_freeze. > > heap_tuple_needs_freeze() isn't *allowed* to look at > XMAX_INVALID. Otherwise it could miss freezing something still visible > on a standby or after an eventual crash. I think what we should do here is that if we see that XMAX_INVALID is set, we just reset everything to zero without checking the multixact contents. Something like the attached (warning: hand-edited, line numbers might be bogus) I still don't know under what circumstances this situation could arise. This seems most strange to me. I would wonder about this to be just papering over a different bug elsewhere, except that we know this tuple comes from a pg_upgraded table and so I think the only real solution is to cope. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachments
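The attached patch itself isn't reproduced in this archive, but the rule being proposed ("if XMAX_INVALID is set, reset everything to zero without checking the multixact contents") can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual patch; the function name and the fall-through are invented:

```python
HEAP_XMAX_LOCK_ONLY = 0x0080
HEAP_XMAX_INVALID   = 0x0800
HEAP_XMAX_IS_MULTI  = 0x1000
InvalidTransactionId = 0

def freeze_xmax_sketch(xmax, infomask):
    """If the xmax is already hinted invalid, reset it outright instead of
    decoding it as a multixact (which is what blew up in this report)."""
    if infomask & HEAP_XMAX_INVALID:
        xmax = InvalidTransactionId
        infomask &= ~(HEAP_XMAX_IS_MULTI | HEAP_XMAX_LOCK_ONLY)
        infomask |= HEAP_XMAX_INVALID
        return xmax, infomask
    # ...otherwise fall through to the real multixact/xid freeze logic,
    # which may consult GetMultiXactIdMembers()...
    raise NotImplementedError("real FreezeMultiXactId logic goes here")

# Applied to the problem tuple, the bogus "multixact" is never decoded:
xmax, infomask = freeze_xmax_sketch(6849409, 6528)
assert xmax == 0
assert not (infomask & HEAP_XMAX_IS_MULTI)
```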
On Wed, Apr 23, 2014 at 03:01:02PM -0300, Alvaro Herrera wrote: > Andres Freund wrote: > > On 2014-03-31 08:54:53 -0300, Alvaro Herrera wrote: > > > My conclusion here is that some part of the code is failing to examine > > > XMAX_INVALID before looking at the value stored in xmax itself. There > > > ought to be a short-circuit. Fortunately, this bug should be pretty > > > harmless. > > > > > > .. and after looking, I'm fairly sure the bug is in > > > heap_tuple_needs_freeze. > > > > heap_tuple_needs_freeze() isn't *allowed* to look at > > XMAX_INVALID. Otherwise it could miss freezing something still visible > > on a standby or after an eventual crash. > > I think what we should do here is that if we see that XMAX_INVALID is > set, we just reset everything to zero without checking the multixact > contents. Something like the attached (warning: hand-edited, line > numbers might be bogus) > > I still don't know under what circumstances this situation could arise. > This seems most strange to me. I would wonder about this to be just > papering over a different bug elsewhere, except that we know this tuple > comes from a pg_upgraded table and so I think the only real solution is > to cope. Shouldn't we log something at least if we are unsure of the cause? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Bruce Momjian wrote: > On Wed, Apr 23, 2014 at 03:01:02PM -0300, Alvaro Herrera wrote: > > Andres Freund wrote: > > > On 2014-03-31 08:54:53 -0300, Alvaro Herrera wrote: > > > > My conclusion here is that some part of the code is failing to examine > > > > XMAX_INVALID before looking at the value stored in xmax itself. There > > > > ought to be a short-circuit. Fortunately, this bug should be pretty > > > > harmless. > > > > > > > > .. and after looking, I'm fairly sure the bug is in > > > > heap_tuple_needs_freeze. > > > > > > heap_tuple_needs_freeze() isn't *allowed* to look at > > > XMAX_INVALID. Otherwise it could miss freezing something still visible > > > on a standby or after an eventual crash. > > > > I think what we should do here is that if we see that XMAX_INVALID is > > set, we just reset everything to zero without checking the multixact > > contents. Something like the attached (warning: hand-edited, line > > numbers might be bogus) > > > > I still don't know under what circumstances this situation could arise. > > This seems most strange to me. I would wonder about this to be just > > papering over a different bug elsewhere, except that we know this tuple > > comes from a pg_upgraded table and so I think the only real solution is > > to cope. > > Shouldn't we log something at least if we are unsure of the cause? I don't know. Is it possible that XMAX_IS_MULTI got set because of cosmic rays? At this point that's the only explanation that makes sense to me. And I'm not sure what to do about this until we know more -- more user reports of this problem, for instance. I don't see any reasonable way to distinguish this particular kind of multixact-out-of-bounds situation from any other, so not sure what else to log either (you can see that we already emit an error message.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Apr 23, 2014 at 03:42:14PM -0300, Alvaro Herrera wrote:
> > > I still don't know under what circumstances this situation could arise.
> > > This seems most strange to me. I would wonder about this to be just
> > > papering over a different bug elsewhere, except that we know this tuple
> > > comes from a pg_upgraded table and so I think the only real solution is
> > > to cope.
> >
> > Shouldn't we log something at least if we are unsure of the cause?
>
> I don't know. Is it possible that XMAX_IS_MULTI got set because of
> cosmic rays? At this point that's the only explanation that makes sense
> to me. And I'm not sure what to do about this until we know more --
> more user reports of this problem, for instance.
>
> I don't see any reasonable way to distinguish this particular kind of
> multixact-out-of-bounds situation from any other, so not sure what else
> to log either (you can see that we already emit an error message.)

I guess I am lost then. I thought it suppressed the error. What does the patch do?

-- 
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +
Bruce Momjian wrote: > On Wed, Apr 23, 2014 at 03:42:14PM -0300, Alvaro Herrera wrote: > > > > I still don't know under what circumstances this situation could arise. > > > > This seems most strange to me. I would wonder about this to be just > > > > papering over a different bug elsewhere, except that we know this tuple > > > > comes from a pg_upgraded table and so I think the only real solution is > > > > to cope. > > > > > > Shouldn't we log something at least if we are unsure of the cause? > > > > I don't know. Is it possible that XMAX_IS_MULTI got set because of > > cosmic rays? At this point that's the only explanation that makes sense > > to me. And I'm not sure what to do about this until we know more -- > > more user reports of this problem, for instance. > > > > I don't see any reasonable way to distinguish this particular kind of > > multixact-out-of-bounds situation from any other, so not sure what else > > to log either (you can see that we already emit an error message.) > > I guess I am lost then. I thought it supressed the error. What does > the patch do? You're right, it does. I am not sure I would apply it. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On April 23, 2014 8:51:21 PM CEST, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>Bruce Momjian wrote:
>> On Wed, Apr 23, 2014 at 03:42:14PM -0300, Alvaro Herrera wrote:
>> > > > I still don't know under what circumstances this situation could arise.
>> > > > This seems most strange to me.  I would wonder about this to be just
>> > > > papering over a different bug elsewhere, except that we know this tuple
>> > > > comes from a pg_upgraded table and so I think the only real solution is
>> > > > to cope.
>> > >
>> > > Shouldn't we log something at least if we are unsure of the cause?
>> >
>> > I don't know.  Is it possible that XMAX_IS_MULTI got set because of
>> > cosmic rays?  At this point that's the only explanation that makes sense
>> > to me.  And I'm not sure what to do about this until we know more --
>> > more user reports of this problem, for instance.
>> >
>> > I don't see any reasonable way to distinguish this particular kind of
>> > multixact-out-of-bounds situation from any other, so not sure what else
>> > to log either (you can see that we already emit an error message.)
>>
>> I guess I am lost then.  I thought it suppressed the error.  What does
>> the patch do?
>
>You're right, it does.  I am not sure I would apply it.

I think this patch is a seriously bad idea.  For one, it's not actually
doing anything about the problem - the tuple can be accessed without
freezing getting involved.  For another, it will increase WAL traffic
for freezing of short-lived tuples considerably, without any benefit.

Andres

-- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund wrote:

> I think this patch is a seriously bad idea.  For one, it's not actually
> doing anything about the problem - the tuple can be accessed without
> freezing getting involved.

Normal access other than freeze is not a problem, because other code
paths do check for HEAP_XMAX_INVALID and avoid access to Xmax if that's
set.

> For another, it will increase WAL traffic for freezing of short-lived
> tuples considerably, without any benefit.

True.  I didn't actually try to run this; it was just for demonstration
purposes.  It'd need some cooperation from heap_tuple_needs_freeze in
order to work at all, for one thing.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2014-04-23 16:30:05 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
>
> > I think this patch is a seriously bad idea.  For one, it's not actually
> > doing anything about the problem - the tuple can be accessed without
> > freezing getting involved.
>
> Normal access other than freeze is not a problem, because other code
> paths do check for HEAP_XMAX_INVALID and avoid access to Xmax if that's
> set.
>
> > For another, it will increase WAL traffic for freezing of short-lived
> > tuples considerably, without any benefit.
>
> True.  I didn't actually try to run this; it was just for demonstration
> purposes.  It'd need some cooperation from heap_tuple_needs_freeze in
> order to work at all, for one thing.

I think if you want to add something like this it should be added
*inside* the if (MultiXactIdPrecedes(multi, cutoff_multi)) block in
FreezeMultiXactId().  There it seems to make quite a bit of sense.

Greetings,

Andres Freund

-- 
Andres Freund                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Stephen Frost wrote:
> Greetings,
>
> Looks like we might not be entirely out of the woods yet regarding
> MultiXactId's.  After doing an upgrade from 9.2.6 to 9.3.4, we saw the
> following:
>
> ERROR: MultiXactId 6849409 has not been created yet -- apparent wraparound
>
> The table contents can be select'd out and match the pre-upgrade
> backup, but any attempt to VACUUM / VACUUM FULL / CLUSTER fails,
> unsurprisingly.

I finally figured out what is going on here, though I don't yet have a
patch.  This has been reported a number of times:

https://www.postgresql.org/message-id/20140330040029.GY4582%40tamriel.snowman.net
https://www.postgresql.org/message-id/538F3D70.6080902%40publicrelay.com
https://www.postgresql.org/message-id/556439CF.7070109%40pscs.co.uk
https://www.postgresql.org/message-id/20160614173150.GA443784@alvherre.pgsql
https://www.postgresql.org/message-id/20160615203829.5798.4594@wrigleys.postgresql.org

We theorised that we were missing some place that was failing to pass
the "allow_old" flag to GetMultiXactIdMembers; and since we couldn't
find any and the problem was worked around simply (by doing SELECT FOR
UPDATE or equivalent on the affected tuples), there was no further
research.  (The allow_old flag is passed for tuples that match an
infomask bit pattern that can only come from tuples locked in 9.2 and
prior, i.e. one that is never set by 9.3ff.)

Yesterday I had to deal with it and quickly found what is going wrong:
the problem is that in 9.2 and earlier it was acceptable (and common) to
leave tuples with very old multixacts in xmax, even after multixact
counter wraparound.  When one such value was found in a live tuple,
GetMultiXactIdMembers() would notice that it was out of range and simply
return "no members", at which point heap_update and siblings would
consider the tuple as not locked and move on.
When pg_upgrading a database containing tuples marked like that, the new
code would error out, because during the 9.3 multixact work we
considered that it was dangerous to silently allow tuples to be marked
by values we didn't keep track of, so we made it an error instead, per
https://www.postgresql.org/message-id/20111204122027.GA10035%40tornado.leadboat.com
Some cases are allowed to be downgraded to DEBUG, when allow_old is
true.

I think that was a good choice in general so that possibly-data-eating
bugs could be reported, but there's a problem in the specific case of
tuples carried over by pg_upgrade whose multixact is "further in the
future" compared to the nextMultiXactId counter.  I think it's pretty
clear that we should let that error be downgraded to DEBUG too, like the
other checks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>>>> "Alvaro" == Alvaro Herrera <alvherre@2ndquadrant.com> writes:

 Alvaro> I think that was a good choice in general so that
 Alvaro> possibly-data-eating bugs could be reported, but there's a
 Alvaro> problem in the specific case of tuples carried over by
 Alvaro> pg_upgrade whose Multixact is "further in the future" compared
 Alvaro> to the nextMultiXactId counter.  I think it's pretty clear that
 Alvaro> we should let that error be downgraded to DEBUG too, like the
 Alvaro> other checks.

But that leaves an obvious third issue: it's all very well to ignore the
pre-upgrade (pre-9.3) mxid if it's older than the cutoff or it's in the
future, but what if it's actually inside the currently valid range?
Looking it up as though it were a valid mxid in that case seems to be
completely wrong and could introduce more subtle errors.

(It can, AFAICT, be inside the currently valid range due to wraparound,
i.e. without there being a valid pg_multixact entry for it, because
AFAICT in 9.2, once the mxid is hinted dead it is never again either
looked up or cleared, so it can sit in the tuple xmax forever, even
through multiple wraparounds.)

Why is the correct rule not "check for and ignore pre-upgrade mxids
before even trying to fetch members"?

-- 
Andrew (irc:RhodiumToad)
Andrew Gierth wrote:

> But that leaves an obvious third issue: it's all very well to ignore the
> pre-upgrade (pre-9.3) mxid if it's older than the cutoff or it's in the
> future, but what if it's actually inside the currently valid range?
> Looking it up as though it were a valid mxid in that case seems to be
> completely wrong and could introduce more subtle errors.

You're right, we should not try to resolve a multixact coming from the
old install in any case.

> (It can, AFAICT, be inside the currently valid range due to wraparound,
> i.e. without there being a valid pg_multixact entry for it, because
> AFAICT in 9.2, once the mxid is hinted dead it is never again either
> looked up or cleared, so it can sit in the tuple xmax forever, even
> through multiple wraparounds.)

HeapTupleSatisfiesVacuum removes very old multixacts; see the
HEAP_IS_LOCKED block starting in line 1162 where we set
HEAP_XMAX_INVALID for multixacts that are not running by falling
through.  It's not a strict guarantee but this probably explains why
this problem is not more common.

> Why is the correct rule not "check for and ignore pre-upgrade mxids
> before even trying to fetch members"?

I propose something like the attached patch, which implements that idea.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments
Alvaro Herrera wrote:
> Andrew Gierth wrote:
> > Why is the correct rule not "check for and ignore pre-upgrade mxids
> > before even trying to fetch members"?
>
> I propose something like the attached patch, which implements that idea.

Hm, this doesn't apply cleanly to 9.4.  I'll need to come up with a
merge tomorrow.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>>>> "Alvaro" == Alvaro Herrera <alvherre@2ndquadrant.com> writes:

 >> (It can, AFAICT, be inside the currently valid range due to
 >> wraparound, i.e. without there being a valid pg_multixact entry for
 >> it, because AFAICT in 9.2, once the mxid is hinted dead it is never
 >> again either looked up or cleared, so it can sit in the tuple xmax
 >> forever, even through multiple wraparounds.)

 Alvaro> HeapTupleSatisfiesVacuum removes very old multixacts

It does nothing of the kind; it only marks them HEAP_XMAX_INVALID.  The
actual mxid remains in the tuple xmax field.

The failing mxids in the case I analyzed on -bugs are failing _in spite
of_ being already hinted HEAP_XMAX_INVALID, because the code path in
question doesn't check that.

-- 
Andrew (irc:RhodiumToad)
On Thu, Jun 16, 2016 at 4:50 AM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
> Why is the correct rule not "check for and ignore pre-upgrade mxids
> before even trying to fetch members"?

I entirely believe that's the correct rule, but doesn't implementing it
require a crystal ball?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
>>>>> "Robert" == Robert Haas <robertmhaas@gmail.com> writes:

 >> Why is the correct rule not "check for and ignore pre-upgrade mxids
 >> before even trying to fetch members"?

 Robert> I entirely believe that's the correct rule, but doesn't
 Robert> implementing it require a crystal ball?

Why would it?  Pre-9.3 mxids are identified by the combination of flag
bits in the infomask, see Alvaro's patch.

-- 
Andrew (irc:RhodiumToad)
Andrew Gierth wrote:
> >>>>> "Alvaro" == Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>
> >> (It can, AFAICT, be inside the currently valid range due to
> >> wraparound, i.e. without there being a valid pg_multixact entry for
> >> it, because AFAICT in 9.2, once the mxid is hinted dead it is never
> >> again either looked up or cleared, so it can sit in the tuple xmax
> >> forever, even through multiple wraparounds.)
>
> Alvaro> HeapTupleSatisfiesVacuum removes very old multixacts
>
> It does nothing of the kind; it only marks them HEAP_XMAX_INVALID.  The
> actual mxid remains in the tuple xmax field.
>
> The failing mxids in the case I analyzed on -bugs are failing _in spite
> of_ being already hinted HEAP_XMAX_INVALID, because the code path in
> question doesn't check that.

Ah, right.  I had some code to reset HEAP_XMAX_IS_MULTI early on but
somebody didn't like it and we removed it; and we also removed some of
the checks for HEAP_XMAX_INVALID we had, or perhaps didn't extend them
to every place that needed them.  It's not critical now anyway; the
patch I posted (or some variation thereof) should suffice as a fix.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera wrote:
> Andrew Gierth wrote:
> > Why is the correct rule not "check for and ignore pre-upgrade mxids
> > before even trying to fetch members"?
>
> I propose something like the attached patch, which implements that idea.

Here's a backpatch of that to 9.3 and 9.4.  I tested this by creating a
9.2 installation with an out-of-range multixact, and verified that after
upgrading this to 9.3 it fails with the "apparent wraparound" message in
VACUUM.  With this patch applied, it silently removes the multixact.

I will clean this up and push to all branches after the tagging of
9.6beta2 on Monday.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments
On Fri, Jun 17, 2016 at 9:33 AM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
>>>>>> "Robert" == Robert Haas <robertmhaas@gmail.com> writes:
>
> >> Why is the correct rule not "check for and ignore pre-upgrade mxids
> >> before even trying to fetch members"?
>
> Robert> I entirely believe that's the correct rule, but doesn't
> Robert> implementing it require a crystal ball?
>
> Why would it?  Pre-9.3 mxids are identified by the combination of flag
> bits in the infomask, see Alvaro's patch.

I see the patch, but I don't see much explanation of why the patch is
correct, which I think is pretty scary in view of the number of
mistakes we've already made in this area.  The comments just say:

+ * A tuple that has HEAP_XMAX_IS_MULTI and HEAP_XMAX_LOCK_ONLY but neither of
+ * XMAX_EXCL_LOCK and XMAX_KEYSHR_LOCK must come from a tuple that was
+ * share-locked in 9.2 or earlier and then pg_upgrade'd.

Why must that be true?

+ * We must not try to resolve such multixacts locally, because the result would
+ * be bogus, regardless of where they stand with respect to the current valid
+ * range.

What about other pre-upgrade mxacts that don't have this exact bit
pattern?  Why can't we try to resolve them and end up in trouble just
as easily?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas wrote:
> On Fri, Jun 17, 2016 at 9:33 AM, Andrew Gierth
> <andrew@tao11.riddles.org.uk> wrote:
> >>>>>> "Robert" == Robert Haas <robertmhaas@gmail.com> writes:
> > >> Why is the correct rule not "check for and ignore pre-upgrade mxids
> > >> before even trying to fetch members"?
> >
> > Robert> I entirely believe that's the correct rule, but doesn't
> > Robert> implementing it require a crystal ball?
> >
> > Why would it?  Pre-9.3 mxids are identified by the combination of flag
> > bits in the infomask, see Alvaro's patch.
>
> I see the patch, but I don't see much explanation of why the patch is
> correct, which I think is pretty scary in view of the number of
> mistakes we've already made in this area.

... and actually the patch fails one isolation test in 9.3, which is why
I haven't pushed (I haven't tested 9.4 but I suppose it should be the
same).  I'm looking into that now.

> The comments just say:
>
> + * A tuple that has HEAP_XMAX_IS_MULTI and HEAP_XMAX_LOCK_ONLY but neither of
> + * XMAX_EXCL_LOCK and XMAX_KEYSHR_LOCK must come from a tuple that was
> + * share-locked in 9.2 or earlier and then pg_upgrade'd.
>
> Why must that be true?

The reason this must be true is that in 9.2 and earlier, multixacts were
only used to lock tuples FOR SHARE, which had that specific bit pattern.
I suppose I need to make this comment more explicit.

> + * We must not try to resolve such multixacts locally, because the result would
> + * be bogus, regardless of where they stand with respect to the current valid
> + * range.
>
> What about other pre-upgrade mxacts that don't have this exact bit
> pattern?  Why can't we try to resolve them and end up in trouble just
> as easily?

There shouldn't be any.  Back then, it was not possible to have tuples
locked and updated at the same time; FOR UPDATE (the only other locking
mode back then) didn't allow other lockers, so the only possibility was
FOR SHARE with that bit pattern.
-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera wrote:
> Robert Haas wrote:
> > On Fri, Jun 17, 2016 at 9:33 AM, Andrew Gierth
> > <andrew@tao11.riddles.org.uk> wrote:
> > >>>>>> "Robert" == Robert Haas <robertmhaas@gmail.com> writes:
> > > >> Why is the correct rule not "check for and ignore pre-upgrade mxids
> > > >> before even trying to fetch members"?
> > >
> > > Robert> I entirely believe that's the correct rule, but doesn't
> > > Robert> implementing it require a crystal ball?
> > >
> > > Why would it?  Pre-9.3 mxids are identified by the combination of flag
> > > bits in the infomask, see Alvaro's patch.
> >
> > I see the patch, but I don't see much explanation of why the patch is
> > correct, which I think is pretty scary in view of the number of
> > mistakes we've already made in this area.
>
> ... and actually the patch fails one isolation test in 9.3, which is
> why I haven't pushed (I haven't tested 9.4 but I suppose it should be
> the same).  I'm looking into that now.

Ah, it should have been obvious; the reason it's failing is that 9.3 and
9.4 lack commit 27846f02c176, which removed
MultiXactHasRunningRemoteMembers(), so the straight backpatch plus
conflict fixes left one GetMultiXactIdMembers() call with the allow_old
flag set to true.  The attached patch fixes that omission.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments
Robert Haas wrote:
> I see the patch, but I don't see much explanation of why the patch is
> correct, which I think is pretty scary in view of the number of
> mistakes we've already made in this area.  The comments just say:
>
> + * A tuple that has HEAP_XMAX_IS_MULTI and HEAP_XMAX_LOCK_ONLY but neither of
> + * XMAX_EXCL_LOCK and XMAX_KEYSHR_LOCK must come from a tuple that was
> + * share-locked in 9.2 or earlier and then pg_upgrade'd.
>
> Why must that be true?
>
> + * We must not try to resolve such multixacts locally, because the result would
> + * be bogus, regardless of where they stand with respect to the current valid
> + * range.
>
> What about other pre-upgrade mxacts that don't have this exact bit
> pattern?  Why can't we try to resolve them and end up in trouble just
> as easily?

Attached are two patches, one for 9.3 and 9.4 and the other for 9.5 and
master.  Pretty much the same as before, but with answers to the above
concerns.  In particular,

 /*
+ * A tuple that has HEAP_XMAX_IS_MULTI and HEAP_XMAX_LOCK_ONLY but neither of
+ * XMAX_EXCL_LOCK and XMAX_KEYSHR_LOCK must come from a tuple that was
+ * share-locked in 9.2 or earlier and then pg_upgrade'd.
+ *
+ * In 9.2 and prior, HEAP_XMAX_IS_MULTI was only set when there were multiple
+ * FOR SHARE lockers of that tuple.  That set HEAP_XMAX_LOCK_ONLY (with a
+ * different name back then) but neither of HEAP_XMAX_EXCL_LOCK and
+ * HEAP_XMAX_KEYSHR_LOCK.  That combination is no longer possible in 9.3 and
+ * up, so if we see that combination we know for certain that the tuple was
+ * locked in an earlier release; since all such lockers are gone (they cannot
+ * survive through pg_upgrade), such tuples can safely be considered not
+ * locked.
+ *
+ * We must not resolve such multixacts locally, because the result would be
+ * bogus, regardless of where they stand with respect to the current valid
+ * multixact range.
+ */
+#define HEAP_LOCKED_UPGRADED(infomask) \
+( \
+	((infomask) & HEAP_XMAX_IS_MULTI) && \
+	((infomask) & HEAP_XMAX_LOCK_ONLY) && \
+	(((infomask) & (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)) == 0) \
+)

One place that had an XXX comment in the previous patch
(heap_lock_updated_tuple_rec) now contains this:

+			/*
+			 * We don't need a test for pg_upgrade'd tuples: this is only
+			 * applied to tuples after the first in an update chain.  Said
+			 * first tuple in the chain may well be locked-in-9.2-and-
+			 * pg_upgraded, but that one was already locked by our caller,
+			 * not us; and any subsequent ones cannot be because our
+			 * caller must necessarily have obtained a snapshot later than
+			 * the pg_upgrade itself.
+			 */
+			Assert(!HEAP_LOCKED_UPGRADED(mytup.t_data->t_infomask));

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments
After some further testing, I noticed a case that wasn't handled in heap_update, which I also fixed. I reworded some comments here and there, and pushed to all branches. Further testing and analysis is welcome. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services