Discussion: heap_update is broken in current sources
heap_update() currently ends with

    if (newbuf != buffer)
    {
        LockBuffer(newbuf, BUFFER_LOCK_UNLOCK);
        WriteBuffer(newbuf);
    }
    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
    WriteBuffer(buffer);

    /* invalidate caches */
    RelationInvalidateHeapTuple(relation, &oldtup);
    RelationMark4RollbackHeapTuple(relation, newtup);

    return HeapTupleMayBeUpdated;

This is broken because WriteBuffer releases our refcounts on the buffer pages that are holding the old and new tuples. By the time RelationInvalidateHeapTuple gets to do its thing, some other backend may have swapped a new disk page into the shared buffer that oldtup points at. catcache.c will then be using the wrong data to compute the hash index of the old tuple. This will at minimum result in failure to invalidate the old tuple out of our catcache (because we'll be searching the wrong hashchains), and can lead to a flat-out crash or Assert failure due to invalid data being fed to the hashing code.

I have seen several nonrepeatable failures in the parallel regress tests in recent weeks, which I now believe are all traceable to this error.

I will commit a fix for this error shortly, and have recommended to Marc that he re-roll the beta2 tarball before announcing it...

			regards, tom lane
How are we on this?

On Sun, 7 Jan 2001, Tom Lane wrote:

> heap_update() currently ends with
>
>     if (newbuf != buffer)
>     {
>         LockBuffer(newbuf, BUFFER_LOCK_UNLOCK);
>         WriteBuffer(newbuf);
>     }
>     LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
>     WriteBuffer(buffer);
>
>     /* invalidate caches */
>     RelationInvalidateHeapTuple(relation, &oldtup);
>     RelationMark4RollbackHeapTuple(relation, newtup);
>
>     return HeapTupleMayBeUpdated;
>
> This is broken because WriteBuffer releases our refcounts on the buffer
> pages that are holding the old and new tuples. By the time
> RelationInvalidateHeapTuple gets to do its thing, some other backend may
> have swapped a new disk page into the shared buffer that oldtup points
> at. catcache.c will then be using the wrong data to compute the hash
> index of the old tuple. This will at minimum result in failure to
> invalidate the old tuple out of our catcache (because we'll be searching
> the wrong hashchains), and can lead to a flat-out crash or Assert
> failure due to invalid data being fed to the hashing code.
>
> I have seen several nonrepeatable failures in the parallel regress tests
> in recent weeks, which I now believe are all traceable to this error.
>
> I will commit a fix for this error shortly, and have recommended to Marc
> that he re-roll the beta2 tarball before announcing it...
>
> 			regards, tom lane

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
The Hermit Hacker <scrappy@hub.org> writes:
> How are we on this?

It's fixed.

I've also run the regress tests with bufmgr.c hacked up to discard pages (with forcible overwriting) as soon as their refcount goes to zero. That didn't disclose any similar bugs, although the coverage of the tests leaves much to be desired :-(

			regards, tom lane
okay, will bundle up beta2 and announce it tonight when I get home ... gives about 6 hrs or so to "halt the presses" *grin*

On Mon, 8 Jan 2001, Tom Lane wrote:

> The Hermit Hacker <scrappy@hub.org> writes:
> > How are we on this?
>
> It's fixed.
>
> I've also run the regress tests with bufmgr.c hacked up to discard
> pages (with forcible overwriting) as soon as their refcount goes to
> zero. That didn't disclose any similar bugs, although the coverage
> of the tests leaves much to be desired :-(
>
> 			regards, tom lane

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org