Discussion: "SMgrRelation hashtable corrupted" failure identified
We've seen a few reports of the above-mentioned error message from
PG 8.0 testers, but up till now no one had come up with a reproducible
test case. I've now found a trivial example:

	session 1:	create table a1 (f1 varchar(128));
	session 2:	insert into a1 values('abc');
	session 1:	alter table a1 alter column f1 type varchar(256);
	session 2:	insert into a1 values('abcd');
	session 2 fails with ERROR: SMgrRelation hashtable corrupted
	continued use of session 2 leads to a crash

Many if not all scenarios involving a rewriting ALTER TABLE on a
table in active use by other backends will fail like this.
I believe there are probably similar failures involving CLUSTER,
though a quick try didn't show it. This seems clearly to be a
"must fix for 8.0" bug.

The basic problem is that when ALTER TABLE tries to swap the physical
files associated with the original table and the temp version of the
table, it sends out relcache inval events for all four combinations
of table OID and relfilenode. Because inval.c is a bit cavalier about
the ordering of inval events, the one that session 2 sees first is the
one for <temp table OID, old relfilenode>. It does not find a relcache
entry for the temp table OID, but it does find an smgr table entry for
the relfilenode, which it proceeds to drop. Now there is a dangling
smgr reference in its relcache, so when it next gets hit with a
relcache clear event for the original table OID, boom!

I fooled around with trying to patch this by enforcing the "right"
processing order of inval events, but that doesn't work (it just moves
the failure into the sending backend, which it turns out would need
a different processing order to avoid crashing). It would be a horribly
fragile solution anyway.

I now think that the only reasonable fix is to directly attack the
problem of dangling relcache references to smgr table entries. What we
can do is add a concept of an "owning pointer" to an smgr entry, that
is an "SMgrRelation *myowner" field, and have smgrclose do
something like

	if (reln->myowner)
		*(reln->myowner) = NULL;

For smgr table entries associated with a relcache entry, the relcache
code would set this field as a back link to its rel->rd_smgr pointer.
With this setup, an smgr-level clear would correctly unhook from the
relcache even if the clear did not come directly through the relcache.
This would simplify RelationCacheInvalidateEntry and
LocalExecuteInvalidationMessage, which could then treat relcache clear
and smgr clear as independent operations.

Comments?

			regards, tom lane
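The owning-pointer idea proposed above can be sketched in isolation. What follows is a minimal, self-contained C illustration, not PostgreSQL source. The two structs are toy stand-ins reduced to the fields the argument needs, and the names toy_smgropen, toy_smgrsetowner, toy_smgrclose and demo_smgr_close_clears_owner are hypothetical, invented for this example. Only the myowner field, the rd_smgr pointer, and the two-line check inside the close routine are taken from the proposal itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for an smgr table entry (assumption: only two fields). */
typedef struct SMgrRelationData
{
    unsigned int relfilenode;           /* identifies the physical file */
    struct SMgrRelationData **myowner;  /* back link to the owning pointer, or NULL */
} SMgrRelationData;

typedef SMgrRelationData *SMgrRelation;

/* Toy stand-in for a relcache entry (assumption: only two fields). */
typedef struct RelationData
{
    unsigned int rd_id;                 /* table OID */
    SMgrRelation rd_smgr;               /* cached smgr entry, or NULL */
} RelationData;

/* Hypothetical helper: create an smgr entry that nobody owns yet. */
static SMgrRelation
toy_smgropen(unsigned int relfilenode)
{
    SMgrRelation reln = malloc(sizeof(SMgrRelationData));

    if (reln == NULL)
        abort();
    reln->relfilenode = relfilenode;
    reln->myowner = NULL;
    return reln;
}

/* Hypothetical helper: make *owner point at reln and record the back link. */
static void
toy_smgrsetowner(SMgrRelation *owner, SMgrRelation reln)
{
    reln->myowner = owner;
    *owner = reln;
}

/* Close an smgr entry; the owner's pointer is cleared before the free. */
static void
toy_smgrclose(SMgrRelation reln)
{
    assert(reln != NULL);
    if (reln->myowner)
        *(reln->myowner) = NULL;
    free(reln);
}

/*
 * Simulate an smgr-level clear that does not come through the relcache.
 * Returns 1 if the relcache-side pointer was unhooked, 0 if it dangles.
 * The OID and relfilenode values are arbitrary.
 */
int
demo_smgr_close_clears_owner(void)
{
    RelationData rel = {16384, NULL};
    SMgrRelation reln = toy_smgropen(16385);

    toy_smgrsetowner(&rel.rd_smgr, reln);
    toy_smgrclose(reln);        /* note: called on reln, not via rel */
    return rel.rd_smgr == NULL;
}
```

Calling demo_smgr_close_clears_owner() from a main() exercises the case described in the message: the entry is dropped at the smgr level, yet the relcache-side pointer ends up NULL instead of dangling. Without the back link, rel.rd_smgr would still hold the freed address, which is the situation that leads to the reported failure.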
On Mon, 10 Jan 2005, Tom Lane wrote:

> We've seen a few reports of the above-mentioned error message from
> PG 8.0 testers, but up till now no one had come up with a reproducible
> test case. I've now found a trivial example:
>
> 	session 1:	create table a1 (f1 varchar(128));
> 	session 2:	insert into a1 values('abc');
> 	session 1:	alter table a1 alter column f1 type varchar(256);
> 	session 2:	insert into a1 values('abcd');
> 	session 2 fails with ERROR: SMgrRelation hashtable corrupted
> 	continued use of session 2 leads to a crash
>
> Many if not all scenarios involving a rewriting ALTER TABLE on a
> table in active use by other backends will fail like this.
> I believe there are probably similar failures involving CLUSTER,
> though a quick try didn't show it. This seems clearly to be a
> "must fix for 8.0" bug.
>
> The basic problem is that when ALTER TABLE tries to swap the physical
> files associated with the original table and the temp version of the
> table, it sends out relcache inval events for all four combinations
> of table OID and relfilenode. Because inval.c is a bit cavalier about
> the ordering of inval events, the one that session 2 sees first is the
> one for <temp table OID, old relfilenode>. It does not find a relcache
> entry for the temp table OID, but it does find an smgr table entry for
> the relfilenode, which it proceeds to drop. Now there is a dangling
> smgr reference in its relcache, so when it next gets hit with a
> relcache clear event for the original table OID, boom!
>
> I fooled around with trying to patch this by enforcing the "right"
> processing order of inval events, but that doesn't work (it just moves
> the failure into the sending backend, which it turns out would need
> a different processing order to avoid crashing). It would be a horribly
> fragile solution anyway.
>
> I now think that the only reasonable fix is to directly attack the
> problem of dangling relcache references to smgr table entries. What we
> can do is add a concept of an "owning pointer" to an smgr entry, that
> is an "SMgrRelation *myowner" field, and have smgrclose do
> something like
> 	if (reln->myowner)
> 		*(reln->myowner) = NULL;
> For smgr table entries associated with a relcache entry, the relcache
> code would set this field as a back link to its rel->rd_smgr pointer.
> With this setup, an smgr-level clear would correctly unhook from the
> relcache even if the clear did not come directly through the relcache.
> This would simplify RelationCacheInvalidateEntry and
> LocalExecuteInvalidationMessage, which could then treat relcache clear
> and smgr clear as independent operations.
>
> Comments?

Only: Josh, put a hold on those press releases, looks like an RC5 is
forthcoming ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664
"Marc G. Fournier" <scrappy@postgresql.org> writes:
> On Mon, 10 Jan 2005, Tom Lane wrote:
>> Comments?

> Only: Josh, put a hold on those press releases, looks like an RC5 is
> forthcoming ...

I knew you were going to say that ;-)

I'm not sure if we should insist on an RC5 for this or not. If we'd
found it after release we'd have stuck it into 8.0.1 without any special
extra testing.

			regards, tom lane