On Mon, Apr 11, 2011 at 11:41:18AM +0100, Leonardo Francalanci wrote:
> > > But re-reading it, I don't understand: what's the difference in creating
> > > a new "regular" table and crashing before emitting the abort record,
> > > and converting an unlogged table to logged and crashing before
> > > emitting the abort record? How do the standby servers handle a
> > > "CREATE TABLE" followed by a ROLLBACK if the master crashes
> > > before writing the abort record? I thought that too would "leave a
> > > stray file around on a standby".
> >
> > I've been thinking about the same thing. And AFAICS, your analysis is
> > correct, though there may be some angle to it I'm not seeing.
>
>
> Anyone else? I would like to know if what I'm trying to do is, in fact,
> possible... otherwise starting with the wal_level=minimal case first
> will be wasted effort in case the other cases can't be integrated
> somehow...

If the master crashes while a transaction that used CREATE TABLE is unfinished,
both the master and the standby will indefinitely retain identical, stray (not
referenced by pg_class) files. The catalogs do reference the relfilenode of
each unlogged relation; currently, that relfilenode never exists on a standby
while that standby is accepting connections. By the time the startup process
releases the AccessExclusiveLock acquired by the proposed UNLOGGED -> normal
conversion process, that relfilenode needs to be either fully copied or unlinked
all over again. (Alternately, find some other way to make sure queries don't
read the half-copied file.) In effect, the problem is that the relfilenode is
*not* stray, so its final state does need to be well-defined.
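
To make the distinction concrete, here is a toy model in Python. It is only a
sketch, not PostgreSQL code: the Node class, the two scenario functions, and
the relfilenode numbers 16384 and 16390 are all made up for illustration, and
the standby is simplified to "files on disk" plus "what pg_class references".

```python
# Toy model only; every name and number here is hypothetical.
from dataclasses import dataclass, field


@dataclass
class Node:
    """One server: relfilenodes on disk, and those referenced by pg_class."""
    files: dict = field(default_factory=dict)    # relfilenode -> "complete" | "partial"
    catalog: dict = field(default_factory=dict)  # relation name -> relfilenode

    def stray_files(self):
        """Files on disk that no catalog entry points at."""
        referenced = set(self.catalog.values())
        return sorted(f for f in self.files if f not in referenced)

    def readable_state(self, relname):
        """State of the file a query on relname would open (None if absent)."""
        return self.files.get(self.catalog[relname])


def create_table_then_crash():
    """CREATE TABLE in an unfinished transaction; master crashes, no abort record."""
    standby = Node()
    standby.files[16384] = "complete"  # file was created during replay
    # The pg_class row never becomes visible, so the catalog stays empty.
    return standby


def convert_unlogged_then_crash():
    """UNLOGGED -> normal conversion; master crashes partway through the copy."""
    standby = Node()
    standby.catalog["u"] = 16390       # catalogs already reference the relfilenode
    standby.files[16390] = "partial"   # only part of the data reached the standby
    return standby


if __name__ == "__main__":
    a = create_table_then_crash()
    print("CREATE TABLE case, stray files:", a.stray_files())
    b = convert_unlogged_then_crash()
    print("conversion case, stray files:", b.stray_files())
    print("conversion case, state a query on 'u' would see:", b.readable_state("u"))
```

In the first scenario the leftover file wastes space but no query can reach
it. In the second the file is not stray at all, and a query on the table
would read the partial copy once the lock is released.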
nm