So I was testing my fix for the problem noted here:
http://archives.postgresql.org/pgsql-hackers/2007-08/msg00196.php
and promptly found *another* bug. To wit, that repair_frag calls
HeapTupleSatisfiesVacuum without bothering to acquire any buffer
content lock. This results in an Assert failure inside
SetBufferCommitInfoNeedsSave, if HeapTupleSatisfiesVacuum tries to
update any hint bits for the tuple. I think that is impossible in
current releases (8.2 and earlier), because the tuple's logical status
was fully determined by the prior call in scan_heap. But it's possible as of
8.3 because the walwriter or other backends could have moved the WAL
flush point, allowing a previously unhintable XMAX to become hintable.

I think the best solution for this is to acquire the buffer content lock
before calling HeapTupleSatisfiesVacuum; it was really a pretty ugly
shortcut for the code to skip the lock in the first place. We could alternatively
refuse to do shrinking unless both XMIN and XMAX are correctly hinted
at scan_heap time; but there is not anything else in vacuum.c that seems
to require XMAX_COMMITTED to be set, so I'd rather not make that
restriction.
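
To make the failure mode concrete, here is a toy model in plain C. None
of these names are real backend symbols: toy_mark_hint_dirty merely
stands in for SetBufferCommitInfoNeedsSave, toy_satisfies_vacuum for
HeapTupleSatisfiesVacuum, and the "lock" is just a flag. It is a sketch
of the ordering being proposed, not the actual vacuum.c change.

```c
/* Toy model only: every name here is hypothetical, not a PostgreSQL API. */
#include <assert.h>
#include <stdbool.h>

typedef struct ToyBuffer
{
    bool content_locked;    /* stands in for "content lock held by me" */
    bool dirty_hint;        /* set once a hint-bit update was recorded */
} ToyBuffer;

typedef struct ToyTuple
{
    bool xmax_committed_hint;   /* stands in for the XMAX_COMMITTED hint */
    bool xmax_flushed;          /* has the WAL flush point passed the commit? */
} ToyTuple;

void toy_lock(ToyBuffer *buf)   { buf->content_locked = true; }
void toy_unlock(ToyBuffer *buf) { buf->content_locked = false; }

/* Like the real routine, this insists that the caller holds the lock. */
void toy_mark_hint_dirty(ToyBuffer *buf)
{
    assert(buf->content_locked);
    buf->dirty_hint = true;
}

/*
 * The visibility check may set a hint bit as a side effect, but only once
 * the commit is known flushed.  With async commit that can become true
 * between the scan_heap call and the repair_frag call.
 */
bool toy_satisfies_vacuum(ToyTuple *tup, ToyBuffer *buf)
{
    if (!tup->xmax_committed_hint && tup->xmax_flushed)
    {
        tup->xmax_committed_hint = true;
        toy_mark_hint_dirty(buf);   /* asserts if the buffer is unlocked */
    }
    return tup->xmax_committed_hint;
}

/* The proposed ordering: take the lock, run the check, release the lock. */
bool toy_check_tuple_locked(ToyTuple *tup, ToyBuffer *buf)
{
    bool xmax_known_committed;

    toy_lock(buf);
    xmax_known_committed = toy_satisfies_vacuum(tup, buf);
    toy_unlock(buf);
    return xmax_known_committed;
}
```

Calling toy_satisfies_vacuum directly on an unlocked buffer after
xmax_flushed has become true trips the assert, which is the shape of the
failure described above; going through toy_check_tuple_locked does not.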

But to get to the point: the urgency of testing the async-commit patch
more extensively has just moved up a full order of magnitude, IMHO anyway.
I muttered something in the other thread about providing a buildfarm
option to run the regression tests with synchronous_commit off. That
would still be a good idea in the long run, but I want to take some more
drastic measures now. I propose that we actually set synchronous_commit
off by default for the next little while --- at least up to 8.3beta1,
maybe until we approach the RC point. That will ensure that every
buildfarm machine is exercising the async-commit behavior, as well as
every developer who's testing HEAD.

Of course the risk is that we might forget to turn it back on before
release :-(
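
For anyone who wants to exercise the async-commit path locally whatever
we decide about the default, the per-installation equivalent would be a
postgresql.conf entry along these lines (a sketch of the local setting
only; changing the compiled-in default is a separate matter):

```conf
# postgresql.conf
synchronous_commit = off
```

The same thing can be done for a single session with
SET synchronous_commit = off.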

Comments?

regards, tom lane