On Tue, Jul 26, 2011 at 06:04:16PM -0400, Tom Lane wrote:
> Noah Misch <noah@2ndQuadrant.com> writes:
> > On Tue, Jul 26, 2011 at 05:05:15PM -0400, Tom Lane wrote:
> >> Dirty cache line, maybe not, but what if the assembly code commands the
> >> CPU to load those variables into CPU registers before doing the
> >> comparison? If they're loaded with maxMsgNum coming in last (or at
> >> least after resetState), I think you can have the problem without any
> >> assumptions about cache line behavior at all. You just need the process
> >> to lose the CPU at the right time.
>
> > True. If the compiler places the resetState load first, you could hit the
> > anomaly by "merely" setting a breakpoint on the next instruction, waiting for
> > exactly MSGNUMWRAPAROUND messages to enqueue, and letting the backend continue.
> > I think, though, we should either plug that _and_ the cache incoherency case or
> > worry about neither.
>
> How do you figure that? The poor-assembly-code-order risk is both a lot
> easier to fix and a lot higher probability. Admittedly, it's still way
> way down there, but you only need a precisely-timed sleep, not a
> precisely-timed sleep *and* a cache line that somehow remained stale.

I think both probabilities are too low to usefully distinguish. An sinval
wraparound takes a long time even in a deliberate test setup: almost 30 hours @
10k messages/sec. To get a backend to sleep that long, you'll probably need
something like SIGSTOP or a debugger attach. The sleep has to fall within the
space of no more than a few instructions. Then, you'd need to release the
process at the exact moment for it to observe wrapped equality. In other words,
you get one sub-millisecond opportunity per 30 hours of process sleep time.
If your backends don't have multi-hour sleeps, it can't ever happen.
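
For anyone who wants to check the arithmetic behind those figures, here is a rough sketch. It assumes MSGNUMWRAPAROUND is defined as MAXNUMMESSAGES * 262144 with MAXNUMMESSAGES = 4096, which is my recollection of sinvaladt.c and should be verified against the tree. The function names are hypothetical, and the 10k messages/sec rate is just the test-setup figure from above.

```python
# Back-of-envelope check of the sinval wraparound timing.
# ASSUMPTION: these constants mirror sinvaladt.c; verify against the source.
MAXNUMMESSAGES = 4096
MSGNUMWRAPAROUND = MAXNUMMESSAGES * 262144  # 2**30 message numbers


def hours_to_wrap(messages_per_sec: float) -> float:
    """Hours needed to enqueue exactly MSGNUMWRAPAROUND messages."""
    return MSGNUMWRAPAROUND / messages_per_sec / 3600.0


def window_seconds(messages_per_sec: float) -> float:
    """How long maxMsgNum holds any one value at a steady enqueue rate.

    This is the interval in which a resumed backend would observe
    the wrapped value as equal to its stale one.
    """
    return 1.0 / messages_per_sec


if __name__ == "__main__":
    rate = 10_000  # messages/sec, the deliberate-test figure
    print(f"wraparound after {hours_to_wrap(rate):.1f} hours")
    print(f"equality window of {window_seconds(rate) * 1e6:.0f} microseconds")
```

At a lower, more realistic message rate the window widens proportionally, but so does the sleep the backend would need to cover a full wraparound.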
Even so, all the better if we settle on an approach that has neither hazard.
--
Noah Misch http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services