On 25/01/13 13:06, Tom Lane wrote:
> Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
>> If I have done this right, then this is the trace for the 1st message...
>> from my wandering through the calls here it looks like a normal commit,
>> and something goes a bit weird as SI messages are being processed...
>
> Seems like the critical bit is here:
>
>> #11 0x00007f4e2a53d985 in exit () from /lib/x86_64-linux-gnu/libc.so.6
>> #12 0x00007f4e272b951a in ?? () from /usr/lib/libR.so
>> #13 <signal handler called>
>> #14 0x00007f4e2a538707 in kill () from /lib/x86_64-linux-gnu/libc.so.6
>> #15 0x00000000006152e5 in SICleanupQueue (
>> callerHasWriteLock=callerHasWriteLock@entry=1 '\001',
>> minFree=minFree@entry=4) at sinvaladt.c:672
>
> Frame 15 is definitely SICleanupQueue trying to send a catchup SIGUSR1
> interrupt to the furthest-behind backend. The fact that we go directly
> into a signal handler from the kill() suggests that the furthest-behind
> backend is actually *this* backend, which perhaps is a bit surprising,
> but it's supposed to work. What it looks like, though, is that libR has
> commandeered the SIGUSR1 signal handler, and just to be extra special
> unfriendly to the surrounding program, it does an exit() when it traps a
> SIGUSR1.
>
> Unless libR can be coerced into not screwing up our signal handlers,
> I'd say that PL/R is broken beyond repair. That would be unfortunate.
>
> regards, tom lane

It looks like Joe has run into something similar with libR stealing
SIGINT; he reinstalls the handler. A simple patch along the same lines
for SIGUSR1 (attached) seems to fix the issue.
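
For anyone following along without the attachment, here is a rough,
self-contained sketch of the general save-and-restore idea. It is not the
attached patch and does not use any PostgreSQL or R APIs: the names
host_sigusr1_handler, hostile_lib_init and init_preserving_handler are
made up for illustration, and hostile_lib_init merely stands in for a
library initialisation that takes over SIGUSR1 and exits on it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Set by the host program's own handler (hypothetical stand-in). */
static volatile sig_atomic_t catchup_pending = 0;

/* Stand-in for the host's SIGUSR1 handler: just record the event. */
static void host_sigusr1_handler(int signo)
{
    (void) signo;
    catchup_pending = 1;
}

/* Stand-in for a library handler that exits when the signal arrives. */
static void hostile_handler(int signo)
{
    (void) signo;
    _exit(1);
}

/* Stand-in for an embedded library's init that takes over SIGUSR1. */
static void hostile_lib_init(void)
{
    signal(SIGUSR1, hostile_handler);
}

/* Install the host's handler; returns 0 on success, -1 on failure. */
static int install_host_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = host_sigusr1_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/*
 * Remember the current disposition of signo, run init(), then put the
 * remembered disposition back. Returns 0 on success, -1 on failure.
 */
static int init_preserving_handler(int signo, void (*init)(void))
{
    struct sigaction saved;

    assert(init != NULL);
    if (sigaction(signo, NULL, &saved) != 0)
        return -1;              /* could not query the current handler */
    init();                     /* may replace the handler for signo */
    return sigaction(signo, &saved, NULL);
}
```

A caller would run install_host_handler() first and then
init_preserving_handler(SIGUSR1, hostile_lib_init). Note that a signal
arriving between init() and the restore would still reach the library's
handler, so this only narrows the problem to that window.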

I wonder if we need to reinstall *all* the remaining signal handlers too?

Cheers

Mark