Discussion: dblink crash on PPC
Something odd is happening on buildfarm member wombat, a PPC970MP box running Gentoo. We're getting dblink test failures. On the one I looked at more closely I saw this:

[4ddf2c59.7aec:153] LOG: disconnection: session time: 0:00:00.444 user=markwkm database=contrib_regression host=[local]

and then:

[4ddf2c4e.79d4:2] LOG: server process (PID 31468) was terminated by signal 11: Segmentation fault
[4ddf2c4e.79d4:3] LOG: terminating any other active server processes

which makes it look like something is failing badly in the backend cleanup code. (7aec = hex(31468))

We don't seem to have a backtrace, which is sad.

This seems to be happening on the 9.0 branch too.

I wonder what it could be?

cheers

andrew
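The "(7aec = hex(31468))" remark above works because the bracketed log prefix carries the session identifier. A minimal sketch of decoding it follows, assuming the buildfarm's log_line_prefix is of the form `[%c:%l]` (the thread does not state this); PostgreSQL documents `%c` as the process start time and the process ID, both in hexadecimal, separated by a dot. The name `decode_prefix` is made up for this illustration.

```python
import re
from datetime import datetime, timezone

# Assumed prefix layout "[%c:%l]": %c is "<hex start time>.<hex PID>",
# %l is the per-session log line number. This layout is an assumption.
PREFIX_RE = re.compile(r"\[([0-9a-f]+)\.([0-9a-f]+):(\d+)\]")


def decode_prefix(line):
    """Return start time, PID and line number from a log line, or None."""
    m = PREFIX_RE.search(line)
    if m is None:
        return None
    start_hex, pid_hex, line_no = m.groups()
    return {
        # The start time is seconds since the Unix epoch, written in hex.
        "started": datetime.fromtimestamp(int(start_hex, 16), tz=timezone.utc),
        "pid": int(pid_hex, 16),
        "line": int(line_no),
    }


if __name__ == "__main__":
    samples = [
        "[4ddf2c59.7aec:153] LOG: disconnection: session time: 0:00:00.444",
        "[4ddf2c4e.79d4:2] LOG: server process (PID 31468) was terminated",
    ]
    for sample in samples:
        print(decode_prefix(sample))
```

Under that assumption, the first sample decodes to PID 31468, the process reported as killed by signal 11, which is how the disconnection line and the crash are tied to the same backend. The second prefix decodes to PID 31188, presumably the process that logged the crash report.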
On Fri, May 27, 2011 at 8:44 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
>
> Something odd is happening on buildfarm member wombat, a PPC970MP box
> running Gentoo. We're getting dblink test failures. On the one I looked at
> more closely I saw this:
>
> [4ddf2c59.7aec:153] LOG: disconnection: session time: 0:00:00.444
> user=markwkm database=contrib_regression host=[local]
>
> and then:
>
> [4ddf2c4e.79d4:2] LOG: server process (PID 31468) was terminated by signal
> 11: Segmentation fault
> [4ddf2c4e.79d4:3] LOG: terminating any other active server processes
>
> which makes it look like something is failing badly in the backend cleanup
> code. (7aec = hex(31468))
>
> We don't seem to have a backtrace, which is sad.
>
> This seems to be happening on the 9.0 branch too.
>
> I wonder what it could be?

Around when did it start failing?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote:
> Andrew Dunstan <andrew@dunslane.net> wrote:
>>
>> Something odd is happening on buildfarm member wombat, a PPC970MP
>> box running Gentoo. We're getting dblink test failures. On the
>> one I looked at more closely I saw this:
>>
>> [4ddf2c59.7aec:153] LOG: disconnection: session time:
>> 0:00:00.444
>> user=markwkm database=contrib_regression host=[local]
>>
>> and then:
>>
>> [4ddf2c4e.79d4:2] LOG: server process (PID 31468) was terminated
>> by signal 11: Segmentation fault
>> [4ddf2c4e.79d4:3] LOG: terminating any other active server
>> processes
>>
>> which makes it look like something is failing badly in the
>> backend cleanup code. (7aec = hex(31468))
>>
>> We don't seem to have a backtrace, which is sad.
>>
>> This seems to be happening on the 9.0 branch too.
>>
>> I wonder what it could be?
>
> Around when did it start failing?

According to the buildfarm logs the first failure was roughly 1 day 10 hours 40 minutes before this post.

Keep in mind that PPC is a platform with weak memory ordering....

-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Robert Haas <robertmhaas@gmail.com> wrote:
>> Around when did it start failing?

> According to the buildfarm logs the first failure was roughly 1 day
> 10 hours 40 minutes before this post.

See
http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=wombat&br=HEAD

The problem here is that wombat has been offline for about a month before that, so it could have broken anytime in the past month. It's also not unlikely that the hiatus signals a change in the underlying hardware or software, which might have been the real cause. (Mark?)

> Keep in mind that PPC is a platform with weak memory ordering....

grebe, which is also a PPC64 machine, isn't showing the bug. And I just failed to reproduce the problem on a RHEL6 PPC64 box. About to go try it on RHEL5, which has a gcc version much closer to what wombat says it's using, but I'm not very hopeful about that. I think the more likely thing to be keeping in mind is that Gentoo is a platform with poor quality control.

regards, tom lane
I wrote:
> grebe, which is also a PPC64 machine, isn't showing the bug. And I just
> failed to reproduce the problem on a RHEL6 PPC64 box. About to go try
> it on RHEL5, which has a gcc version much closer to what wombat says
> it's using, but I'm not very hopeful about that.

Nope, no luck there either. It's going to be hard to make any progress on this without investigation on wombat itself.

regards, tom lane
On 11-05-27 12:35 PM, Tom Lane wrote:
>
> grebe, which is also a PPC64 machine, isn't showing the bug. And I just
> failed to reproduce the problem on a RHEL6 PPC64 box. About to go try
> it on RHEL5, which has a gcc version much closer to what wombat says
> it's using, but I'm not very hopeful about that. I think the more
> likely thing to be keeping in mind is that Gentoo is a platform with
> poor quality control.
>
> regards, tom lane
>

As another data point, the dblink regression tests work fine for me on a PPC32 debian (squeeze,gcc 4.4.5) based system.
On Fri, May 27, 2011 at 10:06 AM, Steve Singer <ssinger@ca.afilias.info> wrote:
> As another data point, the dblink regression tests work fine for me on a
> PPC32 debian (squeeze,gcc 4.4.5) based system.

Given that it's dblink my guess is that it's picking up the wrong version of libpq somehow.

--
greg
Greg Stark <gsstark@mit.edu> writes:
> On Fri, May 27, 2011 at 10:06 AM, Steve Singer <ssinger@ca.afilias.info> wrote:
>> As another data point, the dblink regression tests work fine for me on a
>> PPC32 debian (squeeze,gcc 4.4.5) based system.

> Given that it's dblink my guess is that it's picking up the wrong
> version of libpq somehow.

Maybe, but then why does the test only crash during backend exit, and not while it's exercising dblink?

regards, tom lane