Discussion: "show all" command crashes server
Hi folks
First time poster here so please extend grace if I don't initially provide what is needed to help.
I am running postgresql 8.3.7 on debian lenny (postgresql-8.3_8.3.7-0lenny1_i386.deb).
I have three of these servers and generally they run well.
On this one server if I use the command "show all" in psql, phpPgAdmin or pgAdmin3 the postgresql server spits the dummy as follows:
postgres@theconsole:~$ psql
Welcome to psql 8.3.7, the PostgreSQL interactive terminal.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit
postgres=# show all;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
In the syslog is:
Sep 10 23:55:14 theconsole postgres[31118]: [3-2] 0: LOCATION: reaper, postmaster.c:2156
Sep 10 23:55:15 theconsole postgres[31124]: [4-1] [local] [unknown] [unknown] 0: LOG: 08P01: incomplete startup packet
Sep 10 23:55:15 theconsole postgres[31124]: [4-2] [local] [unknown] [unknown] 0: LOCATION: ProcessStartupPacket, postmaster.c:1396
Sep 10 23:55:36 theconsole postgres[31118]: [4-1] 0: LOG: 00000: server process (PID 31145) was terminated by signal 11: Segmentation fault
Sep 10 23:55:36 theconsole postgres[31118]: [4-2] 0: LOCATION: LogChildExit, postmaster.c:2529
Sep 10 23:55:36 theconsole postgres[31118]: [5-1] 0: LOG: 00000: terminating any other active server processes
Sep 10 23:55:36 theconsole postgres[31118]: [5-2] 0: LOCATION: HandleChildCrash, postmaster.c:2374
Sep 10 23:55:36 theconsole postgres[31118]: [6-1] 0: LOG: 00000: all server processes terminated; reinitializing
Sep 10 23:55:36 theconsole postgres[31118]: [6-2] 0: LOCATION: PostmasterStateMachine, postmaster.c:2690
Sep 10 23:55:36 theconsole postgres[31146]: [7-1] 0: LOG: 00000: database system was interrupted; last known up at 2009-09-10 23:55:14 EST
Sep 10 23:55:36 theconsole postgres[31146]: [7-2] 0: LOCATION: StartupXLOG, xlog.c:4836
Sep 10 23:55:36 theconsole postgres[31147]: [7-1] [local] postgres postgres 0: FATAL: 57P03: the database system is in recovery mode
Sep 10 23:55:36 theconsole postgres[31147]: [7-2] [local] postgres postgres 0: LOCATION: ProcessStartupPacket, postmaster.c:1648
Sep 10 23:55:36 theconsole postgres[31146]: [8-1] 0: LOG: 00000: database system was not properly shut down; automatic recovery in progress
Sep 10 23:55:36 theconsole postgres[31146]: [8-2] 0: LOCATION: StartupXLOG, xlog.c:5003
Sep 10 23:55:36 theconsole postgres[31146]: [9-1] 0: LOG: 00000: record with zero length at 2A/E734761C
Sep 10 23:55:36 theconsole postgres[31146]: [9-2] 0: LOCATION: ReadRecord, xlog.c:3126
Sep 10 23:55:36 theconsole postgres[31146]: [10-1] 0: LOG: 00000: redo is not required
Sep 10 23:55:36 theconsole postgres[31146]: [10-2] 0: LOCATION: StartupXLOG, xlog.c:5146
Sep 10 23:55:36 theconsole postgres[31150]: [7-1] 0: LOG: 00000: autovacuum launcher started
Sep 10 23:55:36 theconsole postgres[31150]: [7-2] 0: LOCATION: AutoVacLauncherMain, autovacuum.c:520
Sep 10 23:55:36 theconsole postgres[31118]: [7-1] 0: LOG: 00000: database system is ready to accept connections
This is 100% repeatable.
The database seems to work fine unless this command is run; then it is instant death.
Any help would be appreciated.
regards
Grant
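The crash leaves a distinctive line in the syslog, so a small script could pull such events out for monitoring. This is a minimal sketch in Python and not something posted in the thread; the name `find_crashes` and the regular expression are assumptions based only on the log lines quoted above.

```python
import re

# Matches the postmaster message shown in the syslog excerpt above, e.g.
# "server process (PID 31145) was terminated by signal 11: Segmentation fault"
CRASH_RE = re.compile(
    r"server process \(PID (\d+)\) was terminated by signal (\d+)(?:: (.*))?$"
)


def find_crashes(log_lines):
    """Return (pid, signal_number, description) for each crash line found."""
    crashes = []
    for line in log_lines:
        match = CRASH_RE.search(line)
        if match:
            description = (match.group(3) or "").strip()
            crashes.append((int(match.group(1)), int(match.group(2)), description))
    return crashes
```

Feeding it the lines of the syslog file would list every backend that died from a signal, which is easier to alert on than the full recovery sequence.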
Grant Maxwell wrote:
> Hi folks
>
> First time poster here so please extend grace if I don't initially
> provide what is needed to help.
>
> I am running postgresql 8.3.7 on debian lenny
> (postgresql-8.3_8.3.7-0lenny1_i386.deb)

Well that's useful.

> I have three of these servers and generally they run well.

As is that.

> On this one server if I use the command "show all" in psql, phpPgAdmin
> or pgAdmin3 the postgresql server spits the dummy as follows:

> postgres=# show all;
> server closed the connection unexpectedly

Hmm - some modules can provide their own config variables. Do you have
the same modules installed in all three servers?

Can you "show" individual variables?

--
Richard Huxton
Archonet Ltd
On Thu, Sep 10, 2009 at 8:37 AM, Grant Maxwell <grant.maxwell@maxan.com.au> wrote:
> Hi folks
> First time poster here so please extend grace if I don't initially provide
> what is needed to help.
> I am running postgresql 8.3.7 on debian lenny
> (postgresql-8.3_8.3.7-0lenny1_i386.deb).
> I have three of these servers and generally they run well.

SNIP

> Sep 10 23:55:36 theconsole postgres[31118]: [4-1] 0: LOG: 00000: server
> process (PID 31145) was terminated by signal 11: Segmentation fault

Sig 11 is a process crash which can be caused by bad hardware or
corrupted / buggy binaries. I'd try reinstalling pgsql binaries and
see if that helps.
On 11/09/2009, at 1:09 AM, Richard Huxton wrote:
>> On this one server if I use the command "show all" in psql,
>> phpPgAdmin or pgAdmin3 the postgresql server spits the dummy as follows:
>
>> postgres=# show all;
>> server closed the connection unexpectedly
>
> Hmm - some modules can provide their own config variables. Do you have
> the same modules installed in all three servers?

How can I determine what modules are installed? I do know that
pgmemcache is installed on this server - but it was there before the
problems started and it works ok.

> Can you "show" individual variables?

I did a show all on one of the other servers, created a script to use
each of the resulting outputs in a single show statement and ran it on
the problem server. It ran without a fault.

I then took the postgresql.conf file from the problem server, grabbed
all the config lines and submitted them one at a time (again with a
script) and it also worked fine.

regards
Grant Maxwell
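The per-variable check described in this message could be scripted roughly as follows. The script actually used was not posted, so this is only a sketch under assumptions: the helper names `parse_conf_settings` and `show_statements` are hypothetical, and the parser handles only plain `name = value` lines with `#` comments (directives such as `include` would be picked up as if they were parameter names).

```python
import re


def parse_conf_settings(conf_text):
    """Return the parameter names that are set (not commented out) in
    postgresql.conf-style text."""
    names = []
    for line in conf_text.splitlines():
        # Drop everything after '#'. Crude, but we only need the name,
        # which always precedes any quoted value.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        match = re.match(r"[A-Za-z_][A-Za-z0-9_.]*", line)
        if match:
            names.append(match.group(0))
    return names


def show_statements(names):
    """Build one SHOW statement per parameter name."""
    return [f"SHOW {name};" for name in names]
```

The resulting statements could be written to a file and run with `psql -f`, so that a crash can be tied to the last statement that was sent.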
Grant Maxwell <grant.maxwell@maxan.com.au> writes:
> On 11/09/2009, at 1:09 AM, Richard Huxton wrote:
>> Hmm - some modules can provide their own config variables. Do you have
>> the same modules installed in all three servers?

> How can I determine what modules are installed ?

The contents of the local_preload_libraries and shared_preload_libraries
parameters would probably be enough ...

regards, tom lane
On 11/09/2009, at 8:17 AM, Tom Lane wrote:
> Grant Maxwell <grant.maxwell@maxan.com.au> writes:
>> On 11/09/2009, at 1:09 AM, Richard Huxton wrote:
>>> Hmm - some modules can provide their own config variables. Do you
>>> have the same modules installed in all three servers?
>
>> How can I determine what modules are installed ?
>
> The contents of the local_preload_libraries and
> shared_preload_libraries parameters would probably be enough ...
>
> regards, tom lane

On the problem server:

shared_preload_libraries = 'pgmemcache'
#local_preload_libraries = ''

On the others both are empty.

For good measure I removed pgmemcache but the problem persists. I have
now put it back.

regards
Grant
Grant Maxwell <grant.maxwell@maxan.com.au> writes:
> On the problem server:
> shared_preload_libraries = 'pgmemcache'
> #local_preload_libraries = ''

> on the others both are empty.

Sounds like a smoking gun to me.

> For good measure I removed pgmemcache but the problem persists.

Did you restart the postmaster afterwards? shared_preload_libraries
is only considered at postmaster start.

regards, tom lane
On 11/09/2009, at 8:36 AM, Tom Lane wrote:
> Grant Maxwell <grant.maxwell@maxan.com.au> writes:
>> On the problem server:
>> shared_preload_libraries = 'pgmemcache'
>> #local_preload_libraries = ''
>
>> on the others both are empty.
>
> Sounds like a smoking gun to me.
>
>> For good measure I removed pgmemcache but the problem persists.
>
> Did you restart the postmaster afterwards? shared_preload_libraries
> is only considered at postmaster start.

yep - full restart.

> regards, tom lane
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

>> shared_preload_libraries = 'pgmemcache'
...
> Sounds like a smoking gun to me.

Yep, known problem with pgmemcache. Bruce and I poked around with this
about a year ago. Bruce, I think you were going to throw the problem at
some EDB people - did that ever happen? I seem to recall we fixed that
particular problem as well during the codeathon at OpenSQL Camp.

- --
Greg Sabino Mullane greg@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200909102039
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAkqpnIEACgkQvJuQZxSWSsgIPQCgnvLBNKLqeAVcx8r2ufEcNPyF
bZ4An2Ed60lQ1kyokrAoGJFPQm1fwpOQ
=3i3z
-----END PGP SIGNATURE-----
Grant Maxwell <grant.maxwell@maxan.com.au> writes:
> On 11/09/2009, at 8:36 AM, Tom Lane wrote:
>> Did you restart the postmaster afterwards?

> yep - full restart.

okay, next step is to collect a stack trace ...

regards, tom lane
First of all thanks to those who provided input.
This problem is now fixed and I thought I would post this solution so that others might benefit in the future.
For the sake of completeness:
The error was that if "show all" was run on this postgresql (version 8.3) server, postgres would crash and then recover.
Otherwise the server "seemed" healthy
The postgres log showed:
Sep 10 23:55:36 theconsole postgres[31118]: [4-1] 0: LOG: 00000: server process (PID 31145) was terminated by signal 11: Segmentation fault
Sep 10 23:55:36 theconsole postgres[31118]: [4-2] 0: LOCATION: LogChildExit, postmaster.c:2529
Sep 10 23:55:36 theconsole postgres[31118]: [5-1] 0: LOG: 00000: terminating any other active server processes
Sep 10 23:55:36 theconsole postgres[31118]: [5-2] 0: LOCATION: HandleChildCrash, postmaster.c:2374
Sep 10 23:55:36 theconsole postgres[31118]: [6-1] 0: LOG: 00000: all server processes terminated; reinitializing
Sep 10 23:55:36 theconsole postgres[31118]: [6-2] 0: LOCATION: PostmasterStateMachine, postmaster.c:2690
Sep 10 23:55:36 theconsole postgres[31146]: [7-1] 0: LOG: 00000: database system was interrupted; last known up at 2009-09-10 23:55:14 EST
Sep 10 23:55:36 theconsole postgres[31146]: [7-2] 0: LOCATION: StartupXLOG, xlog.c:4836
Sep 10 23:55:36 theconsole postgres[31147]: [7-1] [local] postgres postgres 0: FATAL: 57P03: the database system is in recovery mode
Sep 10 23:55:36 theconsole postgres[31147]: [7-2] [local] postgres postgres 0: LOCATION: ProcessStartupPacket, postmaster.c:1648
Sep 10 23:55:36 theconsole postgres[31146]: [8-1] 0: LOG: 00000: database system was not properly shut down; automatic recovery in progress
Sep 10 23:55:36 theconsole postgres[31146]: [8-2] 0: LOCATION: StartupXLOG, xlog.c:5003
Sep 10 23:55:36 theconsole postgres[31146]: [9-1] 0: LOG: 00000: record with zero length at 2A/E734761C
Sep 10 23:55:36 theconsole postgres[31146]: [9-2] 0: LOCATION: ReadRecord, xlog.c:3126
Sep 10 23:55:36 theconsole postgres[31146]: [10-1] 0: LOG: 00000: redo is not required
Sep 10 23:55:36 theconsole postgres[31146]: [10-2] 0: LOCATION: StartupXLOG, xlog.c:5146
Sep 10 23:55:36 theconsole postgres[31150]: [7-1] 0: LOG: 00000: autovacuum launcher started
Sep 10 23:55:36 theconsole postgres[31150]: [7-2] 0: LOCATION: AutoVacLauncherMain, autovacuum.c:520
Sep 10 23:55:36 theconsole postgres[31118]: [7-1] 0: LOG: 00000: database system is ready to accept connections
SOLUTION:
Increase the memory on the server.
WHY
We had installed splunk on the server recently (a month before). It was running OK.
The combination of splunk and the other running tasks had pushed memory use too close to the limit.
What we did not notice was that swap had been almost completely consumed - nasty.
RESULT
We shut it all down, doubled the memory and voila - problem gone.
It goes to show that when hunting problems we should not ignore the basic environmental elements.
It also goes to show that our monitoring system was not looking at this relatively new server.
(this confession is not an invitation for a spanking)
again thanks for the help
Grant
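For anyone wanting to catch the same condition earlier, here is a minimal sketch of a swap check in Python. It assumes a Linux host, where `/proc/meminfo` reports `SwapTotal` and `SwapFree` in kB; the function names and the 0.8 alert threshold are illustrative assumptions, not what the monitoring system mentioned above uses.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_used_fraction(meminfo):
    """Fraction of swap in use; 0.0 if no swap is configured."""
    total = meminfo.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return (total - meminfo.get("SwapFree", 0)) / total


def swap_alert(meminfo, threshold=0.8):
    """True when swap usage is at or above the (assumed) threshold."""
    return swap_used_fraction(meminfo) >= threshold


if __name__ == "__main__":
    with open("/proc/meminfo") as handle:
        info = parse_meminfo(handle.read())
    print("swap used: {:.0%}".format(swap_used_fraction(info)))
    if swap_alert(info):
        print("WARNING: swap is nearly exhausted")
```

Run from cron or wrapped as a monitoring plugin, a check like this would have flagged the exhausted swap well before the segfaults started.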
Grant Maxwell <grant.maxwell@maxan.com.au> writes:
> The error was that if "show all" was run on this postgresql (version
> 8.3) server, postgres would crash and then recover.

> The postgres log showed:
> Sep 10 23:55:36 theconsole postgres[31118]: [4-1] 0: LOG: 00000:
> server process (PID 31145) was terminated by signal 11: Segmentation
> fault

> We had recently ( a month before) had installed splunk on the server.
> It was running ok
> The combination of splunk and other tasks running had pushed the
> memory too close.
> What we did not notice was that swap had been almost completely
> consumed - nasty

> We shut it all down, increased the memory (double) and voila -
> problem gone.

Hmm. A segfault in that case seems to indicate that something somewhere
is failing to check for a null result from malloc(). Which is a bug we
ought to fix. Is there any chance of getting a core dump stack trace
from one of those crashes?

regards, tom lane