Discussion: 7.4 Crashed... Why?
Looks like my copy of 7.4, which seems to have been running fine until now, crashed last night at about 1am. All my alarms went off and I got it started again, but I'd like to know what happened so I can sleep soundly. :-)

Here's the serverlog entry:

LOG: recycled transaction log file "0000000000000028"
FATAL: lock file "/usr/local/pgsql/data/postmaster.pid" already exists
HINT: Is another postmaster (PID 1010) running in data directory "/usr/local/pgsql/data"?

Not much. Looks like it tried to restart itself, found the old pid file and crapped out... Or something. Why would it restart itself?

Any ideas? We're running on RedHat 7.2.

Thanks,
Hunter
Hunter Hillegas <lists@lastonepicked.com> writes:
> Here's the serverlog entry:
> LOG: recycled transaction log file "0000000000000028"
> FATAL: lock file "/usr/local/pgsql/data/postmaster.pid" already exists
> HINT: Is another postmaster (PID 1010) running in data directory
> "/usr/local/pgsql/data"?

> Not much. Looks like it tried to restart itself, found the old pid file and
> crapped out... Or something. Why would it restart itself?

The postmaster *never* restarts itself. What the above looks like to me is some random script decided to try to start a new postmaster, and the new postmaster quite properly refused to do anything because there already was a running postmaster. You should look into your cron jobs and see what sort of interesting stuff might lurk there.

regards, tom lane
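To tell whether the lock file named in the FATAL message points at a live postmaster or was left behind, one can compare the PID stored in it with the process table. The following is a minimal sketch, not something from the thread: `pid_file_status` is a hypothetical helper name, and it relies only on the fact that the first line of postmaster.pid holds the postmaster's PID.

```shell
#!/bin/sh
# Hypothetical helper, not part of PostgreSQL: report whether the PID recorded
# in a postmaster.pid-style lock file belongs to a live process.
pid_file_status() {
    pid_file=$1
    if [ ! -f "$pid_file" ]; then
        echo "missing"
        return 0
    fi
    # The first line of postmaster.pid holds the postmaster's PID.
    pid=$(head -n 1 "$pid_file" | tr -d '[:space:]')
    case $pid in
        ''|*[!0-9]*) echo "unreadable"; return 0 ;;
    esac
    # kill -0 sends no signal; it only tests whether the process can be
    # signalled. Run this as root or as the postgres user, otherwise a live
    # process owned by another user is reported as stale.
    if kill -0 "$pid" 2>/dev/null; then
        echo "running (PID $pid)"
    else
        echo "stale (PID $pid)"
    fi
}
```

For the setup in this thread the call would be `pid_file_status /usr/local/pgsql/data/postmaster.pid`. A "running" result at the time of the FATAL message is consistent with a second start attempt being refused rather than with a crash.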
Thanks Tom. Good to know that postmaster doesn't restart itself.

I did find a cron job that was running in the suspect time... But all it does is the following:

DATE=`date +%Y%m%d`
DB1=/root/database_backup/db1_db.$DATE
su - postgres -c "/usr/local/pgsql/bin/pg_dump db1" >> $DB1
gzip $DB1

Is it possible this could cause some strange behavior? This backup script has been running for a year (every night) w/o any trouble.

Very strange.

> From: Tom Lane <tgl@sss.pgh.pa.us>
> Date: Fri, 05 Dec 2003 22:18:33 -0500
> To: Hunter Hillegas <lists@lastonepicked.com>
> Cc: PostgreSQL <pgsql-general@postgresql.org>
> Subject: Re: [GENERAL] 7.4 Crashed... Why?
Hunter Hillegas <lists@lastonepicked.com> writes:
> I did find a cron job that was running in the suspect time... But all it
> does is the following:

> DATE=`date +%Y%m%d`
> DB1=/root/database_backup/db1_db.$DATE
> su - postgres -c "/usr/local/pgsql/bin/pg_dump db1" >> $DB1
> gzip $DB1

> Is it possible this could cause some strange behavior? This backup script
> has been running for a year (every night) w/o any trouble.

The cron script itself certainly looks unexceptional. But if "su - postgres" executes postgres' ~/.profile or other shell-startup scripts (I think it does so on some platforms but not others), maybe you had some weird behavior recently added to those scripts?

regards, tom lane
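One way to follow up on this suggestion is to list which per-user startup files actually exist in the postgres account's home directory and then read them for anything that starts a postmaster. The sketch below is only an illustration: `list_startup_files` is a hypothetical helper name, and the file list assumes a Bourne-style login shell such as bash. System-wide files like /etc/profile are not covered and would need a separate look.

```shell
#!/bin/sh
# Hypothetical helper: print the per-user shell startup files that exist in a
# given home directory, so they can be inspected by hand.
list_startup_files() {
    home_dir=$1
    # A bash login shell reads the first of .bash_profile, .bash_login and
    # .profile that it finds; .bashrc is included because login files often
    # source it.
    for name in .profile .bash_profile .bash_login .bashrc; do
        if [ -f "$home_dir/$name" ]; then
            echo "$home_dir/$name"
        fi
    done
}
```

Calling `list_startup_files ~postgres` prints one path per line, or nothing if none of the files exist. Checking the modification times of the listed files with `ls -l` would show whether any of them changed recently.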
Verified that no shell startup scripts are running... And the backup proceeded normally last night with no crash.

I'm starting to think that I'm not going to be able to track this down and have to hope it doesn't happen again.

> From: Tom Lane <tgl@sss.pgh.pa.us>
> Date: Sat, 06 Dec 2003 20:36:39 -0500
> To: Hunter Hillegas <lists@lastonepicked.com>
> Cc: PostgreSQL <pgsql-general@postgresql.org>
> Subject: Re: [GENERAL] 7.4 Crashed... Why?