Discussion: 8.04 and RedHat/CentOS init script issue
Hi,

I installed 8.0.4 via RPM on CentOS 4.2 (which is the same as RedHat 4.2). While booting, the init script reports that the daemon [FAILED], but after I log on, the postmaster shows as running and I am able to connect remotely from any client.

I made no modifications to the script and there is nothing out of the ordinary in the log.

Thanks,
Tony
Hi,

On Tue, 18 Oct 2005, Tony Caduto wrote:
> I installed 8.0.4 via RPM on CentOS 4.2 which is the same as RedHat 4.2 and
> while booting the init script reports that the daemon [FAILED], but after I
> log on it shows the postmaster running and I am able to connect from any
> client remotely.
>
> I made no modifications to the script and there is nothing out of the
> ordinary in the log.

Hmm. In the 8.0.4 RPM init scripts we were using a 1-second sleep (see the "sleep 1" line in the init script). In some cases where the system is slow, the script reports a startup failure even though the server actually started. In the 8.1 RPMs the sleep time was increased to 2 seconds, which we believe avoids the problem you've reported:

http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/pgsqlrpms/patches/8.1/postgresql.init?rev=1.2&content-type=text/x-cvsweb-markup

So please increase this sleep time and try again.

Regards,
--
Devrim GUNDUZ
Kivi Bilişim Teknolojileri - http://www.kivi.com.tr
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org
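As an illustration only (this is not what the 8.0.4 or 8.1 RPM scripts do), the fixed sleep could be replaced by a loop that polls for the postmaster's pid file up to a time limit, so a slow boot waits longer while a fast one does not pay the full delay. The function name wait_for_pidfile and the 30-second limit used below are hypothetical, and the sketch assumes the script decides success by the presence of $PGDATA/postmaster.pid.

```shell
# Hypothetical replacement for a fixed "sleep 1" / "sleep 2" in the init
# script: poll for the postmaster's pid file instead of guessing a delay.
wait_for_pidfile() {
    local pidfile=$1    # e.g. "$PGDATA/postmaster.pid"
    local max_wait=$2   # seconds to wait before reporting failure
    local waited=0
    while [ "$waited" -lt "$max_wait" ]; do
        # Done as soon as the postmaster has written its pid file.
        if [ -f "$pidfile" ]; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # Final check after the last sleep; its exit status is the result.
    [ -f "$pidfile" ]
}
```

After launching the postmaster, the script would call wait_for_pidfile "$PGDATA/postmaster.pid" 30 in place of the sleep and report success or failure from its exit status. A stale postmaster.pid left behind by a crash would satisfy this check immediately, so on its own it is a weak test of whether the server actually started.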
Hi all,

I tried changing the sleep command in the script to 2, but at boot it still says [FAILED]. Even though the script reports that it failed, the db is up and running.

The system is a Compaq DL380 (2.5 GB RAM, dual 2.4 GHz Xeon) running CentOS 4.2.

I am going to install 8.1beta3 on another box with exactly the same hardware and OS version, and I will report back what happens.

Not sure what is going on. Has anyone else had this problem with CentOS 4.2 or Red Hat EL 4.2?

Thanks,

Tony Caduto
http://www.amsoftwaredesign.com
Home of PG Lightning Admin for Postgresql 8.x
Tony Caduto <tony_caduto@amsoftwaredesign.com> writes:
> I tried changing the sleep command in the script to 2, but at boot it
> still says [FAILED].
> Even though the script reports it failed, the db is up and running.

This seems to happen for some people and not others. I've been wanting to find out how the heck it can take multiple seconds for the postmaster to start and create its pid-file ... that shouldn't take long at all. Are you willing to try strace'ing the postmaster? Modify the script by adding "strace -tt -o /tmp/strace.out" in front of the postmaster command, like this:

    $SU -l postgres -c "strace -tt -o /tmp/strace.out $PGENGINE/postmaster -p '$PGPORT' -D '$PGDATA' ${PGOPTS} &" >> "$PGLOG" 2>&1 < /dev/null

and reboot. (After you've gotten a trace of a failing case, change it back and reboot again.)

This is kind of invasive and may change the behavior enough that we don't see the problem :-( --- but if you're willing to reboot a few times in hopes of capturing a trace of a failed case, it'd be worth trying.

regards, tom lane
Tom Lane wrote:
> Are you willing to try strace'ing the postmaster? Modify the script like
> [...]
> and reboot. (After you've gotten a trace of a failing case, change it
> back and reboot again.)

Hi Tom,

I added the strace line like you said and rebooted, and it did display [FAILED] after the reboot. I put the resulting strace.out file on my web server; here is the link (warning: it's pretty big):

http://www.amsoftwaredesign.com/downloads/strace.out

After the second reboot I changed the sleep from 2 to 5, and then it worked correctly. Of course, this really slowed down the boot process.

Thanks,
Tony
Tony Caduto <tony_caduto@amsoftwaredesign.com> writes:
> Tom Lane wrote:
>> Are you willing to try strace'ing the postmaster?
> I added the strace line like you said and rebooted, it did display the
> [FAILED] after the reboot.

Thanks for collecting the raw data. The salient events seem to be these:

  12:57:52.400888  exec() call
  12:57:52.619268  completion(?) of opening shared libraries
  12:57:52.657465  first call coming from our own code instead of libraries
  12:57:52.902476  begin reading postgresql.conf
  12:57:52.915949  done reading postgresql.conf
  12:57:52.916191  begin trying to identify system timezone
  12:58:01.117869  done identifying system timezone
  12:58:01.131798  postmaster.pid created

In short: pg_timezone_initialize() took about 8.2 seconds out of the total time of 8.73 seconds.

Since pg_timezone_initialize() needs to scan all of the 500-odd files under postgresql/share/timezone/, it isn't so surprising that it would take a little bit of time. But 8 seconds seems like a lot. The trace makes it look like localtime() performs stat("/etc/localtime") on each call, which is pretty ugly --- I wonder if there isn't some way around that?

Anyway, the short answer is that pg_timezone_initialize ought to wait till after we've created postmaster.pid. There's no urgent reason to do it earlier AFAICS.

This also explains why we didn't see a startup problem in earlier releases --- pg_timezone_initialize didn't exist before 8.0.

regards, tom lane
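The per-phase timings above are differences between strace -tt timestamps (HH:MM:SS.microseconds). A small helper can reproduce that arithmetic; the function name elapsed is hypothetical, and it assumes both timestamps fall on the same day (no wrap past midnight).

```shell
# Seconds elapsed between two "strace -tt" timestamps of the form
# HH:MM:SS.micro, printed with microsecond precision.
# Assumes both timestamps are from the same day.
elapsed() {
    echo "$1 $2" | awk '{
        split($1, a, ":"); split($2, b, ":")
        t1 = a[1] * 3600 + a[2] * 60 + a[3]
        t2 = b[1] * 3600 + b[2] * 60 + b[3]
        printf "%.6f\n", t2 - t1
    }'
}
```

For example, elapsed 12:57:52.916191 12:58:01.117869 prints 8.201678 (the timezone probe), and elapsed 12:57:52.400888 12:58:01.131798 prints 8.730910 (exec() to postmaster.pid creation).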
Tom Lane wrote:
> In short: pg_timezone_initialize() took about 8.2 seconds out of the
> total time of 8.73 seconds.
>
> Since pg_timezone_initialize() needs to scan all of the 500-odd files
> under postgresql/share/timezone/, it isn't so surprising that it would
> take a little bit of time. But 8 seconds seems like a lot. The trace
> makes it look like localtime() performs stat("/etc/localtime") on each
> call, which is pretty ugly --- I wonder if there isn't some way around
> that?

Further data points:

I just observed this taking over 20 seconds on my clunky old PII 266. That's really horrible. But pg_ctl -w start was able to complete in about 2 seconds.

Even on my much faster laptop the timezone lib startup took 3 or 4 seconds (and pg_ctl -w start came back in about 1 second).

cheers

andrew
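The pg_ctl -w timings suggest an alternative to a fixed sleep: let pg_ctl itself wait until the server is up before the script reports a result. The sketch below is an assumption about how an init script might do that, not the RPM script's code. -w, -D, -l and start are pg_ctl's own options; the function name start_with_pg_ctl is hypothetical, the port option (-o "-p $PGPORT") is left out for brevity, and the pg_ctl path is passed in so the function can be tried without a server.

```shell
# Hypothetical init-script helper: start the server with "pg_ctl -w",
# which does not return until startup has finished (or it gives up),
# and print an OK/FAILED status based on pg_ctl's exit code.
start_with_pg_ctl() {
    local pg_ctl=$1 pgdata=$2 pglog=$3
    if "$pg_ctl" -w -D "$pgdata" -l "$pglog" start >/dev/null 2>&1; then
        echo "[  OK  ]"
        return 0
    fi
    echo "[FAILED]"
    return 1
}
```

In a real init script pg_ctl has to run as the postgres user, for instance through the script's existing $SU -l postgres -c "..." wrapper around "$PGENGINE/pg_ctl".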
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> In short: pg_timezone_initialize() took about 8.2 seconds out of the
>> total time of 8.73 seconds.

> Further data points:
> I just observed this taking over 20 seconds on my clunky old pII 266.
> That's really horrible. But pg_ctl -w start was able to complete in
> about 2 seconds.

Yeah. I've been experimenting here, and it's clear that strace itself adds huge overhead --- on my machine, postmaster start is normally well under a second, but strace'ing it brings it to about 8 seconds. No doubt that's because of all the stat("/etc/localtime") calls it has to trace.

So there's some Heisenberg effect here. However, I don't think there can be much doubt that on a machine that is just booting (and has surely got none of these files in cache) the search through share/postgresql/timezone could take a few seconds. Hindsight is always 20/20 ;-)

regards, tom lane
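To get a rough feel for the file I/O a cold-cache boot implies, one can read every file in the timezone directory once and time it. This is a hypothetical sketch: it only reads the files and does not repeat the localtime()/strftime() probing that the postmaster performs, and the directory is passed in because the install location varies between builds and packages.

```shell
# Read every regular file under a directory once (as a cold-cache scan
# would have to) and print how many files were read.
read_all_files() {
    local dir=$1
    # Read all files; on a cold cache each one costs a disk read.
    find "$dir" -type f -exec cat {} + > /dev/null
    # Count the files; tr strips the padding some wc versions add.
    find "$dir" -type f | wc -l | tr -d ' '
}
```

Running time read_all_files /path/to/share/timezone soon after boot shows the cold-cache cost; a second run is normally served from the page cache and finishes much faster.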
Tom Lane wrote:
> So there's some Heisenberg effect here. However, I don't think there
> can be much doubt that on a machine that is just booting (and has
> surely got none of these files in cache) the search through
> share/postgresql/timezone could take a few seconds. Hindsight is
> always 20/20 ;-)

Something is surely wrong in the timezone lib, though:

[andrew@alphonso inst]$ grep /etc/localtime strace.out | wc -l
38073

cheers

andrew
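The grep | wc -l count above can be generalized to show which paths a trace stat()s most often. This sketch assumes strace's usual rendering of such calls, e.g. stat64("/etc/localtime", {...}) = 0; the function name stat_counts is hypothetical.

```shell
# Count stat-family calls per path in an strace output file, most
# frequent first. Matches stat(, stat64(, and (as a substring) lstat64(
# when the first argument is a quoted path.
stat_counts() {
    local trace=$1
    grep -o 'stat[0-9]*("[^"]*"' "$trace" \
        | sed 's/^stat[0-9]*("//; s/"$//' \
        | sort | uniq -c | sort -rn
}
```

Running stat_counts strace.out | head on a trace like the one discussed here would be expected to put /etc/localtime at the top of the list.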
Andrew Dunstan <andrew@dunslane.net> writes:
> Something is surely wrong in the timezone lib, though:

[ digs in glibc sources for a while... ]

The test loop in score_timezone() calls both localtime() and strftime() for each probe point, and in glibc strftime() calls tzset(), which the source code claims is required by POSIX. The explicit tzset() call is what's forcing the recheck of /etc/localtime.

Possibly the glibc boys would listen to a suggestion that strftime() need not force the file recheck, but my experience with them is that they're relatively impervious to suggestions :-(

I'm not actually particularly worried about the startup time. What's bothering me right at the moment, given the new-found knowledge that strftime() is slow on Linux, is that we're using it in elog(). At the time that code was written, we did it deliberately to ensure that all the backends would write log timestamps in the same timezone regardless of local SET TimeZone commands. That's still an important consideration, but I wonder whether we don't now have enough timezone infrastructure that we could get the same results using pg_strftime.

regards, tom lane
I wrote:
> Possibly the glibc boys would listen to a suggestion that strftime()
> need not force the file recheck, but my experience with them is that
> they're relatively impervious to suggestions :-(

I've filed a bug for this:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=171351
so no need for everyone else to do it too ...

> I'm not actually particularly worried about the startup time. What's
> bothering me right at the moment, given the new-found knowledge that
> strftime() is slow on Linux, is that we're using it in elog(). At the
> time that code was written, we did it deliberately to ensure that all
> the backends would write log timestamps in the same timezone regardless
> of local SET TimeZone commands. That's still an important
> consideration, but I wonder whether we don't now have enough timezone
> infrastructure that we could get the same results using pg_strftime.

If glibc fixes the problem upstream then we can leave well enough alone, but if they indicate they won't, then we should think about doing this someday. The major problem with it is probably "what do you do when messages need to be emitted before pgtz has been initialized?"

regards, tom lane