Discussion: Repeated pg_upgrade buildfarm failures on binturong
Hi,

Binturong has repeatedly failed with errors like:
ERROR: could not open file "base/16400/32052": No such file or directory

E.g.
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=binturong&dt=2015-07-06%2014%3A20%3A24

It's not just master that's failing; even older branches report odd errors:
connection to database failed: FATAL: could not open relation mapping file "global/pg_filenode.map": No such file or directory
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=binturong&dt=2015-07-06%2014%3A53%3A10

Greetings,

Andres Freund

Andres Freund <andres@anarazel.de> writes:
> Binturong has repeatedly failed with errors like:
> ERROR: could not open file "base/16400/32052": No such file or directory

I agree that binturong seems to have something odd going on; but there are a lot of other intermittent pg_upgrade test failures in the buildfarm history, and the general situation is that you can't tell what actually happened because the buildfarm script doesn't capture all the relevant log files. It would be real nice to improve that, but I lack the necessary Perl-fu.

regards, tom lane

On 2015-07-06 20:00:43 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Binturong has repeatedly failed with errors like:
> > ERROR: could not open file "base/16400/32052": No such file or directory
>
> I agree that binturong seems to have something odd going on; but there are
> a lot of other intermittent pg_upgrade test failures in the buildfarm
> history

Binturong has seemed clean on HEAD for a while now, and the failures ~80 days ago seem to have had different symptoms (the src/bin move):
http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=binturong&br=HEAD

Other branches are less nice looking for various reasons, but there's another recurring error:
FATAL: could not open relation mapping file "global/pg_filenode.map": No such file or directory

Those seem to indicate something going seriously wrong to me.

07.07.2015, 14:21, Andres Freund wrote:
> On 2015-07-06 20:00:43 -0400, Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> Binturong has repeatedly failed with errors like:
>>> ERROR: could not open file "base/16400/32052": No such file or directory
>>
>> I agree that binturong seems to have something odd going on; but there are
>> a lot of other intermittent pg_upgrade test failures in the buildfarm
>> history
>
> binturong seemed to be clean on HEAD for a while now, and the failures
> ~80 days ago seem to have had different symptoms (the src/bin move):
> http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=binturong&br=HEAD
>
> other branches are less nice looking for various reasons, but there's
> another recurring error:
> FATAL: could not open relation mapping file "global/pg_filenode.map": No such file or directory
>
> Those seem to indicate something going seriously wrong to me.

Binturong and Dingo run on the same host with an hourly cronjob to trigger the builds. These failures are caused by concurrent test runs on different branches which use the same tmp_check directory for pg_upgrade tests, see
http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=dingo&dt=2015-07-07%2002%3A58%3A01&stg=check-pg_upgrade

It looks like neither make (GNU Make 4.0) nor the shell (default Solaris /bin/sh) updates $PWD to point to the current directory where test.sh is executed, so test.sh puts the test cluster in the original working directory of the process that launched make.

I've restricted builds to one at a time on that host to work around this issue for now. Also attached a patch to explicitly set PWD=$(CURDIR) in the Makefile to make sure test.sh runs with the right directory.

/ Oskari

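The collision described above can be illustrated with a small sketch. It does not reproduce the Solaris /bin/sh behaviour itself, since most current shells correct a wrong PWD at startup. Instead it uses a hypothetical variable, STALE_PWD, to stand in for a $PWD value that nothing updated after the directory change. The scratch directory layout and the use of mktemp -d (not required by POSIX, but widely available) are assumptions made for illustration only.

```shell
#!/bin/sh
# Sketch: a directory name inherited through the environment versus the
# directory the process is really running in.
# STALE_PWD is a hypothetical stand-in for an un-updated $PWD.

work=`mktemp -d` || exit 1          # scratch area standing in for the build root
mkdir -p "$work/src/bin/pg_upgrade"

cd "$work" || exit 1
STALE_PWD=`pwd -P`                  # what the launching process saw
export STALE_PWD

# make changes directory before running the test script
cd "$work/src/bin/pg_upgrade" || exit 1

stale_root=$STALE_PWD/tmp_check     # same path for every branch started from here
actual_root=`pwd -P`/tmp_check      # differs per source tree

echo "from stale value: $stale_root"
echo "from pwd -P:      $actual_root"

# clean up the scratch area
cd / && rm -rf "$work"
```

Two concurrent runs started from the same launch directory would both compute the first path, which matches the shared tmp_check directory seen in the dingo log.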
Oskari Saarenmaa <os@ohmu.fi> writes:
> 07.07.2015, 14:21, Andres Freund wrote:
>> Those seem to indicate something going seriously wrong to me.

> Binturong and Dingo run on the same host with an hourly cronjob to
> trigger the builds. These failures are caused by concurrent test runs
> on different branches which use the same tmp_check directory for
> pg_upgrade tests, see
> http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=dingo&dt=2015-07-07%2002%3A58%3A01&stg=check-pg_upgrade

Ouch.

> It looks like neither make (GNU Make 4.0) nor shell (default Solaris
> /bin/sh) updates $PWD to point to the current directory where test.sh is
> executed and test.sh puts the test cluster in the original working
> directory of the process that launched make.

Double ouch. It's the responsibility of the shell, not gmake, that PWD reflect reality. POSIX 2008, under Shell Variables, quoth as follows:

    PWD
        Set by the shell and by the cd utility. In the shell the value shall be initialized from the environment as follows. If a value for PWD is passed to the shell in the environment when it is executed, the value is an absolute pathname of the current working directory that is no longer than {PATH_MAX} bytes including the terminating null byte, and the value does not contain any components that are dot or dot-dot, then the shell shall set PWD to the value from the environment. Otherwise, if a value for PWD is passed to the shell in the environment when it is executed, the value is an absolute pathname of the current working directory, and the value does not contain any components that are dot or dot-dot, then it is unspecified whether the shell sets PWD to the value from the environment or sets PWD to the pathname that would be output by pwd -P. Otherwise, the sh utility sets PWD to the pathname that would be output by pwd -P. In cases where PWD is set to the value from the environment, the value can contain components that refer to files of type symbolic link. In cases where PWD is set to the pathname that would be output by pwd -P, if there is insufficient permission on the current working directory, or on any parent of that directory, to determine what that pathname would be, the value of PWD is unspecified. Assignments to this variable may be ignored. If an application sets or unsets the value of PWD, the behaviors of the cd and pwd utilities are unspecified.

On the other hand, there is no text at all about PWD in the predecessor Single Unix Spec v2, which is what we frequently regard as our minimum baseline. So one could argue that the Solaris shell you're using is a valid implementation of SUS v2.

> I've restricted builds to one at a time on that host to work around this
> issue for now. Also attached a patch to explicitly set PWD=$(CURDIR) in
> the Makefile to make sure test.sh runs with the right directory.

Given the last sentence in the POSIX 2008 text, I think unconditionally munging PWD as you're proposing is a bit risky. What I suggest is that we add code to set PWD only if it's not set, which is most easily done in test.sh itself, along the lines of

    # Very old shells may not set PWD for us.
    if [ x"$PWD" = x"" ]; then
      PWD=`pwd -P`
    fi

A quick look around says that pg_upgrade/test.sh is the only place where we're depending on shell PWD, so we only need to fix this one script.

regards, tom lane

I wrote:
> Given the last sentence in the POSIX 2008 text, I think unconditionally
> munging PWD as you're proposing is a bit risky. What I suggest is that
> we add code to set PWD only if it's not set, which is most easily done
> in test.sh itself, along the lines of

> # Very old shells may not set PWD for us.
> if [ x"$PWD" = x"" ]; then
>   PWD=`pwd -P`
> fi

Oh, wait, scratch that: the build logs you showed clearly indicate that the test is running with temp_root set to /export/home/pgfarmer/build-farm/tmp_check, which implies that PWD was not empty but "/export/home/pgfarmer/build-farm". So the above wouldn't fix it.

A likely hypothesis is that the buildfarm script was invoked using some modern shell that did set PWD, but then test.sh is being executed (in a much lower directory) by some SUSv2-era shell that doesn't.

I'm still kind of afraid to explicitly change PWD in a modern shell, though. Perhaps the right thing is just not to rely on PWD at all in test.sh, but replace $PWD with `pwd -P`. (I did check that this utility is required by SUSv2.)

regards, tom lane

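The replacement suggested above might look roughly like the following. This is a sketch under assumptions, not the patch that was committed: the "before" assignment is only inferred from the temp_root value visible in the build log, and the real test.sh may be written differently.

```shell
#!/bin/sh
# Sketch only; the actual test.sh may differ.
# Presumed earlier form, which depends on the shell keeping PWD current:
#   temp_root=$PWD/tmp_check
# Asking the pwd utility instead reflects the real working directory,
# whatever value of PWD was inherited from the environment.
temp_root=`pwd -P`/tmp_check
echo "test cluster would be created under: $temp_root"
```

Unlike the earlier empty-PWD guard, this form also covers the case seen on binturong, where PWD was set but pointed at the directory the buildfarm script had been launched from.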
Oskari Saarenmaa <os@ohmu.fi> writes:
> I've restricted builds to one at a time on that host to work around this
> issue for now. Also attached a patch to explicitly set PWD=$(CURDIR) in
> the Makefile to make sure test.sh runs with the right directory.

I've pushed a patch for this issue. Please revert your buildfarm configuration so that we can verify it works now.

regards, tom lane

07.07.2015, 19:50, Tom Lane wrote:
> Oskari Saarenmaa <os@ohmu.fi> writes:
>> I've restricted builds to one at a time on that host to work around this
>> issue for now. Also attached a patch to explicitly set PWD=$(CURDIR) in
>> the Makefile to make sure test.sh runs with the right directory.
>
> I've pushed a patch for this issue. Please revert your buildfarm
> configuration so that we can verify it works now.

OK, I just reverted the configuration change and started two test runs; they're now using the correct directories.

Thanks!

Oskari