Thread: Repeated pg_upgrade buildfarm failures on binturon

Repeated pg_upgrade buildfarm failures on binturon

From
Andres Freund
Date:
Hi,

Binturon has repeatedly failed with errors like:
ERROR:  could not open file "base/16400/32052": No such file or directory

E.g.
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=binturong&dt=2015-07-06%2014%3A20%3A24

It's not just master that's failing, even older branches report odd
errors:
connection to database failed: FATAL:  could not open relation mapping file "global/pg_filenode.map": No such file or directory
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=binturong&dt=2015-07-06%2014%3A53%3A10

Greetings,

Andres Freund



Re: Repeated pg_upgrade buildfarm failures on binturon

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> Binturon has repeatedly failed with errors like:
> ERROR:  could not open file "base/16400/32052": No such file or directory

I agree that binturong seems to have something odd going on; but there are
a lot of other intermittent pg_upgrade test failures in the buildfarm
history, and the general situation is that you can't tell what actually
happened because the buildfarm script doesn't capture all the relevant log
files.  It would be real nice to improve that, but I lack the necessary
Perl-fu.
        regards, tom lane



Re: Repeated pg_upgrade buildfarm failures on binturon

From
Andres Freund
Date:
On 2015-07-06 20:00:43 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Binturon has repeatedly failed with errors like:
> > ERROR:  could not open file "base/16400/32052": No such file or directory
> 
> I agree that binturong seems to have something odd going on; but there are
> a lot of other intermittent pg_upgrade test failures in the buildfarm
> history

binturong seemed to be clean on HEAD for a while now, and the failures
~80 days ago seem to have had different symptoms (the src/bin move):
http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=binturong&br=HEAD

other branches are less nice looking for various reasons, but there's
another recurring error:
FATAL:  could not open relation mapping file "global/pg_filenode.map": No such file or directory

Those seem to indicate something going seriously wrong to me.



Re: Repeated pg_upgrade buildfarm failures on binturon

From
Oskari Saarenmaa
Date:
On 07.07.2015 at 14:21, Andres Freund wrote:
> On 2015-07-06 20:00:43 -0400, Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> Binturon has repeatedly failed with errors like:
>>> ERROR:  could not open file "base/16400/32052": No such file or directory
>>
>> I agree that binturong seems to have something odd going on; but there are
>> a lot of other intermittent pg_upgrade test failures in the buildfarm
>> history
>
> binturong seemed to be clean on HEAD for a while now, and the failures
> ~80 days ago seem to have had different symptoms (the src/bin move):
> http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=binturong&br=HEAD
>
> other branches are less nice looking for various reasons, but there's
> another recurring error:
> FATAL:  could not open relation mapping file "global/pg_filenode.map": No such file or directory
>
> Those seem to indicate something going seriously wrong to me.

Binturong and Dingo run on the same host with an hourly cron job to
trigger the builds.  These failures are caused by concurrent test runs
on different branches which use the same tmp_check directory for
pg_upgrade tests, see
http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=dingo&dt=2015-07-07%2002%3A58%3A01&stg=check-pg_upgrade

It looks like neither make (GNU Make 4.0) nor shell (default Solaris
/bin/sh) updates $PWD to point to the current directory where test.sh is
executed and test.sh puts the test cluster in the original working
directory of the process that launched make.

I've restricted builds to one at a time on that host to work around this
issue for now.  Also attached a patch to explicitly set PWD=$(CURDIR) in
the Makefile to make sure test.sh runs with the right directory.

/ Oskari

Attachments

Re: Repeated pg_upgrade buildfarm failures on binturon

From
Tom Lane
Date:
Oskari Saarenmaa <os@ohmu.fi> writes:
> On 07.07.2015 at 14:21, Andres Freund wrote:
>> Those seem to indicate something going seriously wrong to me.

> Binturong and Dingo run on the same host with an hourly cron job to
> trigger the builds.  These failures are caused by concurrent test runs
> on different branches which use the same tmp_check directory for
> pg_upgrade tests, see
> http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=dingo&dt=2015-07-07%2002%3A58%3A01&stg=check-pg_upgrade

Ouch.

> It looks like neither make (GNU Make 4.0) nor shell (default Solaris
> /bin/sh) updates $PWD to point to the current directory where test.sh is
> executed and test.sh puts the test cluster in the original working
> directory of the process that launched make.

Double ouch.  It's the responsibility of the shell, not gmake, that PWD
reflect reality.  POSIX 2008, under Shell Variables, quoth as follows:

    PWD  Set by the shell and by the cd utility. In the shell the value
    shall be initialized from the environment as follows. If a value for
    PWD is passed to the shell in the environment when it is executed, the
    value is an absolute pathname of the current working directory that is
    no longer than {PATH_MAX} bytes including the terminating null byte,
    and the value does not contain any components that are dot or dot-dot,
    then the shell shall set PWD to the value from the environment.
    Otherwise, if a value for PWD is passed to the shell in the environment
    when it is executed, the value is an absolute pathname of the current
    working directory, and the value does not contain any components that
    are dot or dot-dot, then it is unspecified whether the shell sets PWD
    to the value from the environment or sets PWD to the pathname that
    would be output by pwd -P. Otherwise, the sh utility sets PWD to the
    pathname that would be output by pwd -P. In cases where PWD is set to
    the value from the environment, the value can contain components that
    refer to files of type symbolic link. In cases where PWD is set to the
    pathname that would be output by pwd -P, if there is insufficient
    permission on the current working directory, or on any parent of that
    directory, to determine what that pathname would be, the value of PWD
    is unspecified. Assignments to this variable may be ignored. If an
    application sets or unsets the value of PWD, the behaviors of the cd
    and pwd utilities are unspecified.
 

On the other hand, there is no text at all about PWD in the predecessor
Single Unix Spec v2, which is what we frequently regard as our minimum
baseline.  So one could argue that the Solaris shell you're using is
a valid implementation of SUS v2.

> I've restricted builds to one at a time on that host to work around this
> issue for now.  Also attached a patch to explicitly set PWD=$(CURDIR) in
> the Makefile to make sure test.sh runs with the right directory.

Given the last sentence in the POSIX 2008 text, I think unconditionally
munging PWD as you're proposing is a bit risky.  What I suggest is that
we add code to set PWD only if it's not set, which is most easily done
in test.sh itself, along the lines of
    # Very old shells may not set PWD for us.
    if [ x"$PWD" = x"" ]; then
      PWD=`pwd -P`
    fi

A quick look around says that pg_upgrade/test.sh is the only place where
we're depending on shell PWD, so we only need to fix this one script.
        regards, tom lane
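
For illustration, the POSIX 2008 rule quoted above can be read as a test a shell applies to an inherited PWD before trusting it. The following is only a rough sketch of that test, not code from any real shell or from the PostgreSQL tree: the function name env_pwd_is_usable is hypothetical, the {PATH_MAX} length check is left out, and it assumes a POSIX shell that supports $(...), which a SUSv2-era /bin/sh may not.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): returns 0 when the
# candidate value would be acceptable as PWD under the POSIX 2008 rule,
# and 1 otherwise. The {PATH_MAX} length check is omitted for brevity.
env_pwd_is_usable() {
    candidate=$1

    # The value must be an absolute pathname.
    case $candidate in
        /*) ;;
        *) return 1 ;;
    esac

    # The value must not contain "." or ".." components.
    case "/$candidate/" in
        */./*|*/../*) return 1 ;;
    esac

    # The value must name the current working directory. Resolve both
    # sides physically and compare; symlink components in the candidate
    # are allowed, as the quoted text permits.
    actual=$(pwd -P) || return 1
    resolved=$(cd "$candidate" 2>/dev/null && pwd -P) || return 1
    [ "$resolved" = "$actual" ]
}
```

A value such as "/export/home/pgfarmer/build-farm" inherited by a script running in a much lower directory would fail the last check, which is the case where a conforming shell falls back to the output of pwd -P.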



Re: Repeated pg_upgrade buildfarm failures on binturon

From
Tom Lane
Date:
I wrote:
> Given the last sentence in the POSIX 2008 text, I think unconditionally
> munging PWD as you're proposing is a bit risky.  What I suggest is that
> we add code to set PWD only if it's not set, which is most easily done
> in test.sh itself, along the lines of

>     # Very old shells may not set PWD for us.
>     if [ x"$PWD" = x"" ]; then
>       PWD=`pwd -P`
>     fi

Oh, wait, scratch that: the build logs you showed clearly indicate that
the test is running with temp_root set to
/export/home/pgfarmer/build-farm/tmp_check
which implies that PWD was not empty but "/export/home/pgfarmer/build-farm".
So the above wouldn't fix it.

A likely hypothesis is that the buildfarm script was invoked using some
modern shell that did set PWD, but then test.sh is being executed (in a
much lower directory) by some SUSv2-era shell that doesn't.

I'm still kind of afraid to explicitly change PWD in a modern shell,
though.  Perhaps the right thing is just not to rely on PWD at all
in test.sh, but replace $PWD with `pwd -P`.  (I did check that this
utility is required by SUSv2.)
        regards, tom lane
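
A minimal sketch of the idea in the message above, assuming the test directory is derived as "<working directory>/tmp_check" as the quoted build log suggests. The function name compute_temp_root is invented for this illustration and is not the code that was committed.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative only): prints the directory
# in which the test cluster would be created.
compute_temp_root() {
    # pwd -P asks the system for the physical current directory, so a
    # stale PWD inherited from the caller's environment is never consulted.
    echo "`pwd -P`/tmp_check"
}
```

With this approach, two runs started from the same parent process but executing in different branch directories get different tmp_check paths, whatever value PWD had when the buildfarm script was launched.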



Re: Repeated pg_upgrade buildfarm failures on binturon

From
Tom Lane
Date:
Oskari Saarenmaa <os@ohmu.fi> writes:
> I've restricted builds to one at a time on that host to work around this
> issue for now.  Also attached a patch to explicitly set PWD=$(CURDIR) in
> the Makefile to make sure test.sh runs with the right directory.

I've pushed a patch for this issue.  Please revert your buildfarm
configuration so that we can verify it works now.
        regards, tom lane



Re: Repeated pg_upgrade buildfarm failures on binturon

From
Oskari Saarenmaa
Date:
On 07.07.2015 at 19:50, Tom Lane wrote:
> Oskari Saarenmaa <os@ohmu.fi> writes:
>> I've restricted builds to one at a time on that host to work around this
>> issue for now.  Also attached a patch to explicitly set PWD=$(CURDIR) in
>> the Makefile to make sure test.sh runs with the right directory.
> 
> I've pushed a patch for this issue.  Please revert your buildfarm
> configuration so that we can verify it works now.

OK, I just reverted the configuration change and started two test runs;
they're now using the correct directories.

Thanks!
Oskari