Re: What in the world is happening with castoroides and protosciurus?

From Dave Page
Subject Re: What in the world is happening with castoroides and protosciurus?
Date
Msg-id CA+OCxowLSadWgQORdK=XPG6xoXQr8PgTFRfcCeZ_VHu1GpmvFw@mail.gmail.com
In response to What in the world is happening with castoroides and protosciurus?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: What in the world is happening with castoroides and protosciurus?  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
On Tue, Aug 26, 2014 at 1:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> For the last month or so, these two buildfarm animals (which I believe are
> the same physical machine) have been erratically failing with errors that
> reflect low-order differences in floating-point calculations.
>
> A recent example is at
>
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=protosciurus&dt=2014-08-25%2010%3A39%3A52
>
> where the only regression diff is
>
> *** /export/home/dpage/pgbuildfarm/protosciurus/HEAD/pgsql.22860/src/test/regress/expected/hash_index.out       Mon Aug 25 11:41:00 2014
> --- /export/home/dpage/pgbuildfarm/protosciurus/HEAD/pgsql.22860/src/test/regress/results/hash_index.out        Mon Aug 25 11:57:26 2014
> ***************
> *** 171,179 ****
>   SELECT h.seqno AS i8096, h.random AS f1234_1234
>      FROM hash_f8_heap h
>      WHERE h.random = '-1234.1234'::float8;
> !  i8096 | f1234_1234
> ! -------+------------
> !   8906 | -1234.1234
>   (1 row)
>
>   UPDATE hash_f8_heap
> --- 171,179 ----
>   SELECT h.seqno AS i8096, h.random AS f1234_1234
>      FROM hash_f8_heap h
>      WHERE h.random = '-1234.1234'::float8;
> !  i8096 |    f1234_1234
> ! -------+-------------------
> !   8906 | -1234.12356777216
>   (1 row)
>
>   UPDATE hash_f8_heap
>
> ... a result that certainly makes no sense.  The results are not
> repeatable, failing in equally odd ways in different tests on different
> runs.  This is happening in all the back branches too, not just HEAD.
>
> Has there been a system software update on this machine a month or so ago?
> If not, it's hard to think anything except that the floating point
> hardware on this box has developed problems.

There hasn't been a software update, but something happened about two
months ago, and we couldn't get to the bottom of exactly what it was -
essentially, castoroides started failing with "C compiler cannot
create executables". It appeared that the compiler was missing from
the path; however, the config hadn't changed. Our working theory is
that there was previously a symlink to the compiler in one of the
directories in the path, which somehow got removed. The issue was fixed
by adding the actual compiler location to the path.

However, that would have only affected castoroides, and not
protosciurus which runs under a different environment config. I have
no idea what is causing the current issue - the machine is stable
software-wise, and only has private builds of dependency libraries
updated periodically (which are not used for the buildfarm). If I had
to hazard a guess, I'd suggest this is an early symptom of an old
machine which is starting to give up.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


