Re: stress test for parallel workers

From Tom Lane
Subject Re: stress test for parallel workers
Date
Msg-id 27924.1571068231@sss.pgh.pa.us
In response to Re: stress test for parallel workers  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: stress test for parallel workers  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
I wrote:
> Filed at
> https://bugzilla.kernel.org/show_bug.cgi?id=205183
> We'll see what happens ...

Further to this --- I went back and looked at the outlier events
where we saw an infinite_recurse failure on a non-Linux-PPC64
platform.  There were only three:

 mereswine    | ARMv7            | Linux debian-armhf | Clarence Ho     | REL_11_STABLE | 2019-08-11 02:10:12 | InstallCheck-C  | 2019-08-11 02:36:10.159 PDT [5004:4] DETAIL:  Failed process was running: select infinite_recurse();
 mereswine    | ARMv7            | Linux debian-armhf | Clarence Ho     | REL_12_STABLE | 2019-08-11 09:52:46 | pg_upgradeCheck | 2019-08-11 04:21:16.756 PDT [6804:5] DETAIL:  Failed process was running: select infinite_recurse();
 mereswine    | ARMv7            | Linux debian-armhf | Clarence Ho     | HEAD          | 2019-08-11 11:29:27 | pg_upgradeCheck | 2019-08-11 07:15:28.454 PDT [9954:76] DETAIL:  Failed process was running: select infinite_recurse();

Looking closer at these, though, they were *not* SIGSEGV failures,
but SIGKILLs.  Seeing that they were all on the same machine on the
same day, I'm thinking we can write them off as a transiently
misconfigured OOM killer.
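
For anyone wanting to repeat that triage over a pile of buildfarm logs, here is a rough sketch of the SIGSEGV-versus-SIGKILL check. It assumes the postmaster's usual "was terminated by signal N" wording and Linux signal numbering; the function name and the sample lines in the usage note are made up for illustration and are not taken from the actual mereswine logs.

```python
import re

# Assumption: Linux signal numbering (9 = SIGKILL, 11 = SIGSEGV).
SIGNAL_NAMES = {9: "SIGKILL", 11: "SIGSEGV"}

# Matches the postmaster's report of an abnormal child exit, e.g.
#   "server process (PID 1234) was terminated by signal 9: Killed"
TERMINATED_RE = re.compile(r"was terminated by signal (\d+)")


def classify_crash(log_line):
    """Return the name of the signal reported in log_line, or None.

    Signals outside the small table above are returned as "signal N".
    """
    match = TERMINATED_RE.search(log_line)
    if match is None:
        return None
    signum = int(match.group(1))
    return SIGNAL_NAMES.get(signum, f"signal {signum}")
```

Feeding it a hypothetical line such as "LOG:  server process (PID 1234) was terminated by signal 9: Killed" yields "SIGKILL", which points at something like the OOM killer rather than a stack overflow inside the backend; a line reporting signal 11 yields "SIGSEGV".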

So, pending some other theory emerging from the kernel hackers, we're
down to it's-a-PPC64-kernel-bug.  That leaves me wondering what if
anything we want to do about it.  Even if it's fixed reasonably promptly
in Linux upstream, and then we successfully nag assorted vendors to
incorporate the fix quickly, that's still going to leave us with frequent
buildfarm failures on Mark's flotilla of not-the-very-shiniest Linux
versions.

Should we move the infinite_recurse test to happen alone in a parallel
group just to stop these failures?  That's annoying from a parallelism
standpoint, but I don't see any other way to avoid these failures.

            regards, tom lane


