pg_basebackup from REL_12_STABLE hangs on solaris/sparc

From: Victor Wagner
Subject: pg_basebackup from REL_12_STABLE hangs on solaris/sparc
Msg-id: 20191001170403.7ab384cb@fafnir.local.vm
List: pgsql-hackers

Colleagues,

I've encountered the following problem on an old Sparc64 machine running
Solaris 10:

When I compile PostgreSQL 12 with --enable-tap-tests and run make check
in src/bin, the test src/bin/pg_basebackup/t/010_pg_basebackup.pl
hangs indefinitely.

I've tried to attach gdb to the hanging process, but when I attempt to
get a backtrace, gdb reports that the stack is corrupt:

Attaching to program `/home/vitus/postgrespro/src/bin/pg_basebackup/pg_basebackup', process 1467
[New process 1467]
Retry #1:
Retry #2:
Retry #3:
Retry #4:
Reading symbols from /usr/lib/sparcv9/ld.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/sparcv9/ld.so.1
---Type <return> to continue, or q <return> to quit---
0x00000000ff2cca38 in ?? ()
(gdb) bt
#0  0x00000000ff2cca38 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)


When I afterwards kill the hung process with kill -SEGV to get a core
dump, I get the following stack trace from the core file:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0xffffffff7d5d94ac in ___sigtimedwait () from /lib/64/libc.so.1
(gdb) bt
#0  0xffffffff7d5d94ac in ___sigtimedwait () from /lib/64/libc.so.1
#1  0xffffffff7d5c8c8c in __sigtimedwait () from /lib/64/libc.so.1
#2  0xffffffff7d5c0628 in __posix_sigwait () from /lib/64/libc.so.1
#3  0xffffffff7f3362b4 in pq_reset_sigpipe (osigset=0xffffffff7fffeb1c, sigpipe_pending=false, got_epipe=true) at fe-secure.c:529
#4  0xffffffff7f336084 in pqsecure_raw_write (conn=0x100135e90, ptr=0x10013a370, len=5) at fe-secure.c:399
#5  0xffffffff7f335e28 in pqsecure_write (conn=0x100135e90, ptr=0x10013a370, len=5) at fe-secure.c:316
#6  0xffffffff7f326c54 in pqSendSome (conn=0x100135e90, len=5) at fe-misc.c:876
#7  0xffffffff7f326e84 in pqFlush (conn=0x100135e90) at fe-misc.c:1004
#8  0xffffffff7f316584 in sendTerminateConn (conn=0x100135e90) at fe-connect.c:4031
#9  0xffffffff7f3165a4 in closePGconn (conn=0x100135e90) at fe-connect.c:4049
#10 0xffffffff7f31663c in PQfinish (conn=0x100135e90) at fe-connect.c:4083
#11 0x000000010000bc64 in BaseBackup () at pg_basebackup.c:2136
#12 0x000000010000d7ec in main (argc=4, argv=0xffffffff7ffff808) at pg_basebackup.c:2547

This happens on random tests in this test file with a probability of
about 1/10 per test, but because there are more than 100 tests, a hang
somewhere in the file is virtually certain. The other two test files in
the src/bin/pg_basebackup directory don't hang.

As far as I can tell, there are only two machines running Solaris in
pgbuildfarm now, and neither of them has any record of running the
REL_12_STABLE branch (not to mention that neither runs the TAP tests).

-- 


