Discussion: PG11 jit failing on ppc64el
PG 11 beta1 is failing on ppc64el. All Debian/Ubuntu releases except
Jessie are affected. My guess is on problems with llvm/jit, because of
the C++ style error message (and LLVM is disabled on Jessie).

Debian sid:

15:59:29 2018-05-22 13:59:24.914 UTC [29081] pg_regress/strings STATEMENT: SELECT chr(0);
15:59:29 terminate called after throwing an instance of 'std::bad_function_call'
15:59:29 what(): bad_function_call
15:59:29 2018-05-22 13:59:25.026 UTC [28961] LOG: server process (PID 29085) was terminated by signal 6: Aborted
15:59:29 2018-05-22 13:59:25.026 UTC [28961] DETAIL: Failed process was running: INSERT INTO TEMP_GROUP
15:59:29   SELECT 1, (- i.f1), (- f.f1)
15:59:29   FROM INT4_TBL i, FLOAT8_TBL f;
15:59:29 2018-05-22 13:59:25.026 UTC [28961] LOG: terminating any other active server processes
15:59:29 2018-05-22 13:59:25.026 UTC [29078] WARNING: terminating connection because of crash of another server process

Debian stretch:

15:58:45 2018-05-22 13:58:43.778 UTC [29981] pg_regress/indexing STATEMENT: insert into fastpath values (1, 'b1', 100.00);
15:58:45 terminate called after throwing an instance of 'std::bad_function_call'
15:58:45 what(): bad_function_call
15:58:45 2018-05-22 13:58:43.975 UTC [28908] LOG: server process (PID 29981) was terminated by signal 6: Aborted
15:58:45 2018-05-22 13:58:43.975 UTC [28908] DETAIL: Failed process was running: select md5(string_agg(a::text, b order by a, b asc)) from fastpath
15:58:45   where a >= 1000 and a < 2000 and b > 'b1' and b < 'b3';
15:58:45 2018-05-22 13:58:43.975 UTC [28908] LOG: terminating any other active server processes
15:58:45 2018-05-22 13:58:43.975 UTC [30037] WARNING: terminating connection because of crash of another server process

Christoph
Hi,

On 2018-05-22 16:33:57 +0200, Christoph Berg wrote:
> PG 11 beta1 is failing on ppc64el. All Debian/Ubuntu releases except
> Jessie are affected. My guess is on problems with llvm/jit, because of
> the C++ style error message (and LLVM is disabled on Jessie).

It was a bug in LLVM that's fixed now. I guess you can either disable jit
on arm or ask the LLVM maintainer to backport it...

r328687 - but the expanded tests created a few problems (windows mainly,
but somewhere else too), so I'd just backport the actual code change.

- Andres
Re: Andres Freund 2018-05-22 <20180522151101.drsbh6p7ltxpmn65@alap3.anarazel.de>
> Hi,
>
> On 2018-05-22 16:33:57 +0200, Christoph Berg wrote:
> > PG 11 beta1 is failing on ppc64el. All Debian/Ubuntu releases except
> > Jessie are affected. My guess is on problems with llvm/jit, because of
> > the C++ style error message (and LLVM is disabled on Jessie).
>
> It was a bug in LLVM that's fixed now. I guess you can either disable jit
> on arm or ask the LLVM maintainer to backport it...
>
> r328687 - but the expanded tests created a few problems (windows mainly,
> but somewhere else too), so I'd just backport the actual code change.

Thanks also for the extra details on IRC.

I've disabled --with-llvm on all platforms except amd64 i386 now. Will
try talking to the llvm maintainers in Debian to see if we can get
this fixed and have more coverage.

Christoph
On May 23, 2018 4:59:00 AM PDT, Christoph Berg <myon@debian.org> wrote:
>Re: Andres Freund 2018-05-22 <20180522151101.drsbh6p7ltxpmn65@alap3.anarazel.de>
>> Hi,
>>
>> On 2018-05-22 16:33:57 +0200, Christoph Berg wrote:
>> > PG 11 beta1 is failing on ppc64el. All Debian/Ubuntu releases except
>> > Jessie are affected. My guess is on problems with llvm/jit, because of
>> > the C++ style error message (and LLVM is disabled on Jessie).
>>
>> It was a bug in LLVM that's fixed now. I guess you can either disable jit
>> on arm or ask the LLVM maintainer to backport it...
>>
>> r328687 - but the expanded tests created a few problems (windows mainly,
>> but somewhere else too), so I'd just backport the actual code change.
>
>Thanks also for the extra details on IRC.
>
>I've disabled --with-llvm on all platforms except amd64 i386 now. Will
>try talking to the llvm maintainers in Debian to see if we can get
>this fixed and have more coverage.

How about making that dependent on the llvm version being < 7?

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Andres Freund 2018-05-23 <38F42310-62AC-48B1-8A83-639B97E5FA81@anarazel.de>
> >I've disabled --with-llvm on all platforms except amd64 i386 now. Will
> >try talking to the llvm maintainers in Debian to see if we can get
> >this fixed and have more coverage.
>
> How about making that dependent on the llvm version being < 7?

It does work on x86 for <7, so the architecture would still need to be
coded into debian/control and/or debian/rules. Also, we can't depend
on "llvm (>= 7 if it exists)"...

Christoph
On 2018-05-23 22:45:26 +0200, Christoph Berg wrote:
> Re: Andres Freund 2018-05-23 <38F42310-62AC-48B1-8A83-639B97E5FA81@anarazel.de>
> > >I've disabled --with-llvm on all platforms except amd64 i386 now. Will
> > >try talking to the llvm maintainers in Debian to see if we can get
> > >this fixed and have more coverage.
> >
> > How about making that dependent on the llvm version being < 7?
>
> It does work on x86 for <7, so the architecture would still need to be
> coded into debian/control and/or debian/rules. Also, we can't depend
> on "llvm (>= 7 if it exists)"...

What I meant was that I'd conditionally enable it for the other archs
when the version is >= 7.

Greetings,

Andres Freund
Re: Andres Freund 2018-05-23 <20180523205521.mdzwldqabriupiz5@alap3.anarazel.de>
> What I meant was that I'd conditionally enable it for the other archs
> when the version is >= 7.

Good idea, but unfortunately there's a bunch of architectures on
ports.debian.org that llvm hasn't been ported to yet :(, so the
architecture qualification on the dependencies is still necessary.

Christoph
On Thu, May 24, 2018 at 9:00 AM, Christoph Berg <myon@debian.org> wrote:
> Re: Andres Freund 2018-05-23 <20180523205521.mdzwldqabriupiz5@alap3.anarazel.de>
>> What I meant was that I'd conditionally enable it for the other archs
>> when the version is >= 7.
>
> Good idea, but unfortunately there's a bunch of architectures on
> ports.debian.org that llvm hasn't been ported to yet :(, so the
> architecture qualification on the dependencies is still necessary.

BTW it is working on arm64 too, starting with LLVM 6. LLVM 5 crashed the
same way as it does on ppc. See build farm member eelpout, which is
running Debian.

--
Thomas Munro
http://www.enterprisedb.com
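The enablement rule that emerges from the messages so far can be written down as a small decision table. The following is only a sketch under stated assumptions, not anything taken from the Debian packaging: the function name `should_enable_llvm`, the table `MIN_LLVM_MAJOR` and the architecture strings are hypothetical, and it assumes (as the ">= 7" suggestion implies) that the fix r328687 ships with LLVM 7. The data points themselves are the ones reported in the thread: x86 works with LLVM below 7, arm64 works starting with LLVM 6, everything else waits for 7, and architectures without any LLVM port stay disabled.

```python
from typing import Optional

# Minimum LLVM major version per architecture, as reported in this thread.
# Hypothetical table: 0 means "no restriction beyond what the build needs".
MIN_LLVM_MAJOR = {
    "amd64": 0,  # reported to work with LLVM < 7
    "i386": 0,   # reported to work with LLVM < 7
    "arm64": 6,  # LLVM 5 crashed, LLVM 6 works (build farm member eelpout)
}

# Assumption: every other architecture needs the fix that is expected in LLVM 7.
DEFAULT_MIN_LLVM_MAJOR = 7


def should_enable_llvm(arch: str, llvm_major: Optional[int]) -> bool:
    """Decide whether to build --with-llvm for one architecture.

    llvm_major is None when LLVM has not been ported to that architecture.
    """
    if llvm_major is None:
        return False
    return llvm_major >= MIN_LLVM_MAJOR.get(arch, DEFAULT_MIN_LLVM_MAJOR)
```

With this table, `should_enable_llvm("ppc64el", 6)` is false and `should_enable_llvm("ppc64el", 7)` is true, while an architecture with no LLVM port is always excluded, which is why the architecture qualification cannot be dropped from the dependencies.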
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> BTW it is working on arm64 too, starting with LLVM 6. LLVM 5 crashed the
> same way as it does on ppc. See build farm member eelpout, which is
> running Debian.

For entertainment's sake, I tried building --with-llvm on FreeBSD 12
arm64 (hey, gotta do something with this raspberry pi toy I got).
I used llvm-devel-7.0.d20180327 which seems to be the latest available in
FreeBSD's package system. Builds cleanly, does not work at all.
SIGSEGV here:

#0  __clear_cache (start=0x4c055000, end=0x4c0566ec)
    at /usr/src/contrib/compiler-rt/lib/builtins/clear_cache.c:168
#1  0x000000004bb78d8c in llvm::sys::Memory::protectMappedMemory(llvm::sys::MemoryBlock const&, unsigned int) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#2  0x000000004b68f020 in llvm::SectionMemoryManager::applyMemoryGroupPermissions(llvm::SectionMemoryManager::MemoryGroup&,unsigned int) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#3  0x000000004b68ef38 in llvm::SectionMemoryManager::finalizeMemory(std::__1::basic_string<char, std::__1::char_traits<char>,std::__1::allocator<char> >*) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#4  0x000000004b85d310 in llvm::RuntimeDyld::finalizeWithMemoryManagerLocking() ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#5  0x000000004ad22c38 in llvm::orc::RTDyldObjectLinkingLayer::ConcreteLinkedObject<std::__1::shared_ptr<llvm::RuntimeDyld::MemoryManager> >::finalize() ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#6  0x000000004ad236ec in llvm::orc::RTDyldObjectLinkingLayer::ConcreteLinkedObject<std::__1::shared_ptr<llvm::RuntimeDyld::MemoryManager> >::getSymbolMaterializer(std::__1::basic_string<char,std::__1::char_traits<char>, std::__1::allocator<char> >)::{lambda()#1}::operator()()const ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#7  0x000000004ad22084 in llvm::JITSymbol::getAddress() ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#8  0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char, std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#9  0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char, std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#10 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char, std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#11 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char, std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#12 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char, std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#13 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char, std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#14 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char, std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#15 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char, std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#16 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char, std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#17 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char, std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#18 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char, std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#19 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char, std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#20 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char, std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#21 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char, std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#22 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char, std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#23 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char, std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#24 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char, std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#25 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char, std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#26 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char, std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#27 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char, std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#28 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char, std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#29 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char, std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#30 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char, std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
... etc etc ...

Sure looks like infinite recursion in findSymbolAddress. Thoughts?

regards, tom lane
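The "same frame over and over" pattern in a backtrace like the one above can also be checked mechanically. The following is a minimal sketch and not part of the original report: the helper names `parse_frames` and `longest_repeat` and the shortened `SAMPLE` text are hypothetical, and it only assumes the usual gdb layout in which each frame starts with `#N` and, except for the innermost one, a return address. A long run of consecutive frames sharing one return address is a strong hint of runaway self-recursion.

```python
import re
from typing import List, Optional, Tuple

Frame = Tuple[int, Optional[str]]  # (frame number, return address or None)


def parse_frames(backtrace: str) -> List[Frame]:
    """Split gdb backtrace text into (frame number, return address) pairs.

    A frame marker must follow whitespace or the start of the text, so a
    symbol fragment such as "{lambda()#1}" is not mistaken for frame #1.
    """
    parts = re.split(r"(?:^|\s)#(\d+)\s+", backtrace)
    frames: List[Frame] = []
    # parts is [prefix, number, body, number, body, ...]
    for i in range(1, len(parts) - 1, 2):
        match = re.match(r"0x[0-9a-fA-F]+", parts[i + 1])
        frames.append((int(parts[i]), match.group(0) if match else None))
    return frames


def longest_repeat(frames: List[Frame]) -> Tuple[Optional[str], int, int]:
    """Return (address, run length, first frame number) of the longest run
    of consecutive frames that share the same return address."""
    best: Tuple[Optional[str], int, int] = (None, 0, -1)
    run_pc: Optional[str] = None
    run_len, run_start = 0, -1
    for number, pc in frames:
        if pc is not None and pc == run_pc:
            run_len += 1
        else:
            run_pc, run_start = pc, number
            run_len = 1 if pc is not None else 0
        if run_len > best[1]:
            best = (run_pc, run_len, run_start)
    return best


# Shortened, hypothetical sample in the same layout as the backtrace above.
SAMPLE = """
#0  __clear_cache (start=0x4c055000, end=0x4c0566ec) at clear_cache.c:168
#1  0x000000004bb78d8c in protectMappedMemory() () from llvmjit.so
#2  0x000000004ad1dbec in findSymbolAddress() () from llvmjit.so
#3  0x000000004ad1dbec in findSymbolAddress() () from llvmjit.so
#4  0x000000004ad1dbec in findSymbolAddress() () from llvmjit.so
"""

if __name__ == "__main__":
    print(longest_repeat(parse_frames(SAMPLE)))
```

Keying on the return address rather than the function name keeps the parser simple, because the C++ symbol names here contain spaces, template brackets and parentheses that are awkward to match reliably.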
On Thu, May 24, 2018 at 3:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
>> BTW it is working on arm64 too, starting with LLVM 6. LLVM 5 crashed the
>> same way as it does on ppc. See build farm member eelpout, which is
>> running Debian.
>
> For entertainment's sake, I tried building --with-llvm on FreeBSD 12
> arm64 (hey, gotta do something with this raspberry pi toy I got).

Neat. Quite tempted to get one!

> I used llvm-devel-7.0.d20180327 which seems to be the latest available in
> FreeBSD's package system. Builds cleanly, does not work at all.
> SIGSEGV here:
>
> [big ugly stack]
>
> Sure looks like infinite recursion in findSymbolAddress. Thoughts?

Hmm. I just tried llvm-devel-7.0.d20180327 on my amd64 FreeBSD 12
system and our make check passed with flying colours. I guess there
could be a bug in LLVM or the FreeBSD 12 linker or their interaction
on ARM. Maybe the cycle somehow comes from lines 376 and 391 of this:

https://github.com/llvm-mirror/llvm/blob/41d411071aefb16379415150d970171698b13ff9/lib/ExecutionEngine/Orc/OrcCBindingsStack.h

I know that LocalIndirectStubsManager is instantiated differently on
each architecture, but I couldn't immediately see how that could
produce the cycle and I'm currently avoiding the LLVM-internals rabbit
hole. Maybe Andres has an idea?

--
Thomas Munro
http://www.enterprisedb.com
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> On Thu, May 24, 2018 at 3:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> For entertainment's sake, I tried building --with-llvm on FreeBSD 12
>> arm64 (hey, gotta do something with this raspberry pi toy I got).

> Neat. Quite tempted to get one!

Fair warning: the newest "3B+" model contains new ethernet and wireless
chips that none of the BSDen have drivers for yet. I ended up spending
an extra $10 on a USB WiFi dongle so that I could get the thing on the
network. Compared to the price of the RPI itself, seems like highway
robbery. (But then again, the keyboard I plugged into it is worth more
than the RPI, not to mention the monitor.)

>> I used llvm-devel-7.0.d20180327 which seems to be the latest available in
>> FreeBSD's package system. Builds cleanly, does not work at all.

> Hmm. I just tried llvm-devel-7.0.d20180327 on my amd64 FreeBSD 12
> system and our make check passed with flying colours.

Hmph. Looking closer, "does not work at all" is overly negative.
It gets about halfway through the core regression tests and then
crashes on one specific query in the "inherit" test:

2018-05-24 01:22:28.657 EDT [51790] LOG: server process (PID 52037) was terminated by signal 11: Segmentation fault
2018-05-24 01:22:28.657 EDT [51790] DETAIL: Failed process was running: select * from matest0 order by 1-id;

Still trying to get more info on exactly where it's going off the
rails --- gdb has got some problems with printing such deep stacks,
and for some reason "ulimit -s" doesn't work to make the available
stack space smaller.

regards, tom lane
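The LOG/DETAIL pairs quoted in this thread are what identify the crashing query on each platform. As a rough sketch only (the names `Crash` and `find_crashes` are hypothetical, and it assumes the server log uses the two message texts seen above), the pairs can be pulled out of a regression-test log like this. It captures only the first line of a multi-line statement, which is enough to locate the failing test.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Crash(NamedTuple):
    pid: int      # PID of the backend that died
    signal: int   # signal number reported by the postmaster
    query: str    # first line of the statement that was running


TERMINATED_RE = re.compile(
    r"server process \(PID (\d+)\) was terminated by signal (\d+)"
)
DETAIL_RE = re.compile(r"DETAIL:\s+Failed process was running:\s*(.*)")


def find_crashes(log_lines: Iterable[str]) -> List[Crash]:
    """Pair each 'terminated by signal' LOG line with the DETAIL line
    that follows it and return one Crash record per pair."""
    crashes: List[Crash] = []
    pending: Optional[Tuple[int, int]] = None
    for line in log_lines:
        terminated = TERMINATED_RE.search(line)
        if terminated:
            pending = (int(terminated.group(1)), int(terminated.group(2)))
            continue
        detail = DETAIL_RE.search(line)
        if detail and pending is not None:
            crashes.append(Crash(pending[0], pending[1], detail.group(1).strip()))
            pending = None
    return crashes
```

Run over the two lines from the FreeBSD arm64 log, this yields PID 52037, signal 11 and the `select * from matest0 order by 1-id;` statement; signal 6 records would correspond to the `std::bad_function_call` aborts from the first message.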