Discussion: Clang optimiser vs preproc.c
Hi,

FYI, it looks like there is a big jump in CPU time to compile preproc.c
at -O2:

clang15: ~16s
clang16: ~211s
clang17: ~233s

First noticed on FreeBSD (where the system cc is now clang16), but
reproduced also on Debian (via apt.llvm.org packages).
Thomas Munro <thomas.munro@gmail.com> writes:
> FYI, it looks like there is a big jump in CPU time to compile
> preproc.c at -O2:
> clang15: ~16s
> clang16: ~211s
> clang17: ~233s

What are the numbers for gram.c?

			regards, tom lane
On Sat, Dec 16, 2023 at 3:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
>> FYI, it looks like there is a big jump in CPU time to compile
>> preproc.c at -O2:
>> clang15: ~16s
>> clang16: ~211s
>> clang17: ~233s
>
> What are the numbers for gram.c?

clang15: ~3.8s
clang16: ~3.2s
clang17: ~2.9s
Thomas Munro <thomas.munro@gmail.com> writes:
> On Sat, Dec 16, 2023 at 3:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> What are the numbers for gram.c?
> clang15: ~3.8s
> clang16: ~3.2s
> clang17: ~2.9s

Huh.  There's not that many more productions in the ecpg grammar
than the core, so it doesn't seem likely that this is purely a
size-of-file issue.  I'd bet on there being something that clang
doesn't do well about the (very stylized) C code being generated
within the grammar productions.

We actually noticed this or a closely-related problem before [1]
and briefly discussed the possibility of rearranging the generated
code to make it less indigestible to clang.  But there was no concrete
idea about what to do specifically, and the thread slid off the radar
screen.

			regards, tom lane

[1] https://www.postgresql.org/message-id/20220902174033.wgt6li7oiaxmwftn@awork3.anarazel.de
On Sat, Dec 16, 2023 at 4:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> We actually noticed this or a closely-related problem before [1]
> and briefly discussed the possibility of rearranging the generated
> code to make it less indigestible to clang.  But there was no concrete
> idea about what to do specifically, and the thread slid off the radar
> screen.

I've never paid attention to the output of -ftime-report before, but
this difference stands out pretty clearly with clang16:

  ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  201.2266 ( 99.6%)   0.0074 ( 99.3%)  201.2341 ( 99.6%)  207.1308 ( 99.6%)  SLPVectorizerPass

The equivalent line for clang15 is:

    3.0979 ( 64.8%)   0.0000 (  0.0%)   3.0979 ( 64.8%)   3.0996 ( 64.8%)  SLPVectorizerPass

The thing Andres showed in that other thread was like this (though in
my output it's grown a "#2"), which accounts for much of the time in 15
but "only" goes up by a couple of seconds in 16, so it's not our
primary problem:

    9.1890 ( 73.1%)   0.0396 ( 23.9%)   9.2286 ( 72.4%)   9.6586 ( 72.9%)  Greedy Register Allocator #2
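For reference, a sketch of how a per-pass breakdown like the one above can be produced. It again uses a trivial stand-in file and plain `cc` (both assumptions; clang and gcc both accept -ftime-report, with different report layouts):

```shell
# Sketch: ask the compiler for a per-pass timing report.  The report is
# written to stderr; on preproc.c with clang-16 the SLPVectorizerPass
# line dominates it, as quoted above.
cat > tiny.c <<'EOF'
int main(void) { return 0; }
EOF
cc -O2 -ftime-report -c -o tiny.o tiny.c 2> report.txt
wc -l report.txt
```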
Hi,

On 2023-12-15 22:19:56 -0500, Tom Lane wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
>> FYI, it looks like there is a big jump in CPU time to compile
>> preproc.c at -O2:
>> clang15: ~16s
>> clang16: ~211s
>> clang17: ~233s

Is this with non-assert clangs?  Because I see times that seem smaller
by more than what can be explained by hardware differences:

preproc.c:
17  10.270s
16   9.685s
15   8.300s

gram.c:
17  1.936s
16  2.131s
15  2.161s

That's still bad, but a far cry away from 233s.

> Huh.  There's not that many more productions in the ecpg grammar
> than the core, so it doesn't seem likely that this is purely a
> size-of-file issue.  I'd bet on there being something that clang
> doesn't do well about the (very stylized) C code being generated
> within the grammar productions.
>
> We actually noticed this or a closely-related problem before [1]
> and briefly discussed the possibility of rearranging the generated
> code to make it less indigestible to clang.  But there was no concrete
> idea about what to do specifically, and the thread slid off the radar
> screen.

One interesting aspect might be that preproc.c ends up with ~33% more
states than gram.c:

gram.c:    #define YYLAST  116587
preproc.c: #define YYLAST  155495

Greetings,

Andres Freund
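The "~33% more" figure follows directly from the two YYLAST values quoted above; as a quick arithmetic check:

```shell
# YYLAST from the generated parsers: 155495 (preproc.c) vs 116587 (gram.c).
awk 'BEGIN { printf "%.1f%% more\n", (155495 / 116587 - 1) * 100 }'
# -> 33.4% more
```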
On Sun, Dec 17, 2023 at 1:29 AM Andres Freund <andres@anarazel.de> wrote:
> Is this with non-assert clangs?  Because I see times that seem smaller
> by more than what can be explained by hardware differences:
> ...
> That's still bad, but a far cry away from 233s.

Hrmph.  Well, something weird is going on, but it might indeed involve
me being confused about debug options of the compiler itself.  How can
one find out which build options were used for the clang/llvm compiler
+ libraries?  My earlier reports were from a little machine at home, so
let's try again on an i9-9900 CPU @ 3.10GHz (a bit snappier) running
Debian 12, again using packages from apt.llvm.org:

17  ~198s
16   ~14s
15   ~11s

OK, so even if we ignore the wild outlier, it is getting significantly
slower.  But... huh, there goes the big jump, but at a different
version than I saw with FreeBSD's packages.  Here's what perf says it's
doing:

+   99.42%    20.12%  clang-17  libLLVM-17.so.1  [.] llvm::slpvectorizer::BoUpSLP::getTreeCost
+   96.91%     0.00%  clang-17  libLLVM-17.so.1  [.] llvm::SLPVectorizerPass::runImpl
+   96.91%     0.00%  clang-17  libLLVM-17.so.1  [.] llvm::SLPVectorizerPass::vectorizeChainsInBlock
+   96.91%     0.00%  clang-17  libLLVM-17.so.1  [.] llvm::SLPVectorizerPass::vectorizeSimpleInstructions
+   96.91%     0.00%  clang-17  libLLVM-17.so.1  [.] llvm::SLPVectorizerPass::vectorizeInsertElementInst
+   96.91%     0.00%  clang-17  libLLVM-17.so.1  [.] llvm::SLPVectorizerPass::tryToVectorizeList
+   73.79%     0.00%  clang-17  libLLVM-17.so.1  [.] 0x00007fbead445cb0
+   36.88%    36.88%  clang-17  libLLVM-17.so.1  [.] 0x0000000001e45cda
+    3.95%     3.95%  clang-17  libLLVM-17.so.1  [.] 0x0000000001e45d11
On Tue, Dec 19, 2023 at 11:42 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> Hrmph.  Well, something weird is going on, but it might indeed involve
> me being confused about debug options of the compiler itself.  How can
> one find out which build options were used for the clang/llvm compiler
> + libraries?  My earlier reports were from a little machine at home,
> so let's try again on an i9-9900 CPU @ 3.10GHz (a bit snappier)
> running Debian 12, again using packages from apt.llvm.org:
>
> 17  ~198s
> 16   ~14s
> 15   ~11s

And on another Debian machine (this time a VM), also using apt.llvm.org
packages, the huge ~3 minute time occurs with clang-16... hrrrnnnff...
seems like there must be some other variable here that I haven't
spotted yet...
Hello Thomas,

19.12.2023 07:20, Thomas Munro wrote:
> And on another Debian machine (this time a VM), also using
> apt.llvm.org packages, the huge ~3 minute time occurs with
> clang-16... hrrrnnnff... seems like there must be some other variable
> here that I haven't spotted yet...

Reproduced here with clang-16 and clang-17 (on Ubuntu 22.04 and on
Fedora 39).

Namely, on Ubuntu I tried clang+llvm-16.0.0-x86_64-linux-gnu-ubuntu-18.04
and clang+llvm-17.0.6-x86_64-linux-gnu-ubuntu-22.04 from
https://github.com/llvm/llvm-project/releases, as follows:

PATH=".../clang+llvm-17.0.6-x86_64-linux-gnu-ubuntu-22.04/bin:$PATH" CC=clang-17 CPPFLAGS="-O2" sh -c "./configure -q --enable-debug --enable-cassert && time make >make.log"

and then

PATH=".../clang+llvm-17.0.6-x86_64-linux-gnu-ubuntu-22.04/bin:$PATH" sh -c "cd src/interfaces/ecpg/preproc; time clang-17 -v -g -I../include -I../../../../src/interfaces/ecpg/include -I. -I../../../../src/interfaces/ecpg/ecpglib -I../../../../src/interfaces/libpq -I../../../../src/include -O2 -D_GNU_SOURCE -c -o preproc.o preproc.c"

144.59user 0.25system 2:24.94elapsed 99%CPU (0avgtext+0avgdata 217320maxresident)k

The same is observed with clang-16 (16.0.6 20231112100510+7cbf1a259152...)
installed from http://apt.llvm.org/jammy/.

(Adding the parameters -fno-slp-vectorize, or -mllvm -slp-threshold=100000,
or -mllvm -slp-max-vf=3, decreases/normalizes the compilation time.)
On a fresh vagrant image "fedora/39-cloud-base", I tried versions
16.0.0~rc4, 17.0.0~rc3, and 17.0.6, downloaded from
https://koji.fedoraproject.org/koji/packageinfo?packageID=21848
https://koji.fedoraproject.org/koji/packageinfo?packageID=5646

All of them also give 2+ minutes for me, but I see no slowdown with
version 15.0.7 on the same VM.  Also, I see no issue with clang-18,
installed on Ubuntu from apt.llvm.org.  So, as far as I can see, this
anomaly started with clang-16 and ended with clang-18.

Comparing the histories of SLPVectorizer.cpp:
https://github.com/llvm/llvm-project/commits/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
https://github.com/llvm/llvm-project/commits/release/17.x/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
I see a commit that probably could fix the issue in the master branch
(for clang-18):
[SLP][NFC]Improve compile time by storing all nodes for the given ...

Though I still can't explain how you get ~14s with clang-16.  Could you
show the exact sequence of commands you use to measure the duration?

Best regards,
Alexander