Обсуждение: Clang optimiser vs preproc.c

Поиск
Список
Период
Сортировка

Clang optimiser vs preproc.c

От
Thomas Munro
Дата:
Hi,

FYI, it looks like there is a big jump in CPU time to compile preproc.c at -O2:

clang15: ~16s
clang16: ~211s
clang17: ~233s

First noticed on FreeBSD (where the system cc is now clang16), but
reproduced also on Debian (via apt.llvm.org packages).



Re: Clang optimiser vs preproc.c

От
Tom Lane
Дата:
Thomas Munro <thomas.munro@gmail.com> writes:
> FYI, it looks like there is a big jump in CPU time to compile preproc.c at -O2:

> clang15: ~16s
> clang16: ~211s
> clang17: ~233s

What are the numbers for gram.c?

            regards, tom lane



Re: Clang optimiser vs preproc.c

От
Thomas Munro
Дата:
On Sat, Dec 16, 2023 at 3:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > FYI, it looks like there is a big jump in CPU time to compile preproc.c at -O2:
>
> > clang15: ~16s
> > clang16: ~211s
> > clang17: ~233s
>
> What are the numbers for gram.c?

clang15: ~3.8s
clang16: ~3.2s
clang17: ~2.9s



Re: Clang optimiser vs preproc.c

От
Tom Lane
Дата:
Thomas Munro <thomas.munro@gmail.com> writes:
> On Sat, Dec 16, 2023 at 3:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Thomas Munro <thomas.munro@gmail.com> writes:
>>> FYI, it looks like there is a big jump in CPU time to compile preproc.c at -O2:
>>> clang15: ~16s
>>> clang16: ~211s
>>> clang17: ~233s

>> What are the numbers for gram.c?

> clang15: ~3.8s
> clang16: ~3.2s
> clang17: ~2.9s

Huh.  There's not that many more productions in the ecpg grammar
than the core, so it doesn't seem likely that this is purely a
size-of-file issue.  I'd bet on there being something that clang
doesn't do well about the (very stylized) C code being generated
within the grammar productions.

We actually noticed this or a closely-related problem before [1]
and briefly discussed the possibility of rearranging the generated
code to make it less indigestible to clang.  But there was no concrete
idea about what to do specifically, and the thread slid off the radar
screen.

            regards, tom lane

[1] https://www.postgresql.org/message-id/20220902174033.wgt6li7oiaxmwftn@awork3.anarazel.de



Re: Clang optimiser vs preproc.c

От
Thomas Munro
Дата:
On Sat, Dec 16, 2023 at 4:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> We actually noticed this or a closely-related problem before [1]
> and briefly discussed the possibility of rearranging the generated
> code to make it less indigestible to clang.  But there was no concrete
> idea about what to do specifically, and the thread slid off the radar
> screen.

I've never paid attention to the output of -ftime-report before but
this difference stands out pretty clearly with clang16:

   ---User Time---   --System Time--   --User+System--   ---Wall
Time---  --- Name ---
  201.2266 ( 99.6%)   0.0074 ( 99.3%)  201.2341 ( 99.6%)  207.1308 (
99.6%)  SLPVectorizerPass

The equivalent line for clang15 is:
   3.0979 ( 64.8%)   0.0000 (  0.0%)   3.0979 ( 64.8%)   3.0996 (
64.8%)  SLPVectorizerPass

The thing Andres showed in that other thread was like this (though in
my output it's grown "#2") which is much of the time in 15, but "only"
goes up by a couple of seconds in 16, so it's not our primary problem:

   9.1890 ( 73.1%)   0.0396 ( 23.9%)   9.2286 ( 72.4%)   9.6586 (
72.9%)  Greedy Register Allocator #2



Re: Clang optimiser vs preproc.c

От
Andres Freund
Дата:
Hi,

On 2023-12-15 22:19:56 -0500, Tom Lane wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > On Sat, Dec 16, 2023 at 3:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Thomas Munro <thomas.munro@gmail.com> writes:
> >>> FYI, it looks like there is a big jump in CPU time to compile preproc.c at -O2:
> >>> clang15: ~16s
> >>> clang16: ~211s
> >>> clang17: ~233s

Is this with non-assert clangs? Because I see times that seem smaller by more
than what can be explained by hardware differences:

preproc.c:
17   10.270s
16    9.685s
15    8.300s

gram.c:
17    1.936s
16    2.131s
15    2.161s

That's still bad, but a far cry away from 233s.


> Huh.  There's not that many more productions in the ecpg grammar
> than the core, so it doesn't seem likely that this is purely a
> size-of-file issue.  I'd bet on there being something that clang
> doesn't do well about the (very stylized) C code being generated
> within the grammar productions.
> 
> We actually noticed this or a closely-related problem before [1]
> and briefly discussed the possibility of rearranging the generated
> code to make it less indigestible to clang.  But there was no concrete
> idea about what to do specifically, and the thread slid off the radar
> screen.

One interest aspect might be that preproc.c ends up with ~33% more states than
gram.c

gram.c:
#define YYLAST   116587

preproc.c:
#define YYLAST   155495


Greetings,

Andres Freund



Re: Clang optimiser vs preproc.c

От
Thomas Munro
Дата:
On Sun, Dec 17, 2023 at 1:29 AM Andres Freund <andres@anarazel.de> wrote:
> On 2023-12-15 22:19:56 -0500, Tom Lane wrote:
> > Thomas Munro <thomas.munro@gmail.com> writes:
> > > On Sat, Dec 16, 2023 at 3:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >> Thomas Munro <thomas.munro@gmail.com> writes:
> > >>> FYI, it looks like there is a big jump in CPU time to compile preproc.c at -O2:
> > >>> clang15: ~16s
> > >>> clang16: ~211s
> > >>> clang17: ~233s
>
> Is this with non-assert clangs? Because I see times that seem smaller by more
> than what can be explained by hardware differences:
>
> preproc.c:
> 17   10.270s
> 16    9.685s
> 15    8.300s
>
> gram.c:
> 17    1.936s
> 16    2.131s
> 15    2.161s
>
> That's still bad, but a far cry away from 233s.

Hrmph.  Well something weird is going on, but it might indeed involve
me being confused about debug options of the compiler itself.  How can
one find out which build options were used for clang/llvm compiler +
libraries?  My earlier reports were from a little machine at home, so
let's try again on an i9-9900 CPU @ 3.10GHz (a bit snappier) running
Debian 12, again using packages from apt.llvm.org:

17 ~198s
16 ~14s
15 ~11s

OK so even if we ignore the wild outlier it is getting significantly
slower.  But... huh, there goes the big jump, but at a different
version than I saw with FBSD's packages.   Here's what perf says it's
doing:

+   99.42%    20.12%  clang-17  libLLVM-17.so.1      [.]
llvm::slpvectorizer::BoUpSLP::getTreeCost
                                                           ◆
+   96.91%     0.00%  clang-17  libLLVM-17.so.1      [.]
llvm::SLPVectorizerPass::runImpl
                                                           ▒
+   96.91%     0.00%  clang-17  libLLVM-17.so.1      [.]
llvm::SLPVectorizerPass::vectorizeChainsInBlock
                                                           ▒
+   96.91%     0.00%  clang-17  libLLVM-17.so.1      [.]
llvm::SLPVectorizerPass::vectorizeSimpleInstructions
                                                           ▒
+   96.91%     0.00%  clang-17  libLLVM-17.so.1      [.]
llvm::SLPVectorizerPass::vectorizeInsertElementInst
                                                           ▒
+   96.91%     0.00%  clang-17  libLLVM-17.so.1      [.]
llvm::SLPVectorizerPass::tryToVectorizeList
                                                           ▒
+   73.79%     0.00%  clang-17  libLLVM-17.so.1      [.]
0x00007fbead445cb0
                                                           ▒
+   36.88%    36.88%  clang-17  libLLVM-17.so.1      [.]
0x0000000001e45cda
                                                           ▒
+    3.95%     3.95%  clang-17  libLLVM-17.so.1      [.] 0x0000000001e45d11



Re: Clang optimiser vs preproc.c

От
Thomas Munro
Дата:
On Tue, Dec 19, 2023 at 11:42 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> Hrmph.  Well something weird is going on, but it might indeed involve
> me being confused about debug options of the compiler itself.  How can
> one find out which build options were used for clang/llvm compiler +
> libraries?  My earlier reports were from a little machine at home, so
> let's try again on an i9-9900 CPU @ 3.10GHz (a bit snappier) running
> Debian 12, again using packages from apt.llvm.org:
>
> 17 ~198s
> 16 ~14s
> 15 ~11s

And on another Debian machine (this time a VM) also using apt.llvm.org
packages, the huge ~3 minute time occurs with clang-16... hrrrnnnff...
seems like there must be some other variable here that I haven't
spotted yet...



Re: Clang optimiser vs preproc.c

От
Alexander Lakhin
Дата:
Hello Thomas,

19.12.2023 07:20, Thomas Munro wrote:
> On Tue, Dec 19, 2023 at 11:42 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>> Hrmph.  Well something weird is going on, but it might indeed involve
>> me being confused about debug options of the compiler itself.  How can
>> one find out which build options were used for clang/llvm compiler +
>> libraries?  My earlier reports were from a little machine at home, so
>> let's try again on an i9-9900 CPU @ 3.10GHz (a bit snappier) running
>> Debian 12, again using packages from apt.llvm.org:
>>
>> 17 ~198s
>> 16 ~14s
>> 15 ~11s
> And on another Debian machine (this time a VM) also using apt.llvm.org
> packages, the huge ~3 minute time occurs with clang-16... hrrrnnnff...
> seems like there must be some other variable here that I haven't
> spotted yet...

Reproduced here, with clang-16 and clang-17 (on Ubuntu 22.04, on Fedora 39).
Namely, I tried on Ubuntu
clang+llvm-16.0.0-x86_64-linux-gnu-ubuntu-18.04 and
clang+llvm-17.0.6-x86_64-linux-gnu-ubuntu-22.04
from https://github.com/llvm/llvm-project/releases, as follows:
PATH=".../clang+llvm-17.0.6-x86_64-linux-gnu-ubuntu-22.04/bin:$PATH" CC=clang-17 CPPFLAGS="-O2" sh -c "./configure -q 
--enable-debug --enable-cassert && time make >make.log"
and then
PATH=".../clang+llvm-17.0.6-x86_64-linux-gnu-ubuntu-22.04/bin:$PATH" sh -c "cd src/interfaces/ecpg/preproc; time 
clang-17 -v -g -I../include -I../../../../src/interfaces/ecpg/include -I. -I../../../../src/interfaces/ecpg/ecpglib 
-I../../../../src/interfaces/libpq -I../../../../src/include -O2 -D_GNU_SOURCE -c -o preproc.o preproc.c"
144.59user 0.25system 2:24.94elapsed 99%CPU (0avgtext+0avgdata 217320maxresident)k

The same is observed with clang-16 (16.0.6 20231112100510+7cbf1a259152...)
installed from http://apt.llvm.org/jammy/.
(Adding parameters -fno-slp-vectorize or -mllvm -slp-threshold=100000 or
-mllvm -slp-max-vf=3 decreases/normalizes compilation time.)

On a fresh vagrant image "fedora/39-cloud-base", I tried versions
16.0.0~rc4, 17.0.0~rc3, 17.0.6 downloaded from
https://koji.fedoraproject.org/koji/packageinfo?packageID=21848
https://koji.fedoraproject.org/koji/packageinfo?packageID=5646
All of them also give 2+ minutes for me.
But I see no slowdown with version 15.0.7 on the same VM.

Also, I see no issue with clang-18, installed on Ubuntu from apt.llvm.org.
So, as far as I can see, this anomaly started from clang-16, and ended with
clang-18.
Comparing histories of SLPVectorizer.cpp:
https://github.com/llvm/llvm-project/commits/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
https://github.com/llvm/llvm-project/commits/release/17.x/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
I see a commit that probably could fix the issue in the master branch (for
clang-18):
[SLP][NFC]Improve compile time by storing all nodes for the given ...

Though I still can't explain how you get ~14s with clang-16. Could you show
the exact sequence of commands you use to measure the duration?

Best regards,
Alexander