Re: [PATCH] Enable using llvm jitlink as an alternative llvm jit linker of old Rtdyld.

Поиск
Список
Период
Сортировка
От Alex Fan
Тема Re: [PATCH] Enable using llvm jitlink as an alternative llvm jit linker of old Rtdyld.
Дата
Msg-id CAP8f6fCT3ao0c_3UU3N=vUpXx-WB54eLbSz41wroxHqaqgm5Cw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCH] Enable using llvm jitlink as an alternative llvm jit linker of old Rtdyld.  (Thomas Munro <thomas.munro@gmail.com>)
Список pgsql-hackers
> why should we do this only for riscv?
I originally considered riscv because I only have amd64 and riscv hardware for testing. But afaik, JITLink currently supports arm64 and x86-64 (ELF & MacOS), riscv (both 64bit and 32bit, ELF). i386 seems also supported. No support for ppc or mips exist yet. I am most familiar with riscv and its support has been quite stable. I also know Julia folks have been using orcjit only for arm64 on MacOS for quite a while, but stayed in mcjit for other platforms. clasp also uses orcjit on x86 (ELF & macos) heavily. I'd say It should be safe to switch on x86 and arm64 also.

> that is necessary for RISCV because they haven't got around to doing this yet
I doubt if the runtimedylib patch will eventually be accepted since mcjit with its runtimedylib is in maintenance mode. Users are generally suggested to switch to orcjit.

I am not familiar with Windows either. I realise from your link, RuntimeDyld does have windows COFF support. I have clearly misread something from discord.
There is a recent talk about jitlink windows support that covers a lot of details. I think the situation is that RuntimeDyld support for COFF is not stable and only limited to large code model (RuntimeDyld's constraint), so people tend to use ELF format on windows with RuntimeDyld. JITLINK's recent COFF support is more stable and allows small code model.

The logic of abi and extensions is indeed confusing and llvm backend also has some nuances. I will try to explain it to my best and am very happy to clarify more based on my knowledge if there are more questions.

 'm' 'i' 'a' 'f' 'd' are all extension features and usually are grouped together as 'g' for general. Along with 'c' for compression (and two tiny extension sets Zicsr and Zifencei splitted from 'i' since isa spec 20191213, llvm will still default enable them along with 'i', so not relevant to us) is named rv64gc for 64 bit machine. rv64gc is generally the recommended basic set for linux capable 64bit machines. Some embedded cores without +f +d still work with Linux, but are rare and unlikely want postgresql.
abi_name like 'lp64' 'lp64d' is considered independent from cpu extensions. You can have a "+f +d" machines with lp64 abi, meaning the function body can have +f +d instructions and registers, but the function signature cannot. To set abi explicitly for llvm backend, we need to pass MCOptions.ABIName or a module metadata target-abi.

If abi is missing, before llvm-15, the backend defaults to lp64 on 64bit platform, ignoring any float extension enabled or not. The consensus seemed to be backend should be explicitly configured. After llvm-15, specifically this commit, it chooses lp64d if +d is present to align with clang default. I mostly test on llvm-14 because JITLINK riscv support is complete already except some minor fixes, and notice until now the newly change. But because I test on Gentoo, it has abi lp64 build on riscv64gc, so if abi is not configured this way, I would end up with lp64d enabled by +d extension from riscv64`g`c on a lp64 build.

> your patch seems to imply that LLVM is not able to get features reliably
> for RISCV -- why not, immaturity or technical reason why it can't?
Immaturity. Actually, unimplemented for riscv as you can check here. Because gethostcpuname usually returns generic or generic-rv64, feature list for these is basically empty except 'i'. I may work out a patch for llvm later.

> wouldn't the subtarget/ABI be the same as the one that the LLVM library itself was compiled for
llvm is inherently a multitarget & cross platform compiler backend. It is capable of all subtargets & features for enabled platform(s). The target triple works like you said. There is LLVM_DEFAULT_TARGET_TRIPLE that sets the default to native if no target is specified in runtime, so default triple is reliable. But cpuname, abi, extensions don't follow this logic. The llvm riscv backend devs expect these to be configured explicitly (although there are some default and dependencies, and frontend like clang also have default). Therefore gethostcpuname is needed and feature extensions are derived from known cpuname. In case cpuname from gethostcpuname is not enough, gethostcpufeatures is needed like your example of AVX extension.

> why we don't need to deal with this kind of manual subtarget selection on any other architecture
ppc sets default abi here, so there is no abi issue. Big end or little end is encoded in target triple like ppc64 (big endian), ppc64le (little endian), and a recent riscv64be patch. I guess that is why there are no endian issues.

From: Thomas Munro <thomas.munro@gmail.com>
Sent: Thursday, December 15, 2022 9:59:39 AM
To: David Rowley <dgrowleyml@gmail.com>
Cc: Alex Fan <alex.fan.q@gmail.com>; pgsql-hackers@postgresql.org <pgsql-hackers@postgresql.org>; andres@anarazel.de <andres@anarazel.de>; geidav.pg@gmail.com <geidav.pg@gmail.com>; luc@swarm64.com <luc@swarm64.com>
Subject: Re: [PATCH] Enable using llvm jitlink as an alternative llvm jit linker of old Rtdyld.
 
On Thu, Nov 24, 2022 at 12:08 AM David Rowley <dgrowleyml@gmail.com> wrote:
> On Wed, 23 Nov 2022 at 23:13, Alex Fan <alex.fan.q@gmail.com> wrote:
> > I am new to the postgres community and apologise for resending this as the previous one didn't include patch properly and didn't cc reviewers (maybe the reason it has been buried in mailing list for months)
>
> Welcome to the community!

+1

I don't know enough about LLVM or RISCV to have any strong opinions
here, but I have a couple of questions...  It looks like we have two
different things in this patch:

1.  Optionally use JITLink instead of RuntimeDyld for relocation.
From what I can tell from some quick googling, that is necessary for
RISCV because they haven't got around to doing this yet:

https://reviews.llvm.org/D127842

Independently of that, it seems that
https://llvm.org/docs/JITLink.html is the future and RuntimeDyld will
eventually be obsolete, so one question I have is: why should we do
this only for riscv?

You mentioned that this change might be necessary to support COFF and
thus Windows.  I'm not a Windows user and I think it would be beyond
my pain threshold to try to get this working there by using CI alone,
but I'm just curious... wouldn't
https://github.com/llvm/llvm-project/blob/main/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldCOFF.cpp
work for that already?  (I haven't heard about anyone successfully
using PostgreSQL/LLVM on Windows; it would certainly be cool to hear
some more about what would be needed for that.)

2.  Manually adjust the CPU features and ABI/subtarget.

+#if defined(__riscv)
+    /* getHostCPUName returns "generic-rv[32|64]", which lacks all features */
+    Features.AddFeature("m", true);
+    Features.AddFeature("a", true);
+    Features.AddFeature("c", true);
+# if defined(__riscv_float_abi_single)
+    Features.AddFeature("f", true);
+# endif
+# if defined(__riscv_float_abi_double)
+    Features.AddFeature("d", true);
+# endif
+#endif

I'm trying to understand this, and the ABI name selection logic.
Maybe there are two categories of features here?

The ABI bits, "f" and "d" are not just "which instructions can I
used", but they also affect the ABI (I guess something like: where
floats go in the calling convention), and they have to match the ABI
of the main executable to allow linking to succeed, right?  Probably a
stupid question: wouldn't the subtarget/ABI be the same as the one
that the LLVM library itself was compiled for (which must also match
the postgres executable), and doesn't it know that somewhere?  I guess
I'm confused about why we don't need to deal with this kind of manual
subtarget selection on any other architecture: for PPC it
automatically knows whether to be big endian/little endian, 32 or 64
bit, etc.

Then for "m", "a", "c", I guess these are code generation options -- I
think "c" is compressed instructions for example?  Can we get a
comment to say what they are?  Why do you think that all RISCV chips
have these features?  Perhaps these are features that are part of some
kind of server chip profile (ie features not present in a tiny
microcontroller chip found in a toaster, but expected in any system
that would actually run PostgreSQL) -- in which case can we get a
reference to explain that?

I remembered the specific reason why we have that
LLVMGethostCPUFeatures() call: it's because the list of default
features that would apply otherwise based on CPU "name" alone turned
out to assume that all x86 chips had AVX, but some low end parts
don't, so we have to check for AVX etc presence that way.  But your
patch seems to imply that LLVM is not able to get features reliably
for RISCV -- why not, immaturity or technical reason why it can't?

+    assert(ES && "ES must not be null");

We use our own Assert() macro (capital A).
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Logical replication - schema change not invalidating the relation cache
Следующее
От: Andres Freund
Дата:
Сообщение: Re: Generate pg_stat_get_xact*() functions with Macros