Re: [PATCH] Enable using llvm jitlink as an alternative llvm jit linker of old Rtdyld.

From: Alex Fan <alex(dot)fan(dot)q(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "geidav(dot)pg(at)gmail(dot)com" <geidav(dot)pg(at)gmail(dot)com>, "luc(at)swarm64(dot)com" <luc(at)swarm64(dot)com>
Subject: Re: [PATCH] Enable using llvm jitlink as an alternative llvm jit linker of old Rtdyld.
Date: 2023-01-05 23:19:55
Message-ID: CAP8f6fCT3ao0c_3UU3N=vUpXx-WB54eLbSz41wroxHqaqgm5Cw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> why should we do this only for riscv?
I originally considered riscv because I only have amd64 and riscv hardware
for testing. But afaik, JITLink currently supports arm64 and x86-64 (ELF &
MacOS), riscv (both 64bit and 32bit, ELF). i386 seems also supported. No
support for ppc or mips exist yet. I am most familiar with riscv and its
support has been quite stable. I also know Julia folks have been using
orcjit only for arm64 on MacOS for quite a while, but stayed in mcjit for
other platforms. clasp <https://github.com/clasp-developers/clasp> also
uses orcjit on x86 (ELF & macos) heavily. I'd say It should be safe to
switch on x86 and arm64 also.

> that is necessary for RISCV because they haven't got around to doing this
yet
I doubt if the runtimedylib patch will eventually be accepted since mcjit
with its runtimedylib is in maintenance mode. Users are generally suggested
to switch to orcjit.

I am not familiar with Windows either. I realise from your link,
RuntimeDyld does have windows COFF support. I have clearly misread
something from discord.
There is a recent talk <https://www.youtube.com/watch?v=_5_gm58sQIg> about
jitlink windows support that covers a lot of details. I think the situation
is that RuntimeDyld support for COFF is not stable and only limited to
large code model (RuntimeDyld's constraint), so people tend to use ELF
format on windows with RuntimeDyld. JITLINK's recent COFF support is more
stable and allows small code model.

The logic of abi and extensions is indeed confusing and llvm backend also
has some nuances. I will try to explain it to my best and am very happy to
clarify more based on my knowledge if there are more questions.

'm' 'i' 'a' 'f' 'd' are all extension features and usually are grouped
together as 'g' for general. Along with 'c' for compression (and two tiny
extension sets Zicsr and Zifencei splitted from 'i' since isa spec
20191213, llvm will still default enable them along with 'i', so not
relevant to us) is named rv64gc for 64 bit machine. rv64gc is generally the
recommended basic set for linux capable 64bit machines. Some embedded cores
without +f +d still work with Linux, but are rare and unlikely want
postgresql.
abi_name like 'lp64' 'lp64d' is considered independent from cpu extensions.
You can have a "+f +d" machines with lp64 abi, meaning the function body
can have +f +d instructions and registers, but the function signature
cannot. To set abi explicitly for llvm backend, we need to pass
MCOptions.ABIName or a module metadata target-abi.

If abi is missing, before llvm-15, the backend defaults to lp64 on 64bit
platform, ignoring any float extension enabled or not. The consensus seemed
to be backend should be explicitly configured. After llvm-15, specifically this
commit <https://reviews.llvm.org/D118333>, it chooses lp64d if +d is
present to align with clang default. I mostly test on llvm-14 because
JITLINK riscv support is complete already except some minor fixes, and
notice until now the newly change. But because I test on Gentoo, it has abi
lp64 build on riscv64gc, so if abi is not configured this way, I would end
up with lp64d enabled by +d extension from riscv64`g`c on a lp64 build.

> your patch seems to imply that LLVM is not able to get features reliably
> for RISCV -- why not, immaturity or technical reason why it can't?
Immaturity. Actually, unimplemented for riscv as you can check here
<https://github.com/llvm/llvm-project/blob/b5edd522d195447e3ae16f95c5821762edbf815a/llvm/lib/Support/Host.cpp#L1836>.
Because gethostcpuname usually returns generic or generic-rv64, feature
list for these is basically empty except 'i'. I may work out a patch for
llvm later.

> wouldn't the subtarget/ABI be the same as the one that the LLVM library
itself was compiled for
llvm is inherently a multitarget & cross platform compiler backend. It is
capable of all subtargets & features for enabled platform(s). The target
triple works like you said. There is LLVM_DEFAULT_TARGET_TRIPLE that sets
the default to native if no target is specified in runtime, so default
triple is reliable. But cpuname, abi, extensions don't follow this logic.
The llvm riscv backend devs expect these to be configured explicitly
(although there are some default and dependencies, and frontend like clang
also have default). Therefore gethostcpuname is needed and feature
extensions are derived from known cpuname. In case cpuname from
gethostcpuname is not enough, gethostcpufeatures is needed like your
example of AVX extension.

> why we don't need to deal with this kind of manual subtarget selection on
any other architecture
ppc sets default abi here
<https://github.com/llvm/llvm-project/blob/04a23cb2191f865c51fce087b9f3083ac17ae10e/llvm/lib/Target/PowerPC/PPCTargetMachine.cpp#L235>,
so there is no abi issue. Big end or little end is encoded in target triple
like ppc64 (big endian), ppc64le (little endian), and a recent riscv64be
patch <https://reviews.llvm.org/D128612>. I guess that is why there are no
endian issues.
------------------------------
*From:* Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
*Sent:* Thursday, December 15, 2022 9:59:39 AM
*To:* David Rowley <dgrowleyml(at)gmail(dot)com>
*Cc:* Alex Fan <alex(dot)fan(dot)q(at)gmail(dot)com>; pgsql-hackers(at)postgresql(dot)org <
pgsql-hackers(at)postgresql(dot)org>; andres(at)anarazel(dot)de <andres(at)anarazel(dot)de>;
geidav(dot)pg(at)gmail(dot)com <geidav(dot)pg(at)gmail(dot)com>; luc(at)swarm64(dot)com <luc(at)swarm64(dot)com>
*Subject:* Re: [PATCH] Enable using llvm jitlink as an alternative llvm jit
linker of old Rtdyld.

On Thu, Nov 24, 2022 at 12:08 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> On Wed, 23 Nov 2022 at 23:13, Alex Fan <alex(dot)fan(dot)q(at)gmail(dot)com> wrote:
> > I am new to the postgres community and apologise for resending this as
the previous one didn't include patch properly and didn't cc reviewers
(maybe the reason it has been buried in mailing list for months)
>
> Welcome to the community!

+1

I don't know enough about LLVM or RISCV to have any strong opinions
here, but I have a couple of questions... It looks like we have two
different things in this patch:

1. Optionally use JITLink instead of RuntimeDyld for relocation.
From what I can tell from some quick googling, that is necessary for
RISCV because they haven't got around to doing this yet:

https://reviews.llvm.org/D127842

Independently of that, it seems that
https://llvm.org/docs/JITLink.html is the future and RuntimeDyld will
eventually be obsolete, so one question I have is: why should we do
this only for riscv?

You mentioned that this change might be necessary to support COFF and
thus Windows. I'm not a Windows user and I think it would be beyond
my pain threshold to try to get this working there by using CI alone,
but I'm just curious... wouldn't
https://github.com/llvm/llvm-project/blob/main/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldCOFF.cpp
work for that already? (I haven't heard about anyone successfully
using PostgreSQL/LLVM on Windows; it would certainly be cool to hear
some more about what would be needed for that.)

2. Manually adjust the CPU features and ABI/subtarget.

+#if defined(__riscv)
+ /* getHostCPUName returns "generic-rv[32|64]", which lacks all
features */
+ Features.AddFeature("m", true);
+ Features.AddFeature("a", true);
+ Features.AddFeature("c", true);
+# if defined(__riscv_float_abi_single)
+ Features.AddFeature("f", true);
+# endif
+# if defined(__riscv_float_abi_double)
+ Features.AddFeature("d", true);
+# endif
+#endif

I'm trying to understand this, and the ABI name selection logic.
Maybe there are two categories of features here?

The ABI bits, "f" and "d" are not just "which instructions can I
used", but they also affect the ABI (I guess something like: where
floats go in the calling convention), and they have to match the ABI
of the main executable to allow linking to succeed, right? Probably a
stupid question: wouldn't the subtarget/ABI be the same as the one
that the LLVM library itself was compiled for (which must also match
the postgres executable), and doesn't it know that somewhere? I guess
I'm confused about why we don't need to deal with this kind of manual
subtarget selection on any other architecture: for PPC it
automatically knows whether to be big endian/little endian, 32 or 64
bit, etc.

Then for "m", "a", "c", I guess these are code generation options -- I
think "c" is compressed instructions for example? Can we get a
comment to say what they are? Why do you think that all RISCV chips
have these features? Perhaps these are features that are part of some
kind of server chip profile (ie features not present in a tiny
microcontroller chip found in a toaster, but expected in any system
that would actually run PostgreSQL) -- in which case can we get a
reference to explain that?

I remembered the specific reason why we have that
LLVMGethostCPUFeatures() call: it's because the list of default
features that would apply otherwise based on CPU "name" alone turned
out to assume that all x86 chips had AVX, but some low end parts
don't, so we have to check for AVX etc presence that way. But your
patch seems to imply that LLVM is not able to get features reliably
for RISCV -- why not, immaturity or technical reason why it can't?

+ assert(ES && "ES must not be null");

We use our own Assert() macro (capital A).

Attachment Content-Type Size
add_jitlink_and_riscv_jitting.patch text/x-patch 7.5 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-01-05 23:21:40 Re: Generate pg_stat_get_xact*() functions with Macros
Previous Message Tom Lane 2023-01-05 23:02:31 Re: Logical replication - schema change not invalidating the relation cache