Re: Lazy JIT IR code generation to increase JIT speed with partitions

From: David Geier <geidav(dot)pg(at)gmail(dot)com>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Luc Vlaming <luc(at)swarm64(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lazy JIT IR code generation to increase JIT speed with partitions
Date: 2023-01-27 09:02:32
Message-ID: 201ee9fe-25f4-acd8-bc0f-1984df3c2d07@gmail.com
Lists: pgsql-hackers

Hi,

Thanks for taking a look!

On 12/1/22 22:12, Dmitry Dolgov wrote:
>
> First to summarize things a bit: from what I understand there are two
> suggestions on the table, one is about caching modules when doing
> inlining, the second is about actual lazy jitting. Are those to tightly
> coupled together, could they be split into two separate patches? It
> would make it a bit easier to review and test.
Yes.
>
> I was playing with the caching part of the patch (still have to think
> about the lazy jitting), which generally seems to be a good idea. From
> what I see the main concern here is a chance that IRMover will rename
> types, degrading performance of the generated code in long run. I have
> to admit, I'm not fully sure the mentioned LLVM 13 fix [1] completely
> eliminates those concerns; its description is formulated in a not
> very committal way ("don't know if this patch fixes ..., but it does
> fix a few soundness issues that have crept in."). But I haven't found
> any crashes or false asserts coming from the LLVM side (using v13),
> running several rounds of regression tests with forced jitting, so a
> point to the fix.
>
> I would be curious to learn how Andres was troubleshooting type renaming
> issues? Using LLVM 13 from packages, jitting the same query twice and
> dumping the bitcode out showed some difference in types a-la
> "%struct.ExprState.142" vs "%struct.ExprState.430" (from what I
> understood, the whole thing is an identifier, including the number) with
> the patch, but the very same changes are happening on the main branch as
> well. Of course, I was inspecting bitcode only for certain queries, it
> doesn't exclude that some other examples actually feature type renaming.
>
> In general, it would be interesting to know how to do some sort of "smoke
> tests" for the generated code, e.g. in case if LLVM has fixed this
> particular issue, but they might reappear in the future?
+1, especially with my findings described below.
> I did few performance tests and got numbers similar to posted in the
> thread, inlining time being impressively reduced (~10 times) as well as
> (suspiciously) optimization time (~1.5 times). The approach was a bit
> different though, I've run the sequence of example queries from the
> thread using pgbench and checked jit counters from pgss.
>
> Few small notes:
>
> If caching of modules is safe from LLVM >= 13, I guess it should be
> wrapped into a corresponding condition on LLVM_VERSION_MAJOR, right?
>
> Why the assert about hasExternalLinkage was removed from the
> llvmjit_inline.cpp?
>
> For so much discussion about such a small change there is definitely not
> enough commentaries in the code about dangers of cloning modules.
>
> Also, maybe a stupid question, but how big this cache should be? From
> what I understand it will be cleared on llvm_shutdown, does it mean it
> can grow unbounded if e.g. a single session is firing all the time
> different queries at the db?
The cache doesn't contain the modules generated for a query but the
bitcode files with the to-be-inlined functions stored on disk. So the
cache stops growing as soon as a connection process has loaded all of
them. Nevertheless, if a workload uses many connections and touches
truly all, or at least most, of the bitcode files, that could add up to
quite some memory.

Beyond that I made an unfortunate discovery: the inlining time goes down
by so much because no inlining is happening anymore :-/. This is because
I cloned the module only when passing it to the IRMover, not right after
taking it from the cache. However, after the module is taken from the
cache it's still modified. llvm_execute_inline_plan() executes

    if (valueToImport->hasExternalLinkage())
    {
        valueToImport->setLinkage(LinkageTypes::AvailableExternallyLinkage);
    }

But when checking whether a function is inlinable in
function_inlinable(), we check

    if (F.hasAvailableExternallyLinkage())
        return false;

which makes the code bail out for every function that is to be inlined.

It's curious that we didn't notice this when dumping the bitcode. It
seems the bitcode is always different enough to hide it. So +1 on your
point about having a smoke test in place to verify things still work.

If I change the code to clone the module right after taking it from the
cache and before making any changes to it, the JIT time remains high and
even seems higher than when re-reading the file from disk. Profiling
showed that llvm::CloneModule() is just super slow, especially for big
bitcode files like numeric.bc. I haven't investigated why that is or
whether we can do something about it, and I don't plan to for the time
being.

For reference, I attached the patch which only contains the caching
code. It's the variant that clones early.

--
David Geier
(ServiceNow)

Attachment Content-Type Size
0001-Cache-modules-in-JIT-inlining.patch text/x-patch 2.0 KB
