Re: Lazy JIT IR code generation to increase JIT speed with partitions

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: David Geier <geidav(dot)pg(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Luc Vlaming <luc(at)swarm64(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lazy JIT IR code generation to increase JIT speed with partitions
Date: 2022-12-01 21:12:25
Message-ID: 20221201211225.pr75fqvpt5lf3l3r@ddolgov.remote.csb
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Thu, Jul 14, 2022 at 02:45:29PM +0200, David Geier wrote:
> On Mon, Jul 4, 2022 at 10:32 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2022-06-27 16:55:55 +0200, David Geier wrote:
> > > Indeed, the total JIT time increases the more modules are used. The
> > reason
> > > for this to happen is that the inlining pass loads and deserializes all
> > to
> > > be inlined modules (.bc files) from disk prior to inlining them via
> > > llvm::IRMover. There's already a cache for such modules in the code, but
> > it
> > > is currently unused. This is because llvm::IRMover takes the module to be
> > > inlined as std::unique_ptr<llvm::Module>. The by-value argument requires
> > > the source module to be moved, which means it cannot be reused
> > afterwards.
> > > The code is accounting for that by erasing the module from the cache
> > after
> > > inlining it, which in turns requires reloading the module next time a
> > > reference to it is encountered.
> > >
> > > Instead of each time loading and deserializing all to be inlined modules
> > > from disk, they can reside in the cache and instead be cloned via
> > > llvm::CloneModule() before they get inlined. Key to calling
> > > llvm::CloneModule() is fully deserializing the module upfront, instead of
> > > loading the module lazily. That is why I changed the call from
> > > LLVMGetBitcodeModuleInContext2() (which lazily loads the module via
> > > llvm::getOwningLazyBitcodeModule()) to LLVMParseBitCodeInContext2()
> > (which
> > > fully loads the module via llvm::parseBitcodeFile()). Beyond that it
> > seems
> > > like that prior to LLVM 13, cloning modules could fail with an assertion
> > > (not sure though if that would cause problems in a release build without
> > > assertions). Andres reported this problem back in the days here [1]. In
> > the
> > > meanwhile the issue got discussed in [2] and finally fixed for LLVM 13,
> > see
> > > [3].
> >
> > Unfortunately that doesn't work right now - that's where I had started. The
> > problem is that IRMover renames types. Which, in the case of cloned modules
> > unfortunately means that types used cloned modules are also renamed in the
> > "origin" module. Which then causes problems down the line, because parts of
> > the LLVM code match types by type names.
> >
> > That can then have the effect of drastically decreasing code generation
> > quality over time, because e.g. inlining never manages to find signatures
> > compatible.

Hi,

Thanks for the patch, looks quite interesting!

First to summarize things a bit: from what I understand there are two
suggestions on the table, one is about caching modules when doing
inlining, the second is about actual lazy jitting. Are those to tightly
coupled together, could they be split into two separate patches? It
would make it a bit easier to review and test.

I was playing with the caching part of the patch (still have to think
about the lazy jitting), which generally seems to be a good idea. From
what I see the main concern here is a chance that IRMover will rename
types, degrading performance of the generated code in long run. I have
to admit, I'm not fully sure mentioned LLVM 13 fix [1] completely
eliminates those concerns, somehow its description is formulated in not
very committing way ("don't know if this patch fixes ..., but it does
fix a few soundness issues that have crept in."). But I haven't found
any crashes or false asserts coming from the LLVM side (using v13),
running several rounds of regression tests with forced jitting, so a
point to the fix.

I would be curious to learn how Andres was troubleshooting type renaming
issues? Using LLVM 13 from packages, jitting the same query twice and
dumping the bitcode out showed some difference in types a-la
"%struct.ExprState.142" vs "%struct.ExprState.430" (from what I
understood, the whole thing is an identifier, including the number) with
the patch, but the very same changes are happening on the main branch as
well. Of course, I was inspecting bitcode only for certain queries, it
doesn't exclude that some other examples actually feature type renaming.

In general, it would be interesting to know how to do some sort of "smoke
tests" for the generated code, e.g. in case if LLVM has fixed this
particular issue, but they might reappear in the future?

I did few performance tests and got numbers similar to posted in the
thread, inlining time being impressively reduced (~10 times) as well as
(suspiciously) optimization time (~1.5 times). The approach was a bit
different though, I've run the sequence of example queries from the
thread using pgbench and checked jit counters from pgss.

Few small notes:

If caching of modules is safe from LLVM >= 13, I guess it should be
wrapped into a corresponding condition on LLVM_VERSION_MAJOR, right?

Why the assert about hasExternalLinkage was removed from the
llvmjit_inline.cpp?

For so much discussion about such a small change there is definitely not
enough commentaries in the code about dangers of cloning modules.

Also, maybe a stupid question, but how big this cache should be? From
what I understand it will be cleared on llvm_shutdown, does it mean it
can grow unbounded if e.g. a single session is firing all the time
different queries at the db?

[1]: https://reviews.llvm.org/rG22a52dfddcefad4f275eb8ad1cc0e200074c2d8a

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-12-01 21:15:17 Re: Error-safe user functions
Previous Message Tom Lane 2022-12-01 20:49:42 Re: Error-safe user functions