Re: Lazy JIT IR code generation to increase JIT speed with partitions

From: David Geier <geidav(dot)pg(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Luc Vlaming <luc(at)swarm64(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lazy JIT IR code generation to increase JIT speed with partitions
Date: 2022-07-14 12:45:29
Message-ID: CAPsAnrkpCKNeU17T6Ez98m2or8+nVS3ZmC+HxsbCKS0pVycVAg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jul 4, 2022 at 10:32 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2022-06-27 16:55:55 +0200, David Geier wrote:
> > Indeed, the total JIT time increases the more modules are used. The
> > reason this happens is that the inlining pass loads and deserializes
> > all to-be-inlined modules (.bc files) from disk prior to inlining them
> > via llvm::IRMover. There's already a cache for such modules in the
> > code, but it is currently unused. This is because llvm::IRMover takes
> > the module to be inlined as std::unique_ptr<llvm::Module>. The
> > by-value argument requires the source module to be moved, which means
> > it cannot be reused afterwards. The code accounts for that by erasing
> > the module from the cache after inlining it, which in turn requires
> > reloading the module the next time a reference to it is encountered.
> >
> > Instead of loading and deserializing all to-be-inlined modules from
> > disk each time, they can reside in the cache and instead be cloned via
> > llvm::CloneModule() before they get inlined. Key to calling
> > llvm::CloneModule() is fully deserializing the module upfront, instead
> > of loading it lazily. That is why I changed the call from
> > LLVMGetBitcodeModuleInContext2() (which lazily loads the module via
> > llvm::getOwningLazyBitcodeModule()) to LLVMParseBitcodeInContext2()
> > (which fully loads the module via llvm::parseBitcodeFile()). Beyond
> > that, it seems that prior to LLVM 13, cloning modules could fail with
> > an assertion (not sure though if that would cause problems in a
> > release build without assertions). Andres reported this problem back
> > in the day here [1]. In the meantime, the issue was discussed in [2]
> > and finally fixed for LLVM 13, see [3].
>
> Unfortunately that doesn't work right now - that's where I had started.
> The problem is that IRMover renames types. Which, in the case of cloned
> modules, unfortunately means that types used in cloned modules are also
> renamed in the "origin" module. Which then causes problems down the
> line, because parts of the LLVM code match types by type names.
>
> That can then have the effect of drastically decreasing code generation
> quality over time, because e.g. inlining never manages to find
> compatible signatures.
>
Looking at the LLVM patches, these issues seem to be gone. Dumping the
bitcode and running all tests with JIT force-enabled suggests the same.
See below.
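
For reference, the caching approach quoted above boils down to something
like the sketch below. The names (bc_module_cache, get_cached_module,
inline_from_cache) and the simplified error handling are illustrative only
and not the actual llvmjit_inline.cpp code; the key points are parsing the
bitcode fully so that llvm::CloneModule() works, and handing only a clone
to the IRMover so the cached module survives and can be reused:

#include <llvm/Bitcode/BitcodeReader.h>
#include <llvm/IR/LLVMContext.h>
#include <llvm/IR/Module.h>
#include <llvm/Linker/IRMover.h>
#include <llvm/Support/ErrorHandling.h>
#include <llvm/Support/MemoryBuffer.h>
#include <llvm/Transforms/Utils/Cloning.h>

#include <map>
#include <memory>
#include <string>
#include <vector>

/* Hypothetical cache: bitcode file path -> fully deserialized module. */
static std::map<std::string, std::unique_ptr<llvm::Module>> bc_module_cache;

/*
 * Return the cached module for a .bc file, fully parsing it on a cache
 * miss (the llvm::parseBitcodeFile() path rather than lazy loading via
 * llvm::getOwningLazyBitcodeModule()); a fully materialized module is
 * needed for llvm::CloneModule().
 */
static llvm::Module *
get_cached_module(const std::string &path, llvm::LLVMContext &ctx)
{
	auto it = bc_module_cache.find(path);
	if (it != bc_module_cache.end())
		return it->second.get();

	auto buf = llvm::MemoryBuffer::getFile(path);
	if (!buf)
		llvm::report_fatal_error("could not read bitcode file");

	auto mod = llvm::parseBitcodeFile((*buf)->getMemBufferRef(), ctx);
	if (!mod)
		llvm::report_fatal_error(mod.takeError());

	return bc_module_cache.emplace(path, std::move(*mod)).first->second.get();
}

/*
 * Inline the named functions from a cached module: clone the cached copy
 * and hand the clone to the IRMover, so the cached original stays usable
 * for later queries instead of being consumed by the move.
 */
static void
inline_from_cache(llvm::IRMover &mover, const std::string &path,
				  llvm::LLVMContext &ctx,
				  const std::vector<std::string> &func_names)
{
	llvm::Module *cached = get_cached_module(path, ctx);

	/* The clone, not the cached module, is consumed by the IRMover. */
	std::unique_ptr<llvm::Module> clone = llvm::CloneModule(*cached);

	/* Collect the globals to import from the clone itself. */
	std::vector<llvm::GlobalValue *> to_link;
	for (const std::string &name : func_names)
		if (llvm::Function *fn = clone->getFunction(name))
			to_link.push_back(fn);

	if (llvm::Error err = mover.move(
			std::move(clone), to_link,
			[](llvm::GlobalValue &, llvm::IRMover::ValueAdder) {},
			/* IsPerformingImport = */ false))
		llvm::report_fatal_error(std::move(err));
}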

>
> > However, curiously the time spent on optimizing is also reduced (95ms
> > instead of 164ms). Could this be because some of the applied
> > optimizations are ending up in the cached module?
>
> I suspect it's more that optimization stops being able to do a lot, due
> to the type renaming issue.
>
>
> > @Andres: could you provide me with the queries that caused the
> > assertion failure in LLVM?
>
> I don't think I have the concrete query. What I tend to do is to run
> the whole regression test suite with forced JITing. I'm fairly certain
> this triggered the bug at the time.
>
>
> > Have you ever observed a segfault with a non-assert-enabled build?
>
> I think I observed bogus code generation that then could lead to
> segfaults or such.
>
OK, then using a non-assert-enabled LLVM build seems good enough,
especially given the fixes they merged in [3].

>
>
> > I just want to make sure this is truly fixed in LLVM 13. Running 'make
> > check-world', all tests passed.
>
> With jit-ing forced for everything?
>
> One more thing to try is to jit-compile twice and ensure the code is the
> same. It certainly wasn't in the past due to the above issue.
>
Yes, with jitting force-enabled everything passes (I removed all checks
for the various 'jit_above' costs from planner.c).

I also dumped the bitcode via 'SET jit_dump_bitcode = ON' and compared the
bitcode output after inlining and after optimization for two successive
executions of the very same query. The .bc files after inlining differ in
only two bytes. The .bc files after optimization differ in many more
bytes; however, the same is true for the master branch. How exactly do you
verify that the compilation output is identical?
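
In case it is useful, one way to compare two dumps semantically rather
than byte-wise could be to disassemble both and diff the textual IR (much
like running llvm-dis on both files and diffing the output). Below is a
minimal standalone sketch, with the two .bc dump paths passed as
arguments; names and error handling are illustrative only:

#include <llvm/Bitcode/BitcodeReader.h>
#include <llvm/IR/LLVMContext.h>
#include <llvm/IR/Module.h>
#include <llvm/Support/ErrorHandling.h>
#include <llvm/Support/MemoryBuffer.h>
#include <llvm/Support/raw_ostream.h>

#include <string>

/* Disassemble a .bc dump into its textual IR representation. */
static std::string
bitcode_to_text(const char *path, llvm::LLVMContext &ctx)
{
	auto buf = llvm::MemoryBuffer::getFile(path);
	if (!buf)
		llvm::report_fatal_error("could not read bitcode file");

	auto mod = llvm::parseBitcodeFile((*buf)->getMemBufferRef(), ctx);
	if (!mod)
		llvm::report_fatal_error(mod.takeError());

	std::string text;
	llvm::raw_string_ostream os(text);
	(*mod)->print(os, /* AssemblyAnnotationWriter = */ nullptr);
	return os.str();
}

int
main(int argc, char **argv)
{
	if (argc != 3)
		llvm::report_fatal_error("usage: cmpbc <dump1.bc> <dump2.bc>");

	/* argv[1]/argv[2]: two dumps of the same query from successive runs. */
	llvm::LLVMContext ctx;
	bool same = bitcode_to_text(argv[1], ctx) == bitcode_to_text(argv[2], ctx);
	llvm::outs() << (same ? "identical IR\n" : "IR differs\n");
	return same ? 0 : 1;
}

That would also show whether the two differing bytes after inlining are
just metadata (e.g. the module identifier) or a real difference in the IR.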

--
David Geier
(ServiceNow)
