Re: Lazy JIT IR code generation to increase JIT speed with partitions

From: Andres Freund <andres(at)anarazel(dot)de>
To: Luc Vlaming <luc(at)swarm64(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lazy JIT IR code generation to increase JIT speed with partitions
Date: 2020-12-30 01:57:57
Message-ID: 20201230015757.2hlnyp5k2ww5hjyf@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

Great to see work in this area!

On 2020-12-28 09:44:26 +0100, Luc Vlaming wrote:
> I would like to propose a small patch to the JIT machinery which makes the
> IR code generation lazy. The reason for postponing the generation of the IR
> code is that with partitions we get an explosion in the number of JIT
> functions generated as many child tables are involved, each with their own
> JITted functions, especially when e.g. partition-aware joins/aggregates are
> enabled. However, only a fraction of those functions is actually executed
> because the Parallel Append node distributes the workers among the nodes.
> With the attached patch we get a lazy generation which makes that this is no
> longer a problem.

Unfortunately I don't think this is quite good enough, because it'll
lead to emitting every function separately, which can itself increase
the required time very substantially (emitting code is an expensive
step). Obviously that only matters in cases where the generated
functions actually end up being used - which isn't the case in your
example.

If you look, e.g., at a query like
  SELECT blub, count(*), sum(zap) FROM foo WHERE blarg = 3 GROUP BY blub;
on a table without indexes, you would end up with functions for

- WHERE clause (including deforming)
- projection (including deforming)
- grouping key
- aggregate transition
- aggregate result projection

With your patch each of these would be emitted separately, instead of
in one go. IIRC that increases the required time by a significant
amount, especially if inlining is done (where each separate code
generation ends up with its own copy of the inlined code).
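
To make the tradeoff concrete, here is a toy cost model. It is not
PostgreSQL or LLVM code: the struct, the function names and the cost
constants are all made up for illustration, and the only assumption it
encodes is that each emission step carries a fixed overhead on top of a
per-function cost.

```c
#include <assert.h>

/* Hypothetical cost units; real numbers depend on LLVM, the query,
 * and whether inlining/optimization is enabled. */
enum
{
	PER_EMIT_OVERHEAD = 100,	/* fixed cost of one emission step */
	PER_FUNCTION_COST = 10		/* cost of each function in that step */
};

typedef struct ToyJitContext
{
	int			pending;		/* functions generated but not yet emitted */
	int			emit_calls;		/* number of separate emission steps */
	long		total_cost;		/* accumulated cost, in made-up units */
} ToyJitContext;

/* Record a function in the current batch without emitting it. */
static void
toy_add_function(ToyJitContext *ctx)
{
	ctx->pending++;
}

/* Emit everything pending in a single step. */
static void
toy_emit_pending(ToyJitContext *ctx)
{
	if (ctx->pending == 0)
		return;
	ctx->emit_calls++;
	ctx->total_cost += PER_EMIT_OVERHEAD +
		(long) ctx->pending * PER_FUNCTION_COST;
	ctx->pending = 0;
}

/* Batched: all nfuncs are generated up front and emitted in one go,
 * whether or not every one of them is later evaluated. */
static long
toy_cost_batched(int nfuncs)
{
	ToyJitContext ctx = {0, 0, 0};

	assert(nfuncs >= 0);
	for (int i = 0; i < nfuncs; i++)
		toy_add_function(&ctx);
	toy_emit_pending(&ctx);
	return ctx.total_cost;
}

/* Separate: only the nused functions that are actually evaluated get
 * generated, but each one pays the emission overhead on its own. */
static long
toy_cost_separate(int nused)
{
	ToyJitContext ctx = {0, 0, 0};

	assert(nused >= 0);
	for (int i = 0; i < nused; i++)
	{
		toy_add_function(&ctx);
		toy_emit_pending(&ctx);
	}
	return ctx.total_cost;
}
```

With these made-up constants, the five functions of the query above
cost 150 units batched versus 550 emitted separately, whereas a
partitioned plan that generates 1000 functions but evaluates only 10
costs 10100 batched versus 1100 separately. So in this model neither
strategy wins in both situations.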

As far as I can see you've basically falsified the second part of this
comment (which you moved):

> +
> + /*
> + * Don't immediately emit nor actually generate the function.
> + * instead do so the first time the expression is actually evaluated.
> + * That allows to emit a lot of functions together, avoiding a lot of
> + * repeated llvm and memory remapping overhead. It also helps with not
> + * compiling functions that will never be evaluated, as can be the case
> + * if e.g. a parallel append node is distributing workers between its
> + * child nodes.
> + */

> - /*
> - * Don't immediately emit function, instead do so the first time the
> - * expression is actually evaluated. That allows to emit a lot of
> - * functions together, avoiding a lot of repeated llvm and memory
> - * remapping overhead.
> - */

Greetings,

Andres Freund
