Re: Lazy JIT IR code generation to increase JIT speed with partitions

From: David Geier <geidav(dot)pg(at)gmail(dot)com>
To: Luc Vlaming Hummel <luc(dot)vlaming(at)servicenow(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lazy JIT IR code generation to increase JIT speed with partitions
Date: 2022-07-18 09:00:08
Message-ID: 254288e2-159c-dd85-b2ce-f9d331663e43@gmail.com
Lists: pgsql-hackers

Can you elaborate a bit more on how you conclude that?

Looking at the numbers I measured in one of my previous e-mails, the
overhead of using multiple modules looks fairly low to me and is only
measurable in queries with dozens of modules. Given that JIT is most
useful in queries that process a fair number of rows, spending
marginally more time on creating the JIT program in exchange for being
able to apply JIT in a much more fine-grained way seems desirable. For
example, the time you lose handling more modules you win back right
away, because not the whole plan gets JIT-compiled.
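
To make the per-module cost more concrete, here is a minimal,
self-contained sketch using the plain LLVM ORC C API (not our llvmjit.c
code; names like build_single_function_module are made up for
illustration, and it assumes a reasonably recent LLVM with LLJIT in the
C API): every function gets its own module and its own
AddLLVMIRModule call, and that per-module bookkeeping is exactly the
overhead measured above.

#include <stdio.h>
#include <llvm-c/Core.h>
#include <llvm-c/Error.h>
#include <llvm-c/LLJIT.h>
#include <llvm-c/Orc.h>
#include <llvm-c/Target.h>

/* Build a module that contains exactly one function returning the constant n. */
static LLVMOrcThreadSafeModuleRef
build_single_function_module(const char *name, long n)
{
    LLVMOrcThreadSafeContextRef tsctx = LLVMOrcCreateNewThreadSafeContext();
    LLVMContextRef ctx = LLVMOrcThreadSafeContextGetContext(tsctx);
    LLVMModuleRef mod = LLVMModuleCreateWithNameInContext(name, ctx);

    LLVMTypeRef fnty = LLVMFunctionType(LLVMInt64TypeInContext(ctx), NULL, 0, 0);
    LLVMValueRef fn = LLVMAddFunction(mod, name, fnty);
    LLVMBasicBlockRef bb = LLVMAppendBasicBlockInContext(ctx, fn, "entry");
    LLVMBuilderRef b = LLVMCreateBuilderInContext(ctx);

    LLVMPositionBuilderAtEnd(b, bb);
    LLVMBuildRet(b, LLVMConstInt(LLVMInt64TypeInContext(ctx), n, 0));
    LLVMDisposeBuilder(b);

    /* Module and context ownership moves into the thread-safe module. */
    return LLVMOrcCreateNewThreadSafeModule(mod, tsctx);
}

int
main(void)
{
    LLVMOrcLLJITRef jit;
    LLVMErrorRef err;

    LLVMInitializeNativeTarget();
    LLVMInitializeNativeAsmPrinter();

    err = LLVMOrcCreateLLJIT(&jit, NULL);
    if (err)
    {
        char *msg = LLVMGetErrorMessage(err);
        fprintf(stderr, "LLJIT creation failed: %s\n", msg);
        LLVMDisposeErrorMessage(msg);
        return 1;
    }

    /*
     * One module per function: every module added here carries its own
     * context and symbol-table bookkeeping, plus compilation work on its
     * first lookup. That per-module cost is what the numbers above measure.
     */
    for (long i = 0; i < 3; i++)
    {
        char name[32];
        LLVMOrcThreadSafeModuleRef tsm;

        snprintf(name, sizeof(name), "ret%ld", i);
        tsm = build_single_function_module(name, i);
        err = LLVMOrcLLJITAddLLVMIRModule(jit,
                                          LLVMOrcLLJITGetMainJITDylib(jit),
                                          tsm);
        if (err)
        {
            char *msg = LLVMGetErrorMessage(err);
            fprintf(stderr, "AddLLVMIRModule failed: %s\n", msg);
            LLVMDisposeErrorMessage(msg);
        }
    }

    LLVMOrcDisposeLLJIT(jit);
    return 0;
}

(Something like cc sketch.c $(llvm-config --cflags --ldflags --libs)
should build it.)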

It is a trade-off between optimizing for the best case, where everything
in the plan truly benefits from jitting and hence a single module that
has it all is best, and the worst case, where almost nothing truly
profits from jitting and hence only a small fraction of the plan should
actually be jitted. The penalty for the best case seems low though,
because (1) the overhead is low in absolute terms, and (2) even if the
entire plan truly benefits from jitting, spending sub-ms more per node
seems negligible because significant time is going to be spent there
anyway.
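
For illustration, here is a small self-contained sketch of the kind of
lazy, per-node trigger discussed in this thread (all names such as
NodeJitState, jit_node_above_cost and node_should_jit are hypothetical,
not existing PostgreSQL structs or GUCs): a node's expressions only get
compiled once both the planner's cost estimate and the actually observed
row count cross a threshold. With one module per node, that decision can
be deferred to this point without compiling anything for the rest of the
plan.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-node execution state. */
typedef struct NodeJitState
{
    double    est_cost;        /* planner's total cost estimate for the node */
    uint64_t  rows_processed;  /* rows seen so far at run time */
    bool      jitted;          /* expressions already compiled? */
} NodeJitState;

/* Hypothetical thresholds, standing in for GUC-style settings. */
static const double   jit_node_above_cost = 10000.0;
static const uint64_t jit_node_above_rows = 1000;

/*
 * Lazy trigger: compile only once the node is demonstrably expensive
 * (planner estimate) and demonstrably executed (actual row count), which
 * guards against nodes that are never reached or whose row estimate was
 * far too high.
 */
static bool
node_should_jit(NodeJitState *node)
{
    if (node->jitted)
        return false;
    return node->est_cost >= jit_node_above_cost &&
           node->rows_processed >= jit_node_above_rows;
}

int
main(void)
{
    NodeJitState node = { .est_cost = 25000.0, .rows_processed = 0, .jitted = false };

    /* Simulate the executor pulling rows through the node. */
    for (uint64_t row = 1; row <= 2000; row++)
    {
        node.rows_processed = row;
        if (node_should_jit(&node))
        {
            /* Here the node's own module would be emitted and compiled. */
            node.jitted = true;
            printf("JIT triggered after %llu rows\n",
                   (unsigned long long) row);
        }
    }
    return 0;
}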

--
David Geier
(ServiceNow)

On 7/4/22 22:23, Andres Freund wrote:
> Hi,
>
> On 2022-07-04 06:43:00 +0000, Luc Vlaming Hummel wrote:
>> Thanks for reviewing this and the interesting examples!
>>
>> Wanted to give a bit of extra insight as to why I'd love to have a system that can lazily emit JIT code and hence creates roughly a module per function:
>> In the end I'm hoping that we can migrate to a system where we only JIT after a configurable cost has been exceeded for this node, as well as after a configurable number of rows has actually been processed.
>> The reason is that this would safeguard against some problematic planning issues
>> wrt JIT (a node not being executed, the row count being massively off).
> I still don't see how it's viable, overhead-wise, to move to always doing
> function-by-function emission.
>
> I also want to get to doing JIT in the background, triggered by actual
> usage. But to me it seems a dead end to require moving to a
> one-function-per-module model for that.
>
>
>> If this means we have to invest more in making it cheap(er) to emit modules,
>> I'm all for that.
> I think that's just inherently more expensive and thus a no-go.
>
>
>> @Andres if there's any other things we ought to fix to make this cheap
>> (enough) compared to the previous code I'd love to know your thoughts.
> I'm not seeing it.
>
> Greetings,
>
> Andres Freund
