Re: JIT compilation per plan node

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: JIT compilation per plan node
Date: 2024-03-14 21:13:23
Message-ID: 2b9129e0-378e-4326-910f-e8ed1ac5d224@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/14/24 20:14, Robert Haas wrote:
> On Tue, Feb 20, 2024 at 5:31 AM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>> I certainly agree that the current JIT costing is quite crude, and we've
>> all seen cases where the decision turns out to not be great. And I think
>> the plan to make the decisions at the node level makes sense, so +1 to
>> that in general.
>
> Seems reasonable to me also.
>
>> And I think you're right that looking just at the node total cost may
>> not be sufficient - that we may need a better cost model, considering
>> how many times an expression is executed and so on. But I think we
>> should try to do this in smaller steps, meaningful on their own,
>> otherwise we won't move at all. The two threads linked by Melih are ~4y
>> old and *nothing* changed since then, AFAIK.
>>
>> I think it's reasonable to start by moving the decision to the node
>> level - it's where the JIT happens, anyway. It may not be perfect, but
>> it seems like a clear improvement. And if we then choose to improve the
>> "JIT cost model" to address some of the issues you pointed out, surely
>> that would need to happen at the node level too ...
>
> I'm not sure I understand whether you (Tomas) think that this patch is
> a good idea or a bad idea as it stands. I read the first of these two
> paragraphs to suggest that the patch hasn't really evolved much in the
> last few years, perhaps suggesting that if it wasn't good enough to
> commit back then, it still isn't now. But the second of these two
> paragraphs seems more supportive.
>

To clarify, I think the patch is a step in the right direction, and a
meaningful improvement. It may not be the perfect solution we imagine
(but who knows how far we are from that), but AFAIK moving these
decisions to the node level is something the ideal solution would need
to do too.

The reference to the 4y old patches was meant to support this patch as
an improvement - perhaps incomplete, but still an improvement. We keep
imagining "perfect solutions" and then end up doing nothing.

I recognize there's a risk we may never get to have the ideal solution
(e.g. because it requires information we don't possess). But I still
think moving the decision to the node level would allow us to do better
decisions compared to just doing it for the query as a whole.

> From my own point of view, I definitely agree with David's statement
> that what we really want to know is how many times each expression
> will be evaluated. If we had that information, or just an estimate, I
> think we could make much better decisions in this area. But we don't
> have that infrastructure now, and it doesn't seem easy to create, so
> it seems to me that what we have to decide now is whether applying a
> cost threshold on a per-plan-node basis will produce better or worse
> results than what making one decision for the whole plan. David's
> provided an example of where it does indeed work better back in
> https://www.postgresql.org/message-id/CAApHDvpQJqLrNOSi8P1JLM8YE2C%2BksKFpSdZg%3Dq6sTbtQ-v%3Daw%40mail.gmail.com
> - but could there be enough cases where the opposite happens to make
> us think that the patch is overall a bad idea?
>

Right, this risk or regression is always there, and I'm sure it'd be
possible to construct such cases. But considering how crude the current
costing is, I'd be surprised if this ends up being a net negative.

Also, is the number of executions really the thing we're missing? Surely
we know the number of rows the node is dealing with, so we could use
this (yes, I realize there are issues, but we deal with that when
costing quals too). Isn't it much bigger issue that we have pretty much
no cost model for the actual JIT (compilation/optimization) depending on
how many expressions it deals with?

> I personally find that a bit unlikely, although not impossible. I see
> a couple of ways that using the per-node cost can distort things -- it
> seems like it will tend to heavily feature JIT for "interior" plan
> nodes because the cost of a plan node includes it's children -- and as
> was mentioned previously, it doesn't really care whether the node cost
> is high because of expression evaluation or something else. But
> neither of those things seem like they'd be bad enough to make this a
> bad way forward over all. For the patch to lose, it seems like we'd
> need a case where the overall plan cost would have been high enough to
> trigger JIT pre-patch, but most of the benefit would have come from
> relatively low-cost nodes that don't get JITted post-patch. The
> easiest way for that to happen is if the planner's estimates are off,
> but that's not really an argument against this patch as much as it is
> an argument that query planning is hard in general.
>
> A slightly subtler way the patch could lose is if the new threshold is
> harder to adjust than the old one. For example, imagine that you have
> a query that does a Cartesian join. That makes the cost of the input
> nodes rather small compared to the cost of the join node, and it also
> means that JITting the inner join child in particular is probably
> rather important. But if you set join_above_cost low enough to JIT
> that node post-patch, then maybe you'll also JIT a bunch of things
> that aren't on the inner side of a nested loop and which might
> therefore not really need JIT. Unless I'm missing something, this is a
> fairly realistic example of where this patch's approach to costing
> could turn out to be painful ... but it's not like the current system
> is pain-free either.
>
> I don't really know what's best here, but I'm mildly inclined to
> believe that the patch might be a change for the better. I have not
> reviewed the implementation and have no comment on whether it's good
> or bad from that point of view.
>

I think it would be good to construct a bunch of cases where we think
this approach would misbehave, and see if the current code handles that
any better and/or if there's a way to improve that.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Maciek Sakrejda 2024-03-14 21:14:29 Re: Possibility to disable `ALTER SYSTEM`
Previous Message Nathan Bossart 2024-03-14 21:00:10 Re: cleanup patches for incremental backup