Re: Add parameter jit_warn_above_fraction

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add parameter jit_warn_above_fraction
Date: 2022-04-08 13:39:18
Message-ID: 20220408133918.GE10577@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Magnus Hagander (magnus(at)hagander(dot)net) wrote:
> On Fri, Apr 8, 2022 at 2:19 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > On Fri, 8 Apr 2022 at 23:27, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> > > On Wed, Mar 30, 2022 at 3:04 PM Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> > >>
> > >> On Tue, Mar 29, 2022 at 10:06 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > >>>
> > >>> If we go with this patch, the problem I see here is that the amount
> > >>> of work the JIT compiler must do for a given query depends mostly on
> > >>> the number of expressions that must be compiled in the query (also to
> > >>> a lesser extent jit_inline_above_cost, jit_optimize_above_cost,
> > >>> jit_tuple_deforming and jit_expressions). The DBA does not really have
> > >>> much control over the number of expressions in the query. All he or
> > >>> she can do to get rid of the warning is something like increase
> > >>> jit_above_cost. After a few iterations of that, the end result is
> > >>> that jit_above_cost is now high enough that JIT no longer triggers
> > >>> for, say, that query to that table with 1000 partitions where no
> > >>> plan-time pruning takes place. Is that really a good thing? It likely
> > >>> means that we just rarely JIT anything at all!
> > >>
> > >>
> > >> I don't agree with the conclusion of that.
> > >>
> > >> What the parameter would be useful for is to be able to tune those
> > >> costs (or just turn JIT off) *for that individual query*. That doesn't mean
> > >> you "rarely JIT anything at all", it just means you rarely JIT that
> > >> particular query.
> >
> > I just struggle to imagine that anyone is going to spend much effort
> > tuning a warning parameter per query. I imagine they're far more
> > likely to just ramp it up to only catch some high percentile problems
> > or (more likely) just not bother with it. It seems more likely
> > that if anyone was to tune anything per query here it would be
> > jit_above_cost, since that actually might have an effect on the
> > performance of the query, rather than if it spits out some warning
> > message or not. ISTM that if the user knows what to set it to per
> > query, then there's very little point in having a warning as we'd be
> > alerting them to something they already know about.
>
> I would not expect people to tune the *warning* at a query level. If
> anything, then yes, they would tune either jit_above_cost or just set
> jit=off. But the idea is that you can do that at a per-query level instead
> of globally.

Yeah, exactly, this is about having a busy system and wanting to know
which queries are spending a lot of time doing JIT relative to the query
time, so that you can go adjust your JIT parameters or possibly disable
JIT for those queries (or maybe bring those cases to -hackers and try to
help make our costing better).
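
To make "a lot of time doing JIT relative to the query time" concrete, here
is a minimal sketch of the kind of check such a parameter implies. It is not
the patch's code: the function names, the millisecond inputs, and the choice
that a threshold of 0 disables the check are all assumptions made for
illustration.

```python
def jit_fraction(jit_ms: float, total_ms: float) -> float:
    """Share of total execution time spent in JIT compilation."""
    if total_ms <= 0:
        # Avoid dividing by zero for queries with no measurable runtime.
        return 0.0
    return jit_ms / total_ms


def should_warn(jit_ms: float, total_ms: float, warn_above_fraction: float) -> bool:
    """Return True when the JIT share of the query time exceeds the threshold.

    Assumption (hypothetical): a threshold of 0 or less turns the check off.
    """
    if warn_above_fraction <= 0:
        return False
    return jit_fraction(jit_ms, total_ms) > warn_above_fraction


if __name__ == "__main__":
    # A query that spent 300 ms of its 1000 ms in JIT, checked against 0.25.
    print(should_warn(300.0, 1000.0, 0.25))
```

With a check like this, only queries whose JIT share crosses the threshold
would be logged, which is what makes it usable on a busy system.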

> > I looked in the -general list to see if we could get some common
> > explanations to give us an idea of the most common reason for high JIT
> > compilation time. It seems that the plans were never simple. [1] seems
> > due to a complex plan. I'm basing that off the "Functions: 167". I
> > didn't catch the full plan. From what I can tell, [2] seems to be due
> > to "lots of empty tables", so assuming the clamping at 1 page is
> > causing issues there. I think both of those cases could be resolved
> > by building the costing the way I mentioned. I admit that 2 cases is
> > not a very large sample size.
>
> Again, I am very much for improvements of the costing model. This is in no
> way intended to be a replacement for that. It's intended to be a stop-gap.

Not sure I'd say it's a 'stop-gap' as it's really very similar, imv
anyway, to log_min_duration_statement: you want to know what queries are
taking a lot of time but you can't log all of them.

> What I see much of today are things like
> https://dba.stackexchange.com/questions/264955/handling-performance-problems-with-jit-in-postgres-12
> or
> https://dev.to/xenatisch/cascade-of-doom-jit-and-how-a-postgres-update-led-to-70-failure-on-a-critical-national-service-3f2a
>
> The bottom line is that people end up with recommendations to turn off JIT
> globally more or less by default. Because there's no real useful way today
> to figure out when it causes problems vs when it helps.

Yeah, that's frustrating.

> The addition to pg_stat_statements I pushed a short while ago would help
> with that. But I think having a warning like this would also be useful. As
> a stop-gap measure, yes, but we really don't know when we will have an
> improved costing model for it. I hope you're right and that we can have it
> by 16, and then I will definitely advocate for removing the warning again
> if it works.

Having this in pg_stat_statements is certainly helpful, but having a
warning is too. I don't think we have to address this in only one way.
It's a lot faster to flip this GUC and then look in the logs on a busy
system than to install pg_stat_statements, restart the cluster once you
get permission to do so, and then query it.

Thanks,

Stephen
