Re: JIT doing duplicative optimization?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: JIT doing duplicative optimization?
Date: 2021-11-14 23:46:34
Message-ID: 2046810.1636933594@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> You could probably generate some queries with lots and lots of expressions
> to characterize this better. If it is O(N^2), it should not be hard to
> drive the cost up to the point where the guilty bit of code would stand
> out in a perf trace.

I experimented with that, using a few different-size queries generated
like this:

print "explain analyze\n";
for (my $i = 1; $i < 100; $i++) {
    print " select sum(f1+$i) from base union all\n";
}
print "select sum(f1+0) from base;\n";

on a table made like

create table base as select generate_series(1,10000000) f1;
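
For anyone wanting to reproduce the different sizes without editing the loop bound by hand, here is a minimal Python sketch of the same generator, parameterized by subquery count. The function name build_query and its parameters are hypothetical, not something from the original script; the emitted SQL text is meant to match what the Perl loop above prints when its bound equals the requested count.

```python
def build_query(n_subqueries: int, table: str = "base", column: str = "f1") -> str:
    """Return an EXPLAIN ANALYZE query made of n_subqueries UNION ALL arms.

    Hypothetical helper: mirrors the Perl generator, which emits
    (bound - 1) arms in the loop plus one final arm.
    """
    if n_subqueries < 1:
        raise ValueError("need at least one subquery")
    lines = ["explain analyze"]
    # Each arm adds a different constant, so every expression is distinct.
    for i in range(1, n_subqueries):
        lines.append(f" select sum({column}+{i}) from {table} union all")
    # The final arm closes the statement.
    lines.append(f"select sum({column}+0) from {table};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the 10-subquery variant; pipe the output into psql to run it.
    print(build_query(10), end="")
```

Calling build_query(100) yields 99 arms ending in "union all" plus the final arm, the same shape as the loop shown above.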

What I got, after setting max_parallel_workers_per_gather = 0,
was

10 subqueries:

Planning Time: 0.260 ms
JIT:
Functions: 30
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 4.651 ms, Inlining 8.870 ms, Optimization 152.937 ms, Emission 95.046 ms, Total 261.504 ms
Execution Time: 15258.249 ms

100 subqueries:

Planning Time: 2.231 ms
JIT:
Functions: 300
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 44.163 ms, Inlining 9.934 ms, Optimization 1448.971 ms, Emission 928.438 ms, Total 2431.506 ms
Execution Time: 154815.515 ms

1000 subqueries:

Planning Time: 29.480 ms
JIT:
Functions: 3000
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 444.479 ms, Inlining 25.688 ms, Optimization 14989.696 ms, Emission 9891.993 ms, Total 25351.856 ms
Execution Time: 1522011.367 ms

So the overhead looks pretty linear, or even a shade sublinear for the
"inlining" bit, *as long as only one process is involved*. However,
I noted that if I didn't force that, the JIT overhead went up because
the planner wanted to use more workers and each worker has to do its own
compilations. So perhaps the apparent nonlinearity in your examples comes
from that?
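
One way to check the linearity claim is to divide each reported phase time by the "Functions:" count for that run. The Python sketch below does only that arithmetic on the figures quoted above; the names RUNS, PHASES and per_function_ms are illustrative, and treating "Functions:" as the unit of work is an assumption made for this comparison.

```python
# Timings in milliseconds, copied from the three EXPLAIN ANALYZE outputs above.
RUNS = {
    10: {"functions": 30, "generation": 4.651, "inlining": 8.870,
         "optimization": 152.937, "emission": 95.046, "total": 261.504},
    100: {"functions": 300, "generation": 44.163, "inlining": 9.934,
          "optimization": 1448.971, "emission": 928.438, "total": 2431.506},
    1000: {"functions": 3000, "generation": 444.479, "inlining": 25.688,
           "optimization": 14989.696, "emission": 9891.993, "total": 25351.856},
}

PHASES = ("generation", "inlining", "optimization", "emission", "total")


def per_function_ms(run):
    """Divide each phase time by the run's function count."""
    return {phase: run[phase] / run["functions"] for phase in PHASES}


if __name__ == "__main__":
    for n, run in sorted(RUNS.items()):
        costs = per_function_ms(run)
        summary = ", ".join(f"{p} {costs[p]:.3f}" for p in PHASES)
        print(f"{n:>5} subqueries: {summary} (ms per function)")
```

On these figures the total stays between roughly 8 and 9 ms per function at all three sizes, while the inlining time per function falls as the query grows, which is what linear overall cost and sublinear inlining cost would look like.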

BTW, I realized while working on this that I have little idea what the
"Functions:" count is. Nor does our documentation explain that (or any
other of these numbers), at least not anywhere I could find. That seems
like a pretty serious documentation fail. If these numbers aren't
important enough to explain in the docs, why are we printing them at all?

regards, tom lane
