Re: plan cache overhead on plpgsql expression

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: plan cache overhead on plpgsql expression
Date: 2020-03-21 18:24:05
Message-ID: 1478.1584815045@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> So the patch has a problem with constant casting - unfortunately the mix of
> double precision variables and numeric constants is pretty often in
> Postgres.

Yeah. I believe the cause of that is that the patch thinks it can skip
passing an inline-function-free simple expression through the planner.
That's flat out wrong. Quite aside from failing to perform
constant-folding (which is presumably the cause of the slowdown you
spotted), that means that we miss performing such non-optional
transformations as rearranging named-function-argument notation into
positional order. I didn't bother to test that but I'm sure it can be
shown to lead to crashes.

Now that I've looked at the patch I don't like it one bit from a
structural standpoint either. It's basically trying to make an end
run around the plancache, which is not going to be maintainable even
if it correctly accounted for everything the plancache does today.
Which it doesn't. Two big problems are:

* It doesn't account for the possibility of search_path changes
affecting the interpretation of an expression.

* It assumes that the *only* things that a simple plan could get
invalidated for are functions that were inlined. This isn't the
case --- a counterexample is that removal of no-op CoerceToDomain
nodes requires the plan to be invalidated if the domain's constraints
change. And there are likely to be more such issues in future.

So while there's clearly something worth pursuing here, I do not like
anything about the way it was done. I think that the right way to
think about this problem is "how can the plan cache provide a fast
path for checking validity of simple-expression plans?". And when you
think about it that way, there's a pretty obvious answer: if the plan
involves no table references, there's not going to be any locks that
have to be taken before we can check the is_valid flag. So we can
have a fast path that skips AcquirePlannerLocks and
AcquireExecutorLocks, which are a big part of the problem, and we can
also bypass some of the other random checks that GetCachedPlan has to
make, like whether RLS affects the plan.

Another chunk of the issue is the constant acquisition and release of
reference counts on the plan. We can't really skip that (I suspect
there are additional bugs in Amit's patch arising from trying to do so).
However, plpgsql already has mechanisms for paying simple-expression
setup costs once per transaction rather than once per expression use.
So we can set up a simple-expression ResourceOwner managed much like
the simple-expression EState, and have it hold a refcount on the
CachedPlan for each simple expression, and pay that overhead just once
per transaction.

So I worked on those ideas for awhile, and came up with the attached
patchset:

0001 adds some regression tests in this area (Amit's patch fails the
tests concerning search_path changes).

0002 does what's suggested above. I also did a little bit of marginal
micro-tuning in exec_eval_simple_expr() itself.

0003 improves the biggest remaining cost of validity rechecking,
which is verifying that the search_path is the same as it was when
the plan was cached.

I haven't done any serious performance testing on this, but it gives
circa 2X speedup on Pavel's original example, which is at least
fairly close to the results that Amit's patch got there. And it
makes this last batch of test cases faster not slower, too.

With this patch, perf shows the hotspots on Pavel's original example
as being

+ 19.24% 19.17% 46470 postmaster plpgsql.so [.] exec_eval_expr
+ 15.19% 15.15% 36720 postmaster plpgsql.so [.] plpgsql_param_eval_var
+ 14.98% 14.94% 36213 postmaster postgres [.] ExecInterpExpr
+ 6.32% 6.30% 15262 postmaster plpgsql.so [.] exec_stmt
+ 6.08% 6.06% 14681 postmaster plpgsql.so [.] exec_assign_value

Maybe there's more that could be done to knock fat out of
exec_eval_expr and/or plpgsql_param_eval_var, but at least
the plan cache isn't the bottleneck anymore.

regards, tom lane

Attachment Content-Type Size
0001-add-test-cases-1.patch text/x-diff 4.1 KB
0002-faster-revalidation-1.patch text/x-diff 28.3 KB
0003-faster-searchpath-1.patch text/x-diff 8.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2020-03-21 18:51:05 Re: Ecpg dependency
Previous Message Bruce Momjian 2020-03-21 18:14:44 Re: Ecpg dependency