Re: plan cache overhead on plpgsql expression

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: plan cache overhead on plpgsql expression
Date: 2020-03-21 20:29:21
Message-ID: CAFj8pRDCJcv6RwrVdceQVx_Ct0uxADmZN51nv=Z9Z0QBy68K+g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

so 21. 3. 2020 v 19:24 odesílatel Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> napsal:

> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> > So the patch has a problem with constant casting - unfortunately the mix
> of
> > double precision variables and numeric constants is pretty often in
> > Postgres.
>
> Yeah. I believe the cause of that is that the patch thinks it can skip
> passing an inline-function-free simple expression through the planner.
> That's flat out wrong. Quite aside from failing to perform
> constant-folding (which is presumably the cause of the slowdown you
> spotted), that means that we miss performing such non-optional
> transformations as rearranging named-function-argument notation into
> positional order. I didn't bother to test that but I'm sure it can be
> shown to lead to crashes.
>
> Now that I've looked at the patch I don't like it one bit from a
> structural standpoint either. It's basically trying to make an end
> run around the plancache, which is not going to be maintainable even
> if it correctly accounted for everything the plancache does today.
> Which it doesn't. Two big problems are:
>
> * It doesn't account for the possibility of search_path changes
> affecting the interpretation of an expression.
>
> * It assumes that the *only* things that a simple plan could get
> invalidated for are functions that were inlined. This isn't the
> case --- a counterexample is that removal of no-op CoerceToDomain
> nodes requires the plan to be invalidated if the domain's constraints
> change. And there are likely to be more such issues in future.
>
>
> So while there's clearly something worth pursuing here, I do not like
> anything about the way it was done. I think that the right way to
> think about this problem is "how can the plan cache provide a fast
> path for checking validity of simple-expression plans?". And when you
> think about it that way, there's a pretty obvious answer: if the plan
> involves no table references, there's not going to be any locks that
> have to be taken before we can check the is_valid flag. So we can
> have a fast path that skips AcquirePlannerLocks and
> AcquireExecutorLocks, which are a big part of the problem, and we can
> also bypass some of the other random checks that GetCachedPlan has to
> make, like whether RLS affects the plan.
>
> Another chunk of the issue is the constant acquisition and release of
> reference counts on the plan. We can't really skip that (I suspect
> there are additional bugs in Amit's patch arising from trying to do so).
> However, plpgsql already has mechanisms for paying simple-expression
> setup costs once per transaction rather than once per expression use.
> So we can set up a simple-expression ResourceOwner managed much like
> the simple-expression EState, and have it hold a refcount on the
> CachedPlan for each simple expression, and pay that overhead just once
> per transaction.
>
> So I worked on those ideas for awhile, and came up with the attached
> patchset:
>
> 0001 adds some regression tests in this area (Amit's patch fails the
> tests concerning search_path changes).
>
> 0002 does what's suggested above. I also did a little bit of marginal
> micro-tuning in exec_eval_simple_expr() itself.
>
> 0003 improves the biggest remaining cost of validity rechecking,
> which is verifying that the search_path is the same as it was when
> the plan was cached.
>
> I haven't done any serious performance testing on this, but it gives
> circa 2X speedup on Pavel's original example, which is at least
> fairly close to the results that Amit's patch got there. And it
> makes this last batch of test cases faster not slower, too.
>
> With this patch, perf shows the hotspots on Pavel's original example
> as being
>
> + 19.24% 19.17% 46470 postmaster plpgsql.so
> [.] exec_eval_expr
> + 15.19% 15.15% 36720 postmaster plpgsql.so
> [.] plpgsql_param_eval_var
> + 14.98% 14.94% 36213 postmaster postgres
> [.] ExecInterpExpr
> + 6.32% 6.30% 15262 postmaster plpgsql.so
> [.] exec_stmt
> + 6.08% 6.06% 14681 postmaster plpgsql.so
> [.] exec_assign_value
>
> Maybe there's more that could be done to knock fat out of
> exec_eval_expr and/or plpgsql_param_eval_var, but at least
> the plan cache isn't the bottleneck anymore.
>

I tested Tom's patches, and I can confirm these results.

It doesn't break tests (and all tests plpgsql_check tests passed without
problems).

The high overhead of ExecInterpExpr is related to prepare fcinfo, and
checking nulls arguments because all functions are strict
plpgsql_param_eval_var, looks like expensive is var = (PLpgSQL_var *)
estate->datums[dno] and *op->resvalue = var->value;

It looks great.

Pavel

>
> regards, tom lane
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2020-03-21 21:59:46 Re: Additional improvements to extended statistics
Previous Message Pavel Stehule 2020-03-21 19:38:31 Re: SQL/JSON: functions