Faster "SET search_path"

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Faster "SET search_path"
Date: 2023-07-29 15:59:01
Message-ID: 04c8592dbd694e4114a3ed87139a7a04e4363030.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Improve performance of "SET search_path".

Motivation:

Creating functions with a "SET search_path" clause is safer and more
secure because the function behavior doesn't change based on the
caller's search_path setting.

Setting search_path in the function declaration is especially important
for SECURITY DEFINER functions[1], but even SECURITY INVOKER functions
can be executed more like SECURITY DEFINER in some contexts (e.g.
REINDEX executing an index function). Also, it's just error-prone to
depend on the caller's search_path unless there's a specific reason you
want to do that.

Unfortunately, adding a "SET search_path" clause to functions slows
them down. The attached patches close the performance gap
substantially.

Changes:

0001: Transform the settings in proconfig into a List for faster
processing. This is simple and speeds up any proconfig setting.

0002: Introduce CheckIdentifierString(), which is a faster version of
SplitIdentifierString() that only validates, and can be used in
check_search_path().

0003: Cache of previous search_path settings. The key is the raw
namespace_search_path string and the role OID, and it caches the
computed OID list. Changes to the search_path setting or the role can
retrieve the cached OID list as long as nothing else invalidates the
cache (changes to the temp schema or a syscache invalidation of
pg_namespace or pg_role).

One behavior change in 0003 is that retrieving a cached OID list
doesn't call InvokeNamespaceSearchHook(). It would be easy enough to
disable caching when a hook exists, but I didn't see a reason to expect
that "SET search_path" must invoke that hook each time. Invoking it
when computing for the first time, or after a real invalidation, seemed
fine to me. Feedback on that is welcome.

Test:

CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE big(i) AS SELECT generate_series(1,20000000);
VACUUM big; CHECKPOINT;
CREATE FUNCTION inc(int) RETURNS INT
LANGUAGE plpgsql
AS $$ begin return $1+1; end; $$;
CREATE FUNCTION inc_ab(int) RETURNS INT
LANGUAGE plpgsql SET search_path = a, b
AS $$ begin return $1+1; end; $$;

-- baseline
EXPLAIN ANALYZE SELECT inc(i) FROM big;

-- test query
EXPLAIN ANALYZE SELECT inc_ab(i) FROM big;

Results:

baseline: 4.3s
test query:
without patch: 14.7s
0001: 13.6s
0001,0002: 10.4s
0001,0002,0003: 8.6s

Timings were inconsistent for me so I took the middle of three runs.

It's a lot faster than without the patch. It's still 2X worse than not
specifying any search_path (baseline), but I think it brings it into
"usable" territory for more use cases.

Regards,
Jeff Davis

[1]
https://www.postgresql.org/docs/current/sql-createfunction.html#SQL-CREATEFUNCTION-SECURITY

Attachment Content-Type Size
v1-0001-Transform-proconfig-for-faster-execution.patch text/x-patch 5.2 KB
v1-0002-Optimize-check_search_path.patch text/x-patch 4.3 KB
v1-0003-Add-cache-for-recomputeNamespacePath.patch text/x-patch 12.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2023-07-29 16:15:10 Re: POC, WIP: OR-clause support for indexes
Previous Message Rui Zhao 2023-07-29 15:10:22 pg_upgrade fails with in-place tablespace