Tightening behaviour for non-immutable behaviour in immutable functions

From: Greg Stark <stark(at)mit(dot)edu>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Tightening behaviour for non-immutable behaviour in immutable functions
Date: 2022-06-08 21:42:48
Message-ID: CAM-w4HO72ec5PwabFaAGE3cvm=ujcCsJJeCTV2uwZzf7KyY90w@mail.gmail.com
Lists: pgsql-hackers

Recently I had someone complaining about a pg_restore failure, and I
believe we semi-regularly get similar complaints -- though I'm having
trouble searching for them because the keywords "dump restore failure"
are pretty generic.

The root cause here -- and I believe for a lot of users -- is
functions that are declared IMMUTABLE but are not, for reasons that
aren't obvious to the user. Indeed, poking at this more carefully, I
think it's downright challenging to write an IMMUTABLE function at
all. I suspect *most* users, perhaps even nearly all users, who write
functions intending them to be immutable are not really as successful
as they believe.

The biggest culprit is of course search_path. Afaics it's nigh
impossible to write any non-trivial immutable function without just
setting the search_path GUC on the function. And there's nothing in
Postgres that requires that. I don't even see anything in the docs
recommending it.
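
For reference, the only way I know of to make this safe today is
something like the following -- norm_email is just a made-up example,
the SET clause is the point:

    CREATE FUNCTION norm_email(text) RETURNS text AS $$
        SELECT lower(btrim($1))
    $$ LANGUAGE sql IMMUTABLE
       SET search_path = pg_catalog;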

Many users probably always run with the same search_path, so in
practice they're probably mostly safe. But one day they could insert
data with a different search_path that resolves to a different
function definition, and corrupt their index, which would be...
poor... Or, as in my user's case, they could discover the problem only
in the middle of an upgrade, which is a terrible time to discover it.
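
To make the failure mode concrete, here's a contrived sketch (the
schemas and function names are all made up):

    CREATE SCHEMA app;
    CREATE FUNCTION app.quux(text) RETURNS text
        AS $$ SELECT upper($1) $$ LANGUAGE sql;

    SET search_path = app, pg_catalog;

    -- declared IMMUTABLE, but quux() is looked up via search_path
    -- each time the body is executed
    CREATE FUNCTION idx_key(text) RETURNS text
        AS $$ SELECT quux($1) $$ LANGUAGE sql IMMUTABLE;

    CREATE TABLE t (v text);
    CREATE INDEX ON t (idx_key(v));
    INSERT INTO t VALUES ('Bar');

    -- later, a different quux() earlier in the path silently changes
    -- idx_key()'s results, so new index entries disagree with old ones
    CREATE SCHEMA other;
    CREATE FUNCTION other.quux(text) RETURNS text
        AS $$ SELECT lower($1) $$ LANGUAGE sql;
    SET search_path = other, app, pg_catalog;
    INSERT INTO t VALUES ('Foo');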

I would suggest we should probably at the very least warn users if
they create an immutable function that doesn't have search_path set
for the function. I would actually prefer to make it an error, but
that brings in compatibility issues. Perhaps it could be optional.

But putting a GUC on a function imposes a pretty heavy performance
cost. I'm not sure how bad it is compared to running plpgsql code, let
alone other languages, but IIUC it invalidates some catalog caches,
which for something happening repeatedly in, e.g., a data load would
be pretty bad.
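
I haven't measured this carefully, but a crude way to get a feel for
the per-call overhead (function names made up) is to time the same
trivial function with and without the SET clause in psql:

    CREATE FUNCTION f_plain(int) RETURNS int
        AS $$ BEGIN RETURN $1 + 1; END $$ LANGUAGE plpgsql IMMUTABLE;
    CREATE FUNCTION f_setpath(int) RETURNS int
        AS $$ BEGIN RETURN $1 + 1; END $$ LANGUAGE plpgsql IMMUTABLE
        SET search_path = pg_catalog;

    \timing on
    SELECT count(f_plain(g))   FROM generate_series(1, 1000000) g;
    SELECT count(f_setpath(g)) FROM generate_series(1, 1000000) g;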

It would be nice to have a way to avoid the performance cost and I see
two options.

Thinking of plpgsql here: we already run the raw parser on all SQL
when the function is defined. We could somehow check whether
raw_parser found any non-schema-qualified references. This looks like
it would be awkward but doable. It would allow users to write
non-search_path-dependent code, and if Postgres doesn't warn they
would know they've done it properly. It would still put quite a burden
on users, especially when it comes to operators...
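
For comparison, a fully schema-qualified body ends up looking like
this -- note the OPERATOR() syntax needed to qualify even a plain "="
(is_default is a made-up example):

    CREATE FUNCTION is_default(int) RETURNS boolean AS $$
        SELECT $1 OPERATOR(pg_catalog.=) 0
    $$ LANGUAGE sql IMMUTABLE;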

Or alternatively we could offer lexical scoping, so that all objects
are looked up at parse time and the fully qualified reference is
stored instead of the unqualified one. That would be more similar to
how views and other object references are handled.
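
Views already behave this way. Continuing the made-up sketch from
above:

    SET search_path = app, pg_catalog;
    CREATE VIEW t_keys AS SELECT quux(v) FROM t;

    SET search_path = other, pg_catalog;
    SELECT pg_get_viewdef('t_keys');
    -- still shows app.quux(v); the reference was bound when the
    -- view was created, not when it's used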

I suppose there's a third option: we could provide something which,
instead of *setting* the GUC when a function is entered, just verifies
that the GUC is set as expected. That way the function would simply
throw an error if search_path is "incorrect" and would not have to
invalidate any caches. That would at least avoid index corruption, but
not guarantee that dump/reload would work.
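
You can approximate that in user space today; a sketch, as a plpgsql
version of the made-up idx_key from above:

    CREATE FUNCTION idx_key(text) RETURNS text AS $$
    BEGIN
        -- cheap check, no GUC save/restore or cache invalidation
        IF current_setting('search_path') <> 'app, pg_catalog' THEN
            RAISE EXCEPTION 'idx_key() requires search_path = app, pg_catalog';
        END IF;
        RETURN quux($1);
    END
    $$ LANGUAGE plpgsql IMMUTABLE;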

--
greg
