Re: pgsql: Fix search_path to a safe value during maintenance operations.

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Noah Misch <noah(at)leadboat(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>
Subject: Re: pgsql: Fix search_path to a safe value during maintenance operations.
Date: 2023-08-01 17:41:42
Message-ID: CA+Tgmobd=eFRGWHhfG4mG2cA+dsVuA4jpBvD8N1NS=Vc9eHFQg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Mon, Jul 31, 2023 at 6:10 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> Capturing the environment is not ideal either, in my opinion. It makes
> it easy to carelessly depend on a schema that others might not have
> USAGE privileges on, which would then create a runtime problem for
> other callers. Also, I don't think we could just depend on the raw
> search_path, we'd need to do some processing for $user, and there are
> probably a few other annoyances.
>
> It's one possibility and we don't have a lot of great options, so I
> don't want to rule it out though. If nothing else it could be a
> transition path to something better.

Here is my thought about this. Right now, we basically do one of two
things. In some cases, we parse statements when they're submitted, and
then store the resulting node trees. In such cases, references are
fixed: the statements will always refer to the objects to which they
referred previously. In functions and procedures, except for the new
BEGIN ATOMIC stuff, we just store the statements as a string and they
get parsed at execution time. Then, the problem arises of statements
being possibly parsed in an environment that differs from the original
one. It can differ either by search_path being different so that we
look in different schemas, or, as you point out here, if the contents
of the schemas themselves have been modified.

I think that a lot of people would like it if we moved more in the
direction of parsing statements at object definition time. Possibly
because EDB deals with a lot of people coming from Oracle, I've heard
a lot of complaints about the PG behavior. However, there are some
fairly significant problems with that idea. First, it would break some
use cases, such as creating a temporary table and then running DML
commands on it, or more generally any use case where a function or
procedure might need to reference objects that don't exist at time of
definition. Second, while we have clear separation of parsing and
execution for queries, the same is not true for DDL commands; it's not
the case, I believe, that you can parse an arbitrary DDL command such
that all object references are resolved, and then later execute it.
We'd need to change a bunch of code to get there. Third, we'd have to
deal with dependency loops: right now, because functions and
procedures don't parse their bodies at definition time, they also
don't depend on the objects that they are going to end up accessing,
which means that a function or procedure can be restored by pg_dump
without worrying about whether those objects exist yet. That would
have to change, and that would mean creating dependency loops for
pg_dump, which we'd have to then find a way to break. I'm not trying
to say that any of these problems are intractable, but I do think
changing stuff like this would be quite a bit of work -- and that's
assuming the user impact was judged to be acceptable, which I'm not at
all sure that it would be. We'd certainly need to provide some
workaround for people who want to do stuff like create and use
temporary tables inside of a function or procedure.

Now, if we don't go in the direction of resolving everything at parse
time, then I think capturing search_path is probably the next best
thing, or at least the next best thing that I've thought up so far. It
doesn't hold constant the meaning of the code to the same degree that
parsing at definition time would do, but it gets us closer to that
than the status quo. Crucially, if the user is using a secure
search_path, then any changes to the meaning of the code that captures
that search_path will have to be made by that user or someone with a
superset of their privileges, which is a lot less serious than what
you get when there's no search_path setting at all, where the *caller*
can change the meaning of the called code. That is not, however, to
say that this idea is really good enough. To be honest, I think it's a
bit of a kludge, and dropping a kludge on top of our entire user base
and maybe also breaking a lot of things is not particularly where I
want to be. I just don't have an idea I like better at the moment.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Robert Haas 2023-08-01 17:56:46 pgsql: Add and use symbolic constants for tar header offsets and file t
Previous Message Robert Haas 2023-08-01 14:51:10 Re: pgsql: Fix search_path to a safe value during maintenance operations.

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2023-08-01 17:57:02 Re: constants for tar header offsets
Previous Message Jacob Champion 2023-08-01 17:02:12 Re: New Table Access Methods for Multi and Single Inserts