Re: Command Triggers patch v18

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Command Triggers patch v18
Date: 2012-03-30 14:11:00
Message-ID: CA+TgmoYZGsk7uS0iM0UamzqRMXCQkQW44S5B3EkDTE40A=ZuoA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 30, 2012 at 4:32 AM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> I did that another way in previous incarnations of the patch, which was
> to allow for INSTEAD OF event trigger backed by a SECURITY DEFINER
> function. When the extension is whitelisted, prevent against recursion
> then CREATE EXTENSION in the security definer function, then signal that
> the execution should now be aborted.
>
> That was too dangerous given the lack of policy about where exactly the
> user code is fired, but I think we could now implement that for some of
> the event timing specs we're listing. Only some of them, I guess only
> those that are happening before we lock the objects.

Oh, right: I remember that now. I still think it's a bad way to do
it, because the trigger potentially has a lot of work to do to
reconstruct a working command string, and it still ends up getting
executed by the wrong user. For CREATE EXTENSION it's not that bad,
because the arguments to the command are so simple, but of course any
time we extend the CREATE EXTENSION syntax, the trigger needs to know
about it too whether it's security-relevant or not, and doing
something similar with, say, ALTER TABLE would be a ridiculously
complicated. I think there is a use case for what you called an
INSTEAD OF trigger, but I don't believe in this one. It seems to me
that there's a lot of power in being able to *just* intercept the
security decision and then let the rest of the command go about its
business. Of course, you have to avoid getting security checks (like,
you must own the table in order to drop a column) with integrity
checks (like, you can't drop a column from pg_class) but I think
that's not very hard to get right.

>> More generally, my thought on the structure of this is that you're
>> going to have certain toplevel events, many of which will happen at
>> only a single place in the code, like "an object got dropped" or "a
>> DDL command started" or "a DDL command ended".  So we give those
>> names, like sql_drop, ddl_command_start, and ddl_command_end.  Inside
>
> I really dislike mixing sql_drop and ddl_command_start as being the same
> kind of objects here, even if I can bend my head in the right angle and
> see that it's a fair view when looking at how it's implemented. I can't
> see a way to explain that to users without having to explain them how
> drop cascade is implemented.
>
> So my proposal here is to “fake” a “proper“ subcommand thanks to the new
> context variable. If you DROP TYPE foo CASCADE and that in turn drops a
> function foo_in(), then an event trigger is fired with
>
>   context = 'DROP TYPE'
>   tag = 'DROP FUNCTION'
>
> Same idea when you DROP TABLE … CASCADE and a SEQUENCE and a bunch of
> index need to disappear too, you get an usual event trigger fired with
> the context set to 'DROP TABLE' this time.
>
> I don't think we need to arrange for explicitly publishing the context
> specific information here. If we need to, we have to find the right
> timing spec where we can guarantee still being in the top level command
> and where we already have the details filled in, then users can attach a
> trigger here and register the information for themselves.

I'm not sure I understand how you're using the words "context" and
"tag". I think for a drop trigger I would want the function to
receive this information: type of object dropped, OID of object
dropped, column number in the case of a column drop, flag indicating
whether it's a toplevel drop or a cascaded drop. I wouldn't object to
also making the currently-in-context toplevel command tag available,
but I think most drop triggers wouldn't really care, so I wouldn't
personally spend much implementation effort on it if it turns out to
be hard.

But in general, I don't really know what a "proper" subcommand is or
why some subcommands should be more proper than others, or why we
should even be concerned about whether something is a subcommand at
all. I think it's fine and useful to have triggers that fire in those
kinds of places, but I don't see why we should limit ourselves to
that. For applications like replication, auditing, and enhanced
security, the parse tree and subcommand/non-subcommand status of a
particular operation are irrelevant. What you need is an exact
description of the operation that got performed (e.g. the default on
table X column Y got dropped); you might be able to reverse-engineer
that from the parse tree, but it's much better to have the system pass
you the information you need more directly. Certainly, there are
cases where you might want to have the parse tree, or even the raw
command text, available, but I'm not even convinced that that those
cases will be the most commonly used ones unless, of course, they're
the only ones we offer, in which case everyone will go down that path
by necessity.

>> your trigger procedure, the set of magic variables that is available
>> will depend on which toplevel event you set the trigger on, but
>> hopefully all firings of that toplevel event can provide the same
>> magic variables.  For example, at ddl_command_start time, you're just
>> gonna get the command tag, but at ddl_command_end time you will get
>> the command tag plus maybe some other stuff.
>
> With my proposal above, you could get the same set of information when
> being called as a toplevel event or a subevent (one where the context is
> not null). That would mean adding object name and schema name lokkups in
> the drop cascade code, though. We can also decide not to do that extra
> lookup and just publish the object id which we certainly do have.
>
> This way, the timing spec of a sub-event can still be of the same kind
> as the top-level event ones, we still have before and after lock entry
> points, same with lookup if we add that feature, etc.

Again, I'm not understanding the distinction between toplevel events
and sub-events. I don't see any need for such a distinction. I think
there are just events, and some of them happen at command start/end
and others happen somewhere in the middle. As long as it's a safe and
useful place to fire a trigger, who cares?

> Given the scope of this mini expression language, we can easily bypass
> calling the executor in v1 here, and reconsider later if we want to
> allow calling a UDF in the WHEN clause… I don't think it's an easy
> feature to add in, though.

Or a necessary one. AFAICS, the main benefit of WHEN clauses on
regular triggers is that you can prevent the AFTER trigger queue from
getting huge, and maybe a save a little on the cost of invoking a
trigger function just to exit again. But neither of those should be
relevant here - nobody does that much DDL, and anybody writing command
triggers should understand that this is advanced magic not intended
for beginners. Wizards below level 10 need not apply.

BTW, in reference to your other email, the reason I suggested
ddl_command_start rather than just command_start is because we already
know that there are cases here we'll never be able to handle, like
before-begin. I think it's better to be explicit about what the scope
of the feature is (i.e. DDL) rather than just calling it command_start
and, oh, by the way, the commands we either don't know how to support
or didn't get around to supporting yet aren't included. The latter
approach is basically a guaranteed compatibility break every time we
get around to adding a few more commands, and I'm not keen on that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Arun Chaitanya 2012-03-30 14:12:22 Re: Optimizing Nested Correlated Queries by decorrelation: GSOC 2012 Project
Previous Message Merlin Moncure 2012-03-30 13:57:12 Re: HTTP Frontend? (and a brief thought on materialized views)