Re: Command Triggers patch v18

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Command Triggers patch v18
Date: 2012-03-29 22:16:44
Message-ID: CA+TgmoZPbo+shz=H4U8K-siyKfxSAZVtM27PV4L_T5Q+0O4n6g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 29, 2012 at 4:21 PM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Well, preceding and before are synonyms, so I don't see any advantage
>> in that change.  But I really did mean AT permissions_checking time,
>> not before or after it.  That is, we'd have a hook where instead of
>> doing something like this:
>>
>> aclresult = pg_class_aclcheck(reloid, GetUserId(), ACL_SELECT);
>>
>> ...we'd call a superuser-supplied trigger function and let it make the
>
> Wow. I don't think I like that at all. It's indeed powerful, but how do
> you go explaining and fixing unanticipated behavior with such things in
> production? It looks too much like an invitation to break a very careful
> design where each facility has to rove itself to get in.

I'm thinking of things like extension whitelisting. When some
unprivileged user says "CREATE EXTENSION harmless", and harmless is
marked as superuser-only, we might like to have a hook that gets
called *at permissions-checking time* and gets to say, oh, well, that
extension is on the white-list, so we're going to allow it. I think
you can come up with similar cases for other commands, where in
general the operation is restricted to superusers or database owners
or table owners but in specific cases you want to allow others to do
it.

>>>> CREATE EVENT TRIGGER name ON event_name (event_subtype_name [, ...])
>>>> EXECUTE PROCEDURE function_name(args);
>>>
>>> I would amend that syntax to allow for a WHEN <expr> much like in the
>>> DML trigger case, where the expression can play with the variables
>>> exposed at the time of calling the trigger.
>>>
>>>  create event trigger prohibit_some_ddl
>>>     preceding <timing spec>
>>>          when tag in ('CREATE TABLE', 'ALTER TABLE')
>>>       execute procedure throw_an_error();
>>
>> We could do it that way, but the tag in ('CREATE TABLE', 'ALTER
>> TABLE') syntax will be tricky to evaluate.  I'd really prefer NOT to
>> need to start up and shut down the executor to figure out whether the
>> trigger has to fire.  If we are going to do it that way then we need
>> to carefully define what variables are available and what values they
>> get.  I think that most people's event-filtering needs will be simple
>> enough to be served by a more declarative syntax.
>
> We could then maybe restrict that idea so that the syntax is more like
>
>  when <trigger variable> in (<literal>[, ...])
>
> So that we just have to store the array of strings and support only one
> operation there. We might want to go as far as special casing the object
> id to store an oid vector there rather than a text array, but we could
> also decide not to support per-oid command triggers in the first
> release and remove that from the list of accepted <trigger variable>s.

I guess that would make sense if you think there would ever be more
than one choice for <trigger variable>. I'm not immediately seeing a
use case for that, though - I was explicitly viewing the syntax foo
(bar, baz) to mean foo when
<the-only-trigger-variable-that-makes-sense-given-that-we-are-talking-about-a-trigger-on-foo>
in (bar, baz).

More generally, my thought on the structure of this is that you're
going to have certain toplevel events, many of which will happen at
only a single place in the code, like "an object got dropped" or "a
DDL command started" or "a DDL command ended". So we give those
names, like sql_drop, ddl_command_start, and ddl_command_end. Inside
your trigger procedure, the set of magic variables that is available
will depend on which toplevel event you set the trigger on, but
hopefully all firings of that toplevel event can provide the same
magic variables. For example, at ddl_command_start time, you're just
gonna get the command tag, but at ddl_command_end time you will get
the command tag plus maybe some other stuff.

Now, we COULD stop there. I mean, we could document that you can
create a trigger on ddl_command_start and every DDL command will fire
that trigger, and if the trigger doesn't care about some DDL
operations, then it can look at the command tag and return without
doing anything for the operations it doesn't care about. The only
real disadvantage of doing it that way is speed, and maybe a bit of
code complexity within the trigger. So my further thought was that
we'd allow you to specify an optional filter list to restrict which
events would fire the trigger, and the exact meaning of that filter
list would vary depending on the toplevel event you've chosen - i.e.
for ddl_command_start, the filter would be a list of commands, but for
sql_drop it would be a list of object types. We could make that more
complicated if we think that an individual toplevel event will need
more than one kind of filtering. For example, if you wanted to filter
sql_drop events based on the object type AND/OR the schema name, then
the syntax I proposed would obviously not be adequate. I'm just not
convinced we need that, especially because you'd then need to set up a
dependency between the command trigger and the schema, since obviously
the command trigger must be dropped if the schema is. Maybe there are
better examples; I just can't think of any.

>> Fair point.  My main motivation is really that I'd like to have a
>> syntax that doesn't presume very much about what sorts of firing
>> points we might have.  For many purposes, it makes sense to key on
>
> IIUC the technique we're using in the EXPLAIN options case, it's easy to
> stay flexible as long as we're providing only single words here, or at
> least a simple “parser unit” thing. Hence the use of underscores in the
> timing spec names.

Yes.

>> need to do lookups and take locks in the middle of permissions checks;
>> any that don't are buggy.  SOME commands also need to do such things
>> during execution, as in the case of ALTER TABLE .. ADD FOREIGN KEY,
>> which locks and checks the main table first and then eventually gets
>> around to checking the other table.
>
> The namespace is often resolved first, sometime even before permission
> checks are done. See CreateConversionCommand() for an example of that,
> but really that happens about everywhere IIRC.

Ah! I was thinking of ALTER commands, but you're right: CREATE
commands are different. In many ways a CREATE command is more like an
operation on the schema than it is an operation on the object itself
(since that doesn't exist yet). Those all need to have permissions
checks and locking *on the schema* intertwined with the schema lookup,
as RangeVarGetAndCheckCreationNamespace() does. Unfortunately, the
equivalent code for non-relations is broken, and I didn't have time to
fix it in time for 9.2. I think that would be a good thing to get
done for 9.3, though: it's only about 2 days worth of work, I think.

>>> One possible answer is to say that for some commands, they all are in
>>> fact the same event and your event triggers will in fact get run from
>>> the same place in the command implementation, but still in that order.
>>
>> Yes, I think there may be cases where it works out that way.  Or
>> possibly some events aren't defined for some command types at all, or
>> a little of both.  I think we can and should do whatever makes the
>> behavior least surprising from a user perspective.
>
> Good, agreed. That will make it into a nice documentation table.

Yeah. This might even be a case where we should write the
documentation first and then make the implementation match it, rather
than the other way around.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Boszormenyi Zoltan 2012-03-29 22:48:07 Re: ECPG FETCH readahead
Previous Message Tom Lane 2012-03-29 21:56:50 Re: Speed dblink using alternate libpq tuple storage