Re: Command Triggers patch v18

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Command Triggers patch v18
Date: 2012-03-29 16:59:51
Message-ID: CA+TgmoYNUUqDCdSLj0H7kim=cHcEDtY37_jRRrRBEc1QD20xFg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 29, 2012 at 12:17 PM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>>     create command trigger before COMMAND_STEP of alter table
>>>          execute procedure snitch();
>>
>> One thought is that it might be better to say AT or ON or WHEN rather
>> than BEFORE, since BEFORE END is just a little strange; and also
>> because a future hook point might be something like
>> "permissions-checking", and we really want it to be AT
>> permissions-checking time, not BEFORE permissions-checking time.
>
> Yeah I tried different spellings and almost sent a version using AT or
> WHEN, but it appeared to me not to be specific enough: AT permission
> checking time does not exist, it either happens before or after
> permission checking. I played with using “preceding” rather than before,
> too, maybe you would like that better.

Well, preceding and before are synonyms, so I don't see any advantage
in that change. But I really did mean AT permissions_checking time,
not before or after it. That is, we'd have a hook where instead of
doing something like this:

aclresult = pg_class_aclcheck(reloid, GetUserId(), ACL_SELECT);

...we'd call a superuser-supplied trigger function and let it make the
call. This requires a clear separation of permissions and integrity
checking that we are currently lacking, and probably quite a bit of
other engineering too, but I think we should assume that we're going
to do it at some point because there is it seems pretty clear that
there is a huge amount of pent-up demand for being able to do things
like this.

>> If I got to pick, I'd pick this syntax:
>>
>> CREATE EVENT TRIGGER name ON event_name (event_subtype_name [, ...])
>> EXECUTE PROCEDURE function_name(args);
>>
>> Then you could eventually imagine:
>>
>> CREATE EVENT TRIGGER prohibit_all_ddl ON ddl_command_start EXECUTE
>> PROCEDURE throw_an_error();
>> CREATE EVENT TRIGGER prohibit_some_ddl ON ddl_command_start
>> (alter_table) EXECUTE PROCEDURE throw_an_error();
>> CREATE EVENT TRIGGER replicate_table_drops ON sql_drop (table) EXECUTE
>> PROCEDURE record_table_drop();
>> CREATE EVENT TRIGGER allow_whitelisted_extensions ON permissions_check
>> (create_extension) EXECUTE PROCEDURE make_heroku_happy();
>> CREATE EVENT TRIGGER replicate_at_subcommands ON subcommand_start
>> (alter_table) EXECUTE PROCEDURE record_alter_table_subcommand();
>
> I would amend that syntax to allow for a WHEN <expr> much like in the
> DML trigger case, where the expression can play with the variables
> exposed at the time of calling the trigger.
>
>  create event trigger prohibit_some_ddl
>     preceding <timing spec>
>          when tag in ('CREATE TABLE', 'ALTER TABLE')
>       execute procedure throw_an_error();

We could do it that way, but the tag in ('CREATE TABLE', 'ALTER
TABLE') syntax will be tricky to evaluate. I'd really prefer NOT to
need to start up and shut down the executor to figure out whether the
trigger has to fire. If we are going to do it that way then we need
to carefully define what variables are available and what values they
get. I think that most people's event-filtering needs will be simple
enough to be served by a more declarative syntax.

> We must also take some time to describe the timing specs carefully,
> their order of operation and which information are available for
> triggers in each of them. See below.
>
> I also don't like the way you squeeze in the drop cascade support here
> by re qualifying it as a timing spec, which it's clearly not.
>
> On subcommand_start, what is the tag set to? the main command or the
> subcommand? if the former, where do you find the current subcommand tag?
> I don't think subcommand_start should be a timing spec either.

That's up for grabs, but I was thinking the subcommand. For example, consider:

alter table t alter column a add column b int, disable trigger all;

I would imagine this having a firing sequence something like this:

ddl_command_start:alter_table
ddl_command_permissions_check:alter_table
ddl_command_name_lookup:alter_table
alter_table_subcommand_prep:add_column
alter_table_subcommand_prep:disable_trigger_all
alter_table_subcommand_start:add_column
sql_create:column
alter_table_subcommand_end:add_column
alter_table_subcommand_start:disable_trigger_all
alter_table_subcommand_end:disable_trigger_all
ddl_command_end:alter_table

Lots of room for argument there, of course.

>> Piece by piece:
>>
>> - I prefer EVENT to COMMAND, because the start or end or any other
>> point during the execution of a command can be viewed as an event;
>> however, it is pretty clear that not everything that can be viewed as
>> an event upon which we might like triggers to fire can be viewed as a
>> command, even if we have a somewhat well-defined notion of
>> subcommands.  I continue to think that there is no sense in which
>> creating a sequence to back a serial column or dropping an object due
>> to DROP .. CASCADE is a subcommand; it is, however, pretty clearly an
>> event.  Anything we want can be an event.  I think if we don't get
>> this right in the first version we're going to be stuck with a
>> misnomer forever.  :-(
>
> I can buy naming what we're building here EVENT TRIGGERs, albeit with
> some caveats: what I implemented here only caters for commands for which
> we have a distinct parse tree. That's what sub commands are in effect
> and why it supports create schema sub commands but not all alter table
> declination, and not cascaded drops.
>
> Also, implementing cascade operations as calling process utility with a
> complete parse tree is not going to be a little project on its own. We
> can also punt that and document that triggers fired on cascaded drops
> are not receiving a parse tree, only the command tag and object id and
> object name and schema name, or even just the command tag and object id
> in order not to incur too many additional name lookups.

Yeah, I would not try to pass the parse tree to every event trigger,
just the ones where it's well-defined. For drops, I would just say,
hey, this is what we dropped. The user can log it or ereport, but
that's about it. Still, that's clearly enough to do a lot of really
interesting stuff, replication being the most obvious example.

> Lastly, EVENT opens the road to thinking about transaction event
> triggers, on begin|commit|rollback etc. Do we want to take that road?

I wouldn't rule it out. If we can make it work, I think it'd make a
lot of people happy. Not all combinations are practical, in all
likelihood, like I don't think you can have a trigger that fires
before you begin the transaction, because you need an error-handling
context before you can do anything. But you might be able to have
post-begin and pre-commit triggers.

> Well that's a very little burden considering that you will need to add
> support for firing the triggers at the right time with the right piece
> of information provided, but why not. I'm also not convinced that the
> complexity of implementing proper error management is worth simplifying
> the parser too much.

Fair point. My main motivation is really that I'd like to have a
syntax that doesn't presume very much about what sorts of firing
points we might have. For many purposes, it makes sense to key on
which command is executing, but for some it might make sense to cue on
subcommand, type of SQL object affected, or something else entirely.
If we can leave the door open to such use cases, I think we will be
happier, because I bet they will come along in time if not right now,
and it will make everything simpler if we haven't painted ourselves
into a corner syntax-wise.

[..snip..]

> So we seem to have:
>
>  1. start
>  2. permissions_check
>  3. consistency_check
>  4. acquire_lock
>  5. execute, for lack of a better term
>  6. end
>
> The main problem with that scheme is that while it works fine for simple
> commands, complex one will have to do lookup and take locks in the
> middle of the permission and consistency checks, and I don't think we're
> going to be able to change that.

I think that you've got that a bit mixed up: consistency checks
typically happen after acquiring locks, not before, because until
you've locked any consistency checks you do are so much rubbish, as
the situation can change under you at any time. Also, ALL commands
need to do lookups and take locks in the middle of permissions checks;
any that don't are buggy. SOME commands also need to do such things
during execution, as in the case of ALTER TABLE .. ADD FOREIGN KEY,
which locks and checks the main table first and then eventually gets
around to checking the other table.

> One possible answer is to say that for some commands, they all are in
> fact the same event and your event triggers will in fact get run from
> the same place in the command implementation, but still in that order.

Yes, I think there may be cases where it works out that way. Or
possibly some events aren't defined for some command types at all, or
a little of both. I think we can and should do whatever makes the
behavior least surprising from a user perspective.

> So we can't guarantee at large that a consistency_check trigger is
> always run before we did name lookups and acquired locks.

Is this a typo, can't for can? If not, I'm confused, because it seems
like you were saying we could and should guarantee that, and I agree
that we can and should.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2012-03-29 17:03:41 Re: ECPG FETCH readahead
Previous Message Tom Lane 2012-03-29 16:57:05 Re: pgxs and bison, flex