Re: Command Triggers patch v18

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Command Triggers patch v18
Date: 2012-04-01 01:46:02
Message-ID: CA+TgmobrNJ9SyBrGwg13h1oP=EzS+bpGRUY2Fa1R0ZwHNRpb_Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 30, 2012 at 11:19 AM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> Yeah context is not explicit, we could call that "toplevel": the command
> tag of the command that the user typed. When toplevel is null, the event
> trigger is fired on a command the user sent, when it's not null, the
> trigger is fired on some inner command operation.

How about calling it "command tag"? I think both context and toplevel
are inconsistent with other uses of those terms: context is an
error-reporting field, among other things; and we don't care about the
toplevel command, just the command-tag of the one we're executing -
e.g. if DROP fires a command trigger which invokes CREATE which fires
another command trigger, the inner one is going to get CREATE not
DROP. Or at least so I presume.

>> "tag".  I think for a drop trigger I would want the function to
>> receive this information: type of object dropped, OID of object
>> dropped, column number in the case of a column drop, flag indicating
>> whether it's a toplevel drop or a cascaded drop.  I wouldn't object to
>> also making the currently-in-context toplevel command tag available,
>> but I think most drop triggers wouldn't really care, so I wouldn't
>> personally spend much implementation effort on it if it turns out to
>> be hard.
>
> I'm not sure it would be hard as I'm only seeing a single depth possible
> here, so a single per-backend string static variable would do.

See above example: I am pretty sure you need a stack.

>> But in general, I don't really know what a "proper" subcommand is or
>> why some subcommands should be more proper than others, or why we
>> should even be concerned about whether something is a subcommand at
>> all.  I think it's fine and useful to have triggers that fire in those
>> kinds of places, but I don't see why we should limit ourselves to
>> that.  For applications like replication, auditing, and enhanced
>> security, the parse tree and subcommand/non-subcommand status of a
>> particular operation are irrelevant.  What you need is an exact
>
> Not really. When replicating you could perfectly say that you only
> replicate the toplevel DROP because the replica will also do the cascade
> dance and you might have decided not to replicate all related objects on
> the other side.

I would not want my replication system issuing cascaded drops, because
if the sides don't match it might cascade to something on the remote
side that it doesn't cascade to on the local side, which exceeds my
tolerance for scary behavior.

> The information you need really want not to miss is when only the
> cascaded object is part of the replication, not the main one. That was
> not covered by my previous patch but now we have a way to cover it.

Also true.

>> description of the operation that got performed (e.g. the default on
>> table X column Y got dropped); you might be able to reverse-engineer
>> that from the parse tree, but it's much better to have the system pass
>> you the information you need more directly.  Certainly, there are
>> cases where you might want to have the parse tree, or even the raw
>> command text, available, but I'm not even convinced that that those
>> cases will be the most commonly used ones unless, of course, they're
>> the only ones we offer, in which case everyone will go down that path
>> by necessity.
>
> There are far too many variants and cases of our command to be able to
> extract their parameters in a flat way (a bunch of variables compared to
> a nested description ala json or xml), and I don't think such a flat
> representation is going to be much better than the parse tree.

I strongly disagree. I think we'll find that with the right choice of
hook points, the number of variables that need to be exposed is quite
compact. Indeed, I'd venture to say that needing to pass lots and
lots of information is evidence that you've made a poor choice of hook
point.

> Now, we will later be able to offer a normalized rewritten command
> string from the parse tree to the use, but I don't see us adding support
> for that from cascaded drops, one other reason why I like to expose them
> as sub commands.
>
>> Again, I'm not understanding the distinction between toplevel events
>> and sub-events.  I don't see any need for such a distinction.  I think
>> there are just events, and some of them happen at command start/end
>> and others happen somewhere in the middle.  As long as it's a safe and
>> useful place to fire a trigger, who cares?
>
> I guess you're head is too heavily in the code side of things as opposed
> to the SQL user view point. Maybe my attempt to conciliate both views is
> not appropriate, but I really do think it is.
>
>>> Given the scope of this mini expression language, we can easily bypass
>>> calling the executor in v1 here, and reconsider later if we want to
>>> allow calling a UDF in the WHEN clause… I don't think it's an easy
>>> feature to add in, though.
>>
>> Or a necessary one.  AFAICS, the main benefit of WHEN clauses on
>
> Exactly.
>
>> regular triggers is that you can prevent the AFTER trigger queue from
>> getting huge, and maybe a save a little on the cost of invoking a
>> trigger function just to exit again.  But neither of those should be
>> relevant here - nobody does that much DDL, and anybody writing command
>> triggers should understand that this is advanced magic not intended
>> for beginners.  Wizards below level 10 need not apply.
>
> Here it's only a facility to manage your event trigger code
> organization, and I insisted in having it in the syntax because in the
> event trigger grammar I don't see another place where to stuff which
> command you're targeting. The command tag certainly can not be
> considered as another kind of event.

No, but whether or not you mention it in the CREATE TRIGGER syntax has
nothing to do with whether it's available as a magic parameter inside
the procedure. Those things out to be independent. I imagine that
the stuff that is accessible from inside the trigger will be richer
than what you can do in the trigger syntax itself.

> As much as I like the event timing idea and appreciate that it nicely
> solves some problem that were hairy with the previous BEFORE/AFTER too
> simple syntax, I want to keep it sane.
>
> An event timing is a generic named place in the code where we “hook” the
> execution of one or several use defined function. This place is generic
> because a whole series of command are providing the hooking place, such
> as command_start, command_end, permissions_check, consistency_check, etc
> etc.
>
> I foresee a matrix in the documentation in the form of
>
>
>            timing | command_start | pcheck | check | command_end | …
>  command /        |               |        |       |             | …
>  -----------------+---------------+--------+-------+-------------+----
>   GRANT           |     ✔        |  ✘     |  ✘   |  ✔
>   CREATE TABLE    |     ✔        |  ✔     |  ✔   |  ✔
>   ALTER TABLE     |     ✔        |  ✔     |  ✔   |  ✔
>   …
>
> I don't think GRANT will be there in 9.2 by the way, but it helps seeing
> commands that won't support all of events to express the idea, I guess.

I'm still not sold on the idea of lumping together every command under
a single command_start event. I can't see anyone wanting to hook
anything that broad. Don't forget that we need to document not only
which triggers will fire but also what magic variables they'll get. A
dcl_command_start hook could conceivably get the list of privileges
being granted or revoked, but a general command_start trigger is going
to need a different set of magic variables depending on the actual
command type. I think it might be simpler and more clear to say that
each event type provides these variables, rather than having to
conditionalize it based on the command type.

>> BTW, in reference to your other email, the reason I suggested
>> ddl_command_start rather than just command_start is because we already
>> know that there are cases here we'll never be able to handle, like
>> before-begin.  I think it's better to be explicit about what the scope
>
> I agree with the fact that any given command will not be able to support
> any given timing spec, what we need is document them. That said, I don't
> see any gain in spelling command_start differently depending on whether
> we are applying to some kind of command or some other.
>
>> of the feature is (i.e. DDL) rather than just calling it command_start
>> and, oh, by the way, the commands we either don't know how to support
>> or didn't get around to supporting yet aren't included.  The latter
>> approach is basically a guaranteed compatibility break every time we
>> get around to adding a few more commands, and I'm not keen on that.
>
> I don't buy that argument. What would be the difference between adding
> support for DCL commands and adding a new DDL in term of compatibility
> breakage for event triggers that are firing on all commands? In both
> cases they will get called on non anticipated events, the fact that one
> event is called ddl_command_start and the other dcl_command_start does
> not appear to me to be relevant in the least. I might be missing
> something though.

See above - generally, I think that it's useful for a command trigger
to know that it's being called because of a DDL event, rather than
some command that could be doing anything. Also, I think that wanting
to hook "all DDL commands" is likely to be a useful thing to do, and
without having to explicitly list 50 command names...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-04-01 03:05:27 Re: measuring lwlock-related latency spikes
Previous Message Robert Haas 2012-04-01 01:29:43 Re: measuring lwlock-related latency spikes