Re: Command Triggers patch v18

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Subject: Re: Command Triggers patch v18
Date: 2012-03-27 14:29:58
Message-ID: CA+TgmobzaBHnBnzCcbbtpKyw+v_7n2MPU8wsReYkHLw-Lw64mQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 27, 2012 at 9:37 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Not necessarily. One use-case is doing something *before* something happens
> like enforcing naming conventions, data types et al during development/schema
> migration.

That is definitely a valid use case. The only reason why we don't
have something like that built into the ObjectAccessHook framework is
because we haven't yet figured out a clean way to make it work. Most
of KaiGai's previous attempts involved passing a bunch of garbage
selected apparently at random into the hook function, and I rejected
that as unmaintainable. Dimitri's code here doesn't have that problem
- it passes in a consistent set of information across the board. But
I still think it's unmaintainable, because there's no consistency
about when triggers get invoked, or whether they get invoked at all.
We need something that combines the strengths of both approaches.

I actually think that, to really meet all needs here, we may need the
ability to get control in more than one place. For example, one thing
that KaiGai has wanted to do (and I bet Dimitri would live to be able
to do it too, and I'm almost sure that Dan Farina would like to do it)
is override the built-in security policy for particular commands. I
think that KaiGai only wants to be able to deny things that would
normally be allowed, but I suspect some of those other folks would
also like to be able to allow things that would normally be denied.
For those use cases, you want to get control at permissions-checking
time. However, where Dimitri has the hooks right now, BEFORE triggers
for ALTER commands fire after you've already looked up the object that
you're manipulating. That has the advantage of allowing you to use
the OID of the object, rather than its name, to make policy decisions;
but by that time it's too late to cancel a denial-of-access by the
first-line permissions checks. Dimitri also mentioned wanting to be
able to cancel the actual operation - and presumably, do something
else instead, like go execute it on a different node, and I think
that's another valid use case for this kind of trigger. It's going to
take some work, though, to figure out what the right set of control
points is, and it's probably going to require some refactoring of the
existing code, which is a mess that I have been slowly trying to clean
up.

>> In the
>> interest of getting event triggers, you're arguing that we should
>> contort the definition of the work "command" to include sub-commands,
>> but not necessarily commands that turn out to be a no-op, and maybe
>> things that are sort of like what commands do even though nobody
>> actually ever executed a command by that name.  I just don't think
>> that's a good idea.  We either have triggers on commands, and they
>> execute once per command, period; or we have triggers on events and
>> they execute every time that event happens.
> I don't think thats a very helpfull definition. What on earth would you want to
> do with plain passing of
> CREATE SCHEMA blub CREATE TABLE foo() CREATE TABLE bar();
>
> So some decomposition seems to be necessary. Getting the level right sure
> ain't totally easy.
> It might be helpful to pass in the context from which something like that
> happened.

I agree that it's not a very helpful decision, but I'm not the one who
said we wanted command triggers rather than event triggers. :-)

> Which would basically require including pg_dump in the backend to implement
> replication and logging. I don't think librarifying pg_dump is a fair burden
> at all.

I don't either, but that strikes me as a largely unrelated problem.
As you may recall, I've complained that too much of that functionality
is in the client in the past, and I haven't changed my opinion on
that. But how is that any different with Dimitri's approach? You can
get a callback AFTER CREATE TABLE, and you'll get the table name. Now
what? If you get the trigger in C you can get the node tree, but
that's hardly any better. You're still going to need to do some
pretty tricky push-ups to get reliable replication. It's not at all
evident to me that the parse-tree is any better a place to start than
the system catalog representation; in fact, I would argue that it's
probably much worse, because you'll have to exactly replicate whatever
the backend did to decide what catalog entries to create, or you'll
get drift between servers.

> Also I have a *very hard* time to imagining to sensibly implement replicating
> CREATE TABLE or ALTER TABLE ... ADD COLUMN with just object access hooks.
> There is just not enough context. How would you discern between a ADD COLUMN
> and a CREATE TABLE? They look very similar or even identical on a catalog
> level.

That can be fixed using the optional argument to
InvokeObjectAccessHook, similar to what we've done for differentiating
internal drops from other drops.

> I also have the strong feeling that all this would expose implementation
> details *at least* as much as command triggers. A slight change in order of
> catalog modifcation would be *way* harder to hide via the object hook than
> something of a similar scale via command triggers.

I don't see that. Right now, when you do something like CREATE TABLE
foo (a serial primary key), the exact order in which the objects are
created is apparent from the order of the trigger firing. The order
of BEFORE vs. AFTER triggers is also drawn out of a hat. So I don't
see how it's any worse. It also avoids a lot of definitional
ambiguity. If you say that we're going to have a trigger on the
CREATE SEQUENCE command, then what happens when the user creates a
sequence via some other method? The current patch says that we should
handle that by calling the CREATE SEQUENCE trigger if it happens to be
convenient because we're going through the same code path that a
normal CREATE SEQUENCE would have gone through, but if it uses a
different code path then let's not bother. Otherwise, how do you
explain the fact that a DROP .. CASCADE doesn't fire triggers for the
subordinate objects dropped, but CREATE TABLE does fire triggers for
the subordinate objects created? If you trigger on events, then
things are a lot more cut-and-dried. You fire the trigger when the
event happens. If the event doesn't happen, you don't fire the
trigger. If an implementation change causes objects to be created
when they weren't before, or not created when they were before, then,
yes, trigger firing behavior will change, but it will be clear that
the new behavior is correct. Right now, future developers modifying
this code have no reasonable basis for deciding whether or not a
trigger ought to fire in a given situation. There's no consistent
rule, no specification, and no documentation for how that's intended
to happen.

> Hm. I agree that there is some work needed to make the whole handling more
> consistent. But I don't think its as bad as you paint it.
>
> I don't want to brush of your review here - you definitely do raise valid
> points. I don't particularly like the current implementation very much. But I
> simply fail to see a less annoying way.
>
> Its also not that I am just defending a colleague - you might have noticed the
> @2ndquadrant - I got interested in this patch quite a bit before that was on
> the horizon.

Yep, understood, and I'd have the same complaints about this patch
that I do presently if it were submitted by someone who worked for
EnterpriseDB, or some independent third party. I do think it's a bit
of a shame that more people didn't pay attention to and help improve
the ObjectAccessHook machinery as we were working on that. I mean,
Dimitri is not the first or last person to want to get control during
DDL operations, and KaiGai's already done a lot of work figuring out
how to make it work reasonably. Pre-create hooks don't exist in that
machinery not because nobody wants them, but because it's hard. This
whole problem is hard. It would be wrong to paint it as a problem
that is unsolvable or not valuable, but it would be equally wrong to
expect that it's easy or that anyone's first attempt (mine, yours,
Dimitri's, KaiGai's, or Tom Lane's) is going to fall painlessly into
place without anyone needing to sweat a little blood.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tareq Aljabban 2012-03-27 14:33:40 Re: Storage Manager crash at mdwrite()
Previous Message Alex Shulgin 2012-03-27 14:13:55 Re: Another review of URI for libpq, v7 submission