Re: Command Triggers patch v18

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Subject: Re: Command Triggers patch v18
Date: 2012-03-27 15:58:21
Message-ID: 201203271758.21899.andres@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Tuesday, March 27, 2012 04:29:58 PM Robert Haas wrote:
> On Tue, Mar 27, 2012 at 9:37 AM, Andres Freund <andres(at)2ndquadrant(dot)com>
wrote:
> > Not necessarily. One use-case is doing something *before* something
> > happens like enforcing naming conventions, data types et al during
> > development/schema migration.
>
> That is definitely a valid use case. The only reason why we don't
> have something like that built into the ObjectAccessHook framework is
> because we haven't yet figured out a clean way to make it work. Most
> of KaiGai's previous attempts involved passing a bunch of garbage
> selected apparently at random into the hook function, and I rejected
> that as unmaintainable. Dimitri's code here doesn't have that problem
> - it passes in a consistent set of information across the board. But
> I still think it's unmaintainable, because there's no consistency
> about when triggers get invoked, or whether they get invoked at all.
> We need something that combines the strengths of both approaches.
Yes.

I still think the likeliest way towards that is defining some behaviour we want
regarding
* naming conflict handling
* locking

And then religiously make sure the patch adheres to that. That might need some
refactoring of existing code (like putting a art of the utility.c handling of
create table into its own function and such), but should be doable.

I think BEFORE command triggers ideally should run
* before permission checks
* before locking
* before internal checks are done (nameing conflicts, type checks and such)

Obviously some things will be caught before that (parse analysis of some
commands) and I think we won't be able to fully stop that, but its not totally
consistent now and while doing some work in the path of this patch seems
sensible it cannot do-over everything wrt this.

Looking up oids and such before calling the trigger doesn't seem to be
problematic from my pov. Command triggers are a superuser only facility,
additional information they gain are no problem.
Thinking about that I think we should enforce command trigger functions to be
security definer functions.

> I actually think that, to really meet all needs here, we may need the
> ability to get control in more than one place.
Not sure what you mean by that. Before/after permission checks, before/after
consistency checks? That sort of thing?

> For example, one thing
> that KaiGai has wanted to do (and I bet Dimitri would live to be able
> to do it too, and I'm almost sure that Dan Farina would like to do it)
> is override the built-in security policy for particular commands.
Dim definitely seems to want that: https://github.com/dimitri/pgextwlist ;)

> I think that KaiGai only wants to be able to deny things that would
> normally be allowed, but I suspect some of those other folks would
> also like to be able to allow things that would normally be denied.
Denying seems to be easier than allowing stuff safely....

> For those use cases, you want to get control at permissions-checking
> time. However, where Dimitri has the hooks right now, BEFORE triggers
> for ALTER commands fire after you've already looked up the object that
> you're manipulating. That has the advantage of allowing you to use
> the OID of the object, rather than its name, to make policy decisions;
> but by that time it's too late to cancel a denial-of-access by the
> first-line permissions checks.
Why? Just throw a access denied exception? Unless its done after the locking
that won't be visible by anything but timing?

Additional granting is more complex though, I am definitely with you there.
That will probably need INSTEAD triggers which I don't see for now. Those will
have their own share of problems.

> Dimitri also mentioned wanting to be able to cancel the actual operation -
> and presumably, do something else instead, like go execute it on a different
> node, and I think that's another valid use case for this kind of trigger.
> It's going to take some work, though, to figure out what the right set of
> control points is, and it's probably going to require some refactoring of
> the existing code, which is a mess that I have been slowly trying to clean
> up.
I commend your bravery...

> >> In the interest of getting event triggers, you're arguing that we should
> >> contort the definition of the work "command" to include sub-commands,
> >> but not necessarily commands that turn out to be a no-op, and maybe
> >> things that are sort of like what commands do even though nobody
> >> actually ever executed a command by that name. I just don't think
> >> that's a good idea. We either have triggers on commands, and they
> >> execute once per command, period; or we have triggers on events and
> >> they execute every time that event happens.
> > I don't think thats a very helpfull definition. What on earth would you
> > want to do with plain passing of
> > CREATE SCHEMA blub CREATE TABLE foo() CREATE TABLE bar();
> > So some decomposition seems to be necessary. Getting the level right sure
> > ain't totally easy.
> > It might be helpful to pass in the context from which something like that
> > happened.
> I agree that it's not a very helpful decision, but I'm not the one who
> said we wanted command triggers rather than event triggers. :-)
Well, but on the other hand what will you do with a "pg_attribute row added"
event. It doesn't even have a oid ;)

> > Which would basically require including pg_dump in the backend to
> > implement replication and logging. I don't think librarifying pg_dump is
> > a fair burden at all.
> I don't either, but that strikes me as a largely unrelated problem.
> As you may recall, I've complained that too much of that functionality
> is in the client in the past, and I haven't changed my opinion on
> that.
I am not really happy with that as well. But if we start putting all that into
this patch we will never get anywhere. I think moving that nearer to the
server in some form is a MAJOR project on its own.
Also I dont see it solving differential stuff at all.

But how would do something like logging the originating statement with the
object access hook machinery?

> But how is that any different with Dimitri's approach? You can
> get a callback AFTER CREATE TABLE, and you'll get the table name. Now
> what? If you get the trigger in C you can get the node tree, but
> that's hardly any better. You're still going to need to do some
> pretty tricky push-ups to get reliable replication. It's not at all
> evident to me that the parse-tree is any better a place to start than
> the system catalog representation; in fact, I would argue that it's
> probably much worse, because you'll have to exactly replicate whatever
> the backend did to decide what catalog entries to create, or you'll
> get drift between servers.
Well, the problem is with just catalog access its just about impossible to
generate differential changes from them. Also the catalog between clusters
*wont be the same* in a large part of the use-cases. Oids will basically never
match.

Matching the intent, which is mostly visible in the parsetree, seems to be a
whole much more helpfull.

I am pretty confident, given time, that I could write a fairly complete
differential-sql generation for the usual parts of DDL from the parsetree alone
in 2-3 days with a C level trigger. I am also very sure that I wouldn't be
able to even remotely get that far with catalog only access.

> > Also I have a *very hard* time to imagining to sensibly implement
> > replicating CREATE TABLE or ALTER TABLE ... ADD COLUMN with just object
> > access hooks. There is just not enough context. How would you discern
> > between a ADD COLUMN and a CREATE TABLE? They look very similar or even
> > identical on a catalog level.
> That can be fixed using the optional argument to
> InvokeObjectAccessHook, similar to what we've done for differentiating
> internal drops from other drops.
Several of the points where those hooks are invoked don't even seem to have
the necessary information for that... Doing that seems to involve passing
heaps of additional state through the backend.

> > I also have the strong feeling that all this would expose implementation
> > details *at least* as much as command triggers. A slight change in order
> > of catalog modifcation would be *way* harder to hide via the object hook
> > than something of a similar scale via command triggers.
> I don't see that. Right now, when you do something like CREATE TABLE
> foo (a serial primary key), the exact order in which the objects are
> created is apparent from the order of the trigger firing. The order
> of BEFORE vs. AFTER triggers is also drawn out of a hat. So I don't
> see how it's any worse. It also avoids a lot of definitional
> ambiguity. If you say that we're going to have a trigger on the
> CREATE SEQUENCE command, then what happens when the user creates a
> sequence via some other method? The current patch says that we should
> handle that by calling the CREATE SEQUENCE trigger if it happens to be
> convenient because we're going through the same code path that a
> normal CREATE SEQUENCE would have gone through, but if it uses a
> different code path then let's not bother.
> Otherwise, how do you explain the fact that a DROP .. CASCADE doesn't fire
> triggers for the subordinate objects dropped, but CREATE TABLE does fire
> triggers for the subordinate objects created?
I complained about that several times. As I recall it dim basically aggreed
but didn't want to do it now because it involve passing more data around.
Didn't convince me much, but I think I could live with it.

Adding proper command trigger calling via that doesn't seem to be too complex
to me.

> If you trigger on events, then things are a lot more cut-and-dried. You fire
> the trigger when the event happens. If the event doesn't happen, you don't
> fire the trigger. If an implementation change causes objects to be created
> when they weren't before, or not created when they were before, then,
> yes, trigger firing behavior will change, but it will be clear that
> the new behavior is correct. Right now, future developers modifying
> this code have no reasonable basis for deciding whether or not a
> trigger ought to fire in a given situation. There's no consistent
> rule, no specification, and no documentation for how that's intended
> to happen.
This stuff definitely needs to be properly specified. And I am sorry that I
didn't point this out earlier. I guess I just looked at too many iterations of
this to still have a big picture.

> > Hm. I agree that there is some work needed to make the whole handling
> > more consistent. But I don't think its as bad as you paint it.
> >
> > I don't want to brush of your review here - you definitely do raise valid
> > points. I don't particularly like the current implementation very much.
> > But I simply fail to see a less annoying way.
> >
> > Its also not that I am just defending a colleague - you might have
> > noticed the @2ndquadrant - I got interested in this patch quite a bit
> > before that was on the horizon.
>
> Yep, understood, and I'd have the same complaints about this patch
> that I do presently if it were submitted by someone who worked for
> EnterpriseDB, or some independent third party.
I didn't want to suggest that you do! I just wanted to make it explicit that
its not somebody asking me via external channels (expect maybe Dim asking for
reviews on #postgresql on irc) to do a round of review for some company
reason.

> I do think it's a bit of a shame that more people didn't pay attention to
> and help improve the ObjectAccessHook machinery as we were working on that.
> I mean, Dimitri is not the first or last person to want to get control during
> DDL operations, and KaiGai's already done a lot of work figuring out
> how to make it work reasonably.
> Pre-create hooks don't exist in that machinery not because nobody wants
> them, but because it's hard.
I am not sure if the machinery is a very good fit for pre-create hooks. Thats
no argument against the machinery but against reusing it for this usecase.

> This whole problem is hard. It would be wrong to paint it as a problem that
> is unsolvable or not valuable, but it would be equally wrong to expect that
> it's easy or that anyone's first attempt (mine, yours, Dimitri's, KaiGai's,
> or Tom Lane's) is going to fall painlessly into place without anyone needing
> to sweat a little blood.
If I appeared a bit grumpy about your review its mostly because I would like
to get the feature to into 9.3 because it would have saved me many hours of
pain in the past, and the review pushed it further away from that.
The review also didn't come in that early - but thats definitely not your
fault, Tom and you did a lot of work on many patches and unfortunately for us
you can't scale limiteless. You shouldn't need to, but reality seems to tell
us otherwise :(

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-03-27 16:17:32 Re: Command Triggers patch v18
Previous Message Alvaro Herrera 2012-03-27 15:30:39 Re: Reporting WAL file containing checkpoint's REDO record in pg_controldata's result