Re: Command Triggers patch v18

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Subject: Re: Command Triggers patch v18
Date: 2012-03-27 17:34:46
Message-ID: CA+TgmoaDFNP6q6RFb-WCw_XsiG+Cz_rQt-gxwsafen-n0BdNsQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 27, 2012 at 11:58 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> I still think the likeliest way towards that is defining some behaviour we want
> regarding
> * naming conflict handling
> * locking
>
> And then religiously make sure the patch adheres to that. That might need some
> refactoring of existing code (like putting a art of the utility.c handling of
> create table into its own function and such), but should be doable.
>
> I think BEFORE command triggers ideally should run
> * before permission checks
> * before locking
> * before internal checks are done (nameing conflicts, type checks and such)

It is possible to do this, and it would actually be much easier and
less invasive to implement than what Dimitri has done here, but the
downside is that you won't have done the name lookup either. See my
last email to Dimitri for a long explanation of why that restriction
is not easily circumventable: name lookups, locking, and permission
checks are necessarily and inextricably intertwined. Still, if others
agree this is useful, I think it would be a lot easier to get right
than what we have now.

> Obviously some things will be caught before that (parse analysis of some
> commands) and I think we won't be able to fully stop that, but its not totally
> consistent now and while doing some work in the path of this patch seems
> sensible it cannot do-over everything wrt this.

Some of what we're now doing as part of parse analysis should really
be reclassified. For example, the ProcessUtility branch that handles
T_CreateStmt and T_CreateForeignTableStmt should really be split out
as a separate function that lives in tablecmds.c, and I think at least
some of what's in transformCreateStmt should be moved to tablecmds.c
as well. The current arrangement makes it really hard to guarantee
things like "the name gets looked up just once", which seems obviously
desirable, since strange things will surely happen if the whole
command doesn't agree on which OID it's operating on.

> Looking up oids and such before calling the trigger doesn't seem to be
> problematic from my pov. Command triggers are a superuser only facility,
> additional information they gain are no problem.
> Thinking about that I think we should enforce command trigger functions to be
> security definer functions.

I don't see any benefit from that restriction.

>> I actually think that, to really meet all needs here, we may need the
>> ability to get control in more than one place.
> Not sure what you mean by that. Before/after permission checks, before/after
> consistency checks? That sort of thing?

Yes. For example, above you proposed a kind of trigger that fires
really early - basically after parsing but before everything else.
What Dimitri has implemented is a much later trigger that is still
before the meat of the command, but not before *everything*. And then
there's an AFTER trigger, which could fire either after an individual
piece or stage of the command, or after the whole command is complete,
either of which seems potentially useful depending on the
circumstances. I almost think that the BEFORE/AFTER name serves to
confuse rather than to clarify, in this case. It's really a series of
specific hook points: whenever we parse a command (but before we
execute it), after security and sanity checks but before we begin
executing the command, before or after various subcommands, after the
whole command is done, and maybe a few others. When we say BEFORE or
AFTER, we implicitly assume that we want at most two of the things
from that list, and I am not at all sure that's what going to be
enough.

One thing I had thought about doing, in the context of sepgsql, and we
may yet do it, is create a hook that gets invoked whenever we need to
decide whether it's OK to perform an action on an object. For
example, consider ALTER TABLE .. ADD FOREIGN KEY: we'd ask the hook
both "is it OK to add a foreign key to table X?" and also "is it OK to
make a foreign key refer to table Y"? This doesn't fit into the
command-trigger framework at all, but it's definitely useful for
sepgsql, and I bet it's good for other things, too - maybe not that
specific example, but that kind of thing.

>> I think that KaiGai only wants to be able to deny things that would
>> normally be allowed, but I suspect some of those other folks would
>> also like to be able to allow things that would normally be denied.
> Denying seems to be easier than allowing stuff safely....

Yes.

>> For those use cases, you want to get control at permissions-checking
>> time.  However, where Dimitri has the hooks right now, BEFORE triggers
>> for ALTER commands fire after you've already looked up the object that
>> you're manipulating.  That has the advantage of allowing you to use
>> the OID of the object, rather than its name, to make policy decisions;
>> but by that time it's too late to cancel a denial-of-access by the
>> first-line permissions checks.
> Why? Just throw a access denied exception? Unless its done after the locking
> that won't be visible by anything but timing?

I think you misread what I wrote. It's still possible to deny things,
but it's too late to allow something that an earlier check has already
denied. Right now, sepgsql actually does its denying very late, from
the post-create hook. That's not ideal, of course, but it works. I
know KaiGai would like it to happen earlier, to avoid wasting work.

> Additional granting is more complex though, I am definitely with you there.
> That will probably need INSTEAD triggers which I don't see for now. Those will
> have their own share of problems.

Right.

>> I agree that it's not a very helpful decision, but I'm not the one who
>> said we wanted command triggers rather than event triggers.  :-)
> Well, but on the other hand what will you do with a "pg_attribute row added"
> event. It doesn't even have a oid ;)

We refer to everything using ObjectAddress notation - the OID of the
system catalog that contains objects of that type, the OID of the
object itself, and a "sub-object ID" which is 0 in general but the
column number in that specific case. What you can do is go fetch the
pg_attribute row and then do whatever you like - log it, throw an
error, etc. It's a hook that gets invoked when you add a column. If
it's important to differentiate between a column that gets added at
CREATE TABLE time and one that gets added by ALTER TABLE .. ADD
COLUMN, that could be done. It's a lot easier to deal with triggers
that fire in cases you don't care about than to deal with triggers
that don't fire in cases you do care about. The former you can just
ignore.

>> > Which would basically require including pg_dump in the backend to
>> > implement replication and logging. I don't think librarifying pg_dump is
>> > a fair burden at all.
>> I don't either, but that strikes me as a largely unrelated problem.
>> As you may recall, I've complained that too much of that functionality
>> is in the client in the past, and I haven't changed my opinion on
>> that.
> I am not really happy with that as well. But if we start putting all that into
> this patch we will never get anywhere.

Just to be clear, I wasn't proposing that.

> I think moving that nearer to the
> server in some form is a MAJOR project on its own.
> Also I dont see it solving differential stuff at all.

True.

> But how would do something like logging the originating statement with the
> object access hook machinery?

You can't, but you can do a lot of other useful stuff that command
triggers as currently envisioned won't allow. I'm starting to think
that maybe BEFORE triggers need to fire on commands and AFTER triggers
on events? Can we define "start of command execution" as an event? I
don't think we need two completely different facilities, but I really
want to make sure we have a plan for capturing events. I can't
imagine how replication is gonna work if we can't log exactly what we
did, rather than just the raw query string. Statement-based
replication is sometimes useful, especially in heterogenous
environments, but it's always kinda sucky, in that there are edge
cases that break. If you want to know what schema the server is going
to use for a newly created object before it creates it, you've got to
do that calculation yourself, and now you've created a risk of getting
a different answer. If you want to capture all the drops and replay
them on the slave, you have to be able to handle CASCADE.

> Well, the problem is with just catalog access its just about impossible to
> generate differential changes from them. Also the catalog between clusters
> *wont be the same* in a large part of the use-cases. Oids will basically never
> match.

True. ALTER commands are going to need additional information. But
some of that information also won't be available until quite late.
For example, an ALTER on a parent might cascade down to children, and
a command trigger might want to know exactly which tables were
affected. That trigger isn't really BEFORE or AFTER or INSTEAD OF;
it's more like DURING, and at a very specific point DURING.

> Matching the intent, which is mostly visible in the parsetree, seems to be a
> whole much more helpfull.

But note that there are a lot of possible things you can't tell from
the parse-tree without deducing them. Sure, for LISTEN, the parse
tree is fine. But for ALTER TABLE or DROP the parse tree doesn't even
tell you what object you're operating on (at least, not unless the
user includes an explicit schema-qualification), let alone the whole
list of objects the command is ultimately going to process due to
cascading, inheritance, etc. You can reimplement that logic in your
trigger but that's both fragile and laborious.

>> > Also I have a *very hard* time to imagining to sensibly implement
>> > replicating CREATE TABLE or ALTER TABLE ... ADD COLUMN with just object
>> > access hooks. There is just not enough context. How would you discern
>> > between a ADD COLUMN and a CREATE TABLE? They look very similar or even
>> > identical on a catalog level.
>> That can be fixed using the optional argument to
>> InvokeObjectAccessHook, similar to what we've done for differentiating
>> internal drops from other drops.
> Several of the points where those hooks are invoked don't even seem to have
> the necessary information for that... Doing that seems to involve passing
> heaps of additional state through the backend.

I've been trying pretty hard to avoid that, but I agree that there are
possible stumbling blocks there. I don't think it's an insurmountable
problem, though. We just need to start with a clean definition of
what we want the behavior to be; then, we may need to refactor
somewhat to get there. The problem with the current code is that it's
just accepting whatever fits nicely into the current code as a
specification-in-fact, and I don't think that's a good basis for a
feature on which people will want to build complex and robust systems.

> If I appeared a bit grumpy about your review its mostly because I would like
> to get the feature to into 9.3 because it would have saved me many hours of
> pain in the past, and the review pushed it further away from that.
> The review also didn't come in that early - but thats definitely not your
> fault, Tom and you did a lot of work on many patches and unfortunately for us
> you can't scale limiteless. You shouldn't need to, but reality seems to tell
> us otherwise :(

Yeah. I probably should have looked at this more sooner, but (1) I
was busy, (2) it seemed like a lot of active development was
happening, and (3) Thom was reviewing away. And I hate reviewing code
when there are new versions coming out every day, because obviously
there's no point in my doing final pre-commit hacking when that's
happening. I did do some big picture review that resulted in some
significant scope-trimming early in the CommitFest, but I wish now
that I'd dived into it a bit more. I find it rather regrettable that
there were so few people doing review this CommitFest. I spent a lot
of time on things that could have been done by other people, and the
result is that some things like this have gone rather late.

All that having been said, if I had my druthers, we would have bounced
this patch out of the CommitFest on day 1, because it has long been
and continues to be my belief that getting a major feature done in one
CommitFest is unrealistic, unless we're willing to have that
CommitFest take as long as 2 or 3 CommitFests, which I dislike. The
last CommitFest is hard enough without last-minute major feature
submissions; and you'll note that the latest any feature of mine has
ever been committed is late January, and not because I ignore everyone
else to work on my own stuff.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-03-27 17:36:23 Re: 9.2 commitfest closure (was Command Triggers, v16)
Previous Message Daniel Farina 2012-03-27 17:26:42 Re: Cross-backend signals and administration (Was: Re: pg_terminate_backend for same-role)