Re: Command Triggers patch v18

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Subject: Re: Command Triggers patch v18
Date: 2012-03-27 20:05:08
Message-ID: 201203272205.08556.andres@2ndquadrant.com
Lists: pgsql-hackers

Hi,

On Tuesday, March 27, 2012 07:34:46 PM Robert Haas wrote:
> On Tue, Mar 27, 2012 at 11:58 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > I still think the likeliest way towards that is defining some behaviour
> > we want regarding
> > * naming conflict handling
> > * locking
> >
> > And then religiously make sure the patch adheres to that. That might need
> > some refactoring of existing code (like putting part of the utility.c
> > handling of CREATE TABLE into its own function and such), but should be
> > doable.
> >
> > I think BEFORE command triggers ideally should run
> > * before permission checks
> > * before locking
> > * before internal checks are done (naming conflicts, type checks and
> > such)
> It is possible to do this, and it would actually be much easier and
> less invasive to implement than what Dimitri has done here, but the
> downside is that you won't have done the name lookup either. See my
> last email to Dimitri for a long explanation of why that restriction
> is not easily circumventable: name lookups, locking, and permission
> checks are necessarily and inextricably intertwined.
Read your other mail. Coming back to it, I think the above might be too
restrictive to be useful for a big part of the use-cases. So your idea of more
hook points below has some merit.

> Still, if others
> agree this is useful, I think it would be a lot easier to get right
> than what we have now.
I think it's pretty important to have something that's usable rather easily,
which requires names to be resolved and thus permission checks already
performed and (some) locks already acquired.
I think quite a few of the possible usages need the facility to be as simple
as possible and won't care about already-acquired locks/names.

> Some of what we're now doing as part of parse analysis should really
> be reclassified. For example, the ProcessUtility branch that handles
> T_CreateStmt and T_CreateForeignTableStmt should really be split out
> as a separate function that lives in tablecmds.c, and I think at least
> some of what's in transformCreateStmt should be moved to tablecmds.c
> as well. The current arrangement makes it really hard to guarantee
> things like "the name gets looked up just once", which seems obviously
> desirable, since strange things will surely happen if the whole
> command doesn't agree on which OID it's operating on.
Yes, I agree; most of that should move out of utility.c.

> > Looking up oids and such before calling the trigger doesn't seem to be
> > problematic from my pov. Command triggers are a superuser-only facility;
> > the additional information they gain is no problem.
> > Thinking about that I think we should enforce command trigger functions
> > to be security definer functions.
> I don't see any benefit from that restriction.
The reason I was thinking it might be a good idea is that they might get more
knowledge passed in than the user could obtain directly otherwise, especially
if we extend the scheme to more places/kinds of information.

> >> I actually think that, to really meet all needs here, we may need the
> >> ability to get control in more than one place.
> > Not sure what you mean by that. Before/after permission checks,
> > before/after consistency checks? That sort of thing?
> Yes. For example, above you proposed a kind of trigger that fires
> really early - basically after parsing but before everything else.
> What Dimitri has implemented is a much later trigger that is still
> before the meat of the command, but not before *everything*. And then
> there's an AFTER trigger, which could fire either after an individual
> piece or stage of the command, or after the whole command is complete,
> either of which seems potentially useful depending on the
> circumstances. I almost think that the BEFORE/AFTER name serves to
> confuse rather than to clarify, in this case. It's really a series of
> specific hook points: whenever we parse a command (but before we
> execute it), after security and sanity checks but before we begin
> executing the command, before or after various subcommands, after the
> whole command is done, and maybe a few others. When we say BEFORE or
> AFTER, we implicitly assume that we want at most two of the things
> from that list, and I am not at all sure that's going to be
> enough.
You might have a point there. Not sure whether the complexity of explaining
all the different hook points is worth the pain, though.

What are the phases we have?

* after parse
  * logging might want to hook here
* after resolving the name
* after checking permissions (intertwined with the former)
  * override the permission check? INSTEAD?
* after locking (intertwined with the former)
  * except that it needs to be connected to resolving the name/permission
    check, this doesn't seem to be an attractive hook point
* after validation
  * additional validation will likely want to hook here
* after execution
  * replication might want to hook here

Am I missing an interesting phase?

Allowing all that seems to require a substantial refactoring of commands/ and
catalog/.

> One thing I had thought about doing, in the context of sepgsql, and we
> may yet do it, is create a hook that gets invoked whenever we need to
> decide whether it's OK to perform an action on an object. For
> example, consider ALTER TABLE .. ADD FOREIGN KEY: we'd ask the hook
> both "is it OK to add a foreign key to table X?" and also "is it OK to
> make a foreign key refer to table Y"? This doesn't fit into the
> command-trigger framework at all, but it's definitely useful for
> sepgsql, and I bet it's good for other things, too - maybe not that
> specific example, but that kind of thing.
It's a neat feature, yes. But it seems to be orthogonal to command triggers.

> >> I agree that it's not a very helpful decision, but I'm not the one who
> >> said we wanted command triggers rather than event triggers. :-)
> >
> > Well, but on the other hand what will you do with a "pg_attribute row
> > added" event? It doesn't even have an oid ;)
>
> We refer to everything using ObjectAddress notation - the OID of the
> system catalog that contains objects of that type, the OID of the
> object itself, and a "sub-object ID" which is 0 in general but the
> column number in that specific case. What you can do is go fetch the
> pg_attribute row and then do whatever you like - log it, throw an
> error, etc. It's a hook that gets invoked when you add a column. If
> it's important to differentiate between a column that gets added at
> CREATE TABLE time and one that gets added by ALTER TABLE .. ADD
> COLUMN, that could be done. It's a lot easier to deal with triggers
> that fire in cases you don't care about than to deal with triggers
> that don't fire in cases you do care about. The former you can just
> ignore.
I don't believe that distinction can be made that easily without more impact
than the CT patch.
I think you need a surprisingly high amount of context to know when to ignore
a trigger, especially as it's not exactly easy to transfer knowledge from one
trigger invocation to the next.

> > But how would you do something like logging the originating statement with
> > the object access hook machinery?
> You can't, but you can do a lot of other useful stuff that command
> triggers as currently envisioned won't allow. I'm starting to think
> that maybe BEFORE triggers need to fire on commands and AFTER triggers
> on events? Can we define "start of command execution" as an event? I
> don't think we need two completely different facilities, but I really
> want to make sure we have a plan for capturing events. I can't
> imagine how replication is gonna work if we can't log exactly what we
> did, rather than just the raw query string. Statement-based
> replication is sometimes useful, especially in heterogeneous
> environments, but it's always kinda sucky, in that there are edge
> cases that break.
I don't think creating *new* DDL from the parsetree can really count as
statement-based replication. And again, I don't think implementing that will
cost that much effort.
How would it help us to know - as individual events! - which tuples were
created where? Reassembling that will be a huge mess. I basically fail to see
*any* use case besides access checking.

> If you want to know what schema the server is going
> to use for a newly created object before it creates it, you've got to
> do that calculation yourself, and now you've created a risk of getting
> a different answer. If you want to capture all the drops and replay
> them on the slave, you have to be able to handle CASCADE.
I would like to have CASCADE handled, yes.

> > Well, the problem is that with just catalog access it's just about
> > impossible to generate differential changes. Also the catalog between
> > clusters *won't be the same* in a large part of the use-cases. Oids will
> > basically never match.
> True. ALTER commands are going to need additional information. But
> some of that information also won't be available until quite late.
> For example, an ALTER on a parent might cascade down to children, and
> a command trigger might want to know exactly which tables were
> affected. That trigger isn't really BEFORE or AFTER or INSTEAD OF;
> it's more like DURING, and at a very specific point DURING.
Well, *I* would like a separate command trigger to be fired for sub-actions,
but with them qualified as such...

> > Matching the intent, which is mostly visible in the parsetree, seems to
> > be a whole lot more helpful.
> But note that there are a lot of possible things you can't tell from
> the parse-tree without deducing them. Sure, for LISTEN, the parse
> tree is fine. But for ALTER TABLE or DROP the parse tree doesn't even
> tell you what object you're operating on (at least, not unless the
> user includes an explicit schema-qualification), let alone the whole
> list of objects the command is ultimately going to process due to
> cascading, inheritance, etc. You can reimplement that logic in your
> trigger but that's both fragile and laborious.
Resolving objects to their schema-qualified variant isn't that much of a
problem. Inheritance is, I can definitely see that.

For lots of simple use-cases, even something that has some restrictions
attached would be *WAY* better than the current situation, where all that is
not possible at all.

I fear a bit that this discussion is leading to something that's never going
to materialize because it would require a huge amount of work to get there.

> >> That can be fixed using the optional argument to
> >> InvokeObjectAccessHook, similar to what we've done for differentiating
> >> internal drops from other drops.

> > Several of the points where those hooks are invoked don't even seem to
> > have the necessary information for that... Doing that seems to involve
> > passing heaps of additional state through the backend.

> I've been trying pretty hard to avoid that, but I agree that there are
> possible stumbling blocks there. I don't think it's an insurmountable
> problem, though.

> We just need to start with a clean definition of
> what we want the behavior to be; then, we may need to refactor
> somewhat to get there. The problem with the current code is that it's
> just accepting whatever fits nicely into the current code as a
> specification-in-fact, and I don't think that's a good basis for a
> feature on which people will want to build complex and robust systems.
I agree that we need consistency there.

> I find it rather regrettable that there were so few people doing review this
> CommitFest. I spent a lot of time on things that could have been done by
> other people, and the result is that some things like this have gone rather
> late.
I find that sad too. Even though I am part of the crowd that didn't do enough.
Thanks for the work!

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
