Re: Event Triggers: adding information

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Steve Singer <ssinger(at)ca(dot)afilias(dot)info>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-27 04:11:42
Message-ID: CA+TgmoZ4U6Bwe-+HhUFWaj2fhG5S=Wo5zK5KR0x7RyU9qphKAQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 25, 2013 at 11:58 AM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> My understanding is that if the command string we give to event triggers
> is ambiguous (sub-object names, schema qualifications, etc), it becomes
> useless for logical replication. I'll leave it to the consumers of
> that to speak up now.

Yeah, that's probably true. I think it might be useful for other
purposes, but I think we need a bunch of infrastructure we don't have
yet to make logical replication of DDL a reality.

> [… no access to tuples in per-statement triggers …]
>> I agree that was a reasonable trade-off. But it would be quite
>> something else again if someone were to propose the idea of having NEW
>> and OLD available for each-statement triggers, but only give the
>> information about the first row affected by the statement, rather than
>> all of them. I think that proposal would quite rightly be rejected,
>> and I think it's morally equivalent to what you're proposing, or what
>> I understood you to be proposing, at any rate.
>
> No it's not.
>
> + /*
> + * We only support filling in information for a DROP command when it
> + * drops a single object at a time; in all other cases the ObjectID,
> + * Name and Schema will remain NULL.
> + */
> + if (list_length(stmt->objects) != 1)
>
> The current patch implementation is to fill in the object id, name and
> schema with NULL when we have something other than a single object as the
> target. I did that when I realized we have a precedent with statement
> triggers and that we would maybe share the implementation of the "record
> set variable" facility for PLs here.

But I don't see how *that's* going to be any good for logical
replication either. If the raw query string isn't useful, getting the
OID in some cases but not all surely can't be any better.

>> I'm not sure, either, but I think that exposing things as tables is a
>> neat idea. I had imagined passing either JSON or arrays, but tables
>> would be easier to work with, at least for PL/pgsql triggers.
>
> Any idea about a practical implementation that we can do in the 9.3
> timeframe? Barring that, my proposal is to allow ourselves not to provide
> the information at all in that case in 9.3, and complete the feature by
> 9.4 time frame.

No, I don't have a brilliant idea. I think that we've probably about
come to the end of what we can reasonably do for 9.3.

> Possibly with OLD and NEW relations for per-statement triggers, if it
> turns out as I think that we can re-use the same piece of PLpgSQL side
> framework in both cases.

That would be a GREAT thing to have.
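
To make the distinction concrete, here is a toy model of per-statement triggers that receive OLD and NEW *relations* (every row affected by the statement), contrasted with per-row triggers that see one old/new pair at a time. This is a pure Python simulation; none of these names are PostgreSQL APIs.

```python
# Simulation of the two trigger granularities under discussion.
# All names are illustrative, not PostgreSQL functions.

def fire_statement_trigger(trigger, old_rows, new_rows):
    # One invocation per statement, with the full affected-row sets.
    trigger(old_rows, new_rows)

def fire_row_triggers(trigger, old_rows, new_rows):
    # One invocation per affected row.
    for old, new in zip(old_rows, new_rows):
        trigger(old, new)
```

An UPDATE touching three rows would fire the statement-level trigger once with two three-row relations, but the row-level trigger three times, once per pair.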

>>> The only commands that will target more than one object are:
>>>
>>> - DROP, either in the command or by means of CASCADE;
>>> - CREATE SCHEMA with its PROCESS_UTILITY_SUBCOMMAND usage;
>>
>> CREATE TABLE can also create subsidiary objects (indexes,
>> constraints); and there could be more things that do this in the
>> future.
>
> Subsidiary objects are not the same thing at all as a command that
> targets more than one object, and the difference is user visible.

Hmm. Really?

I guess I'm not seeing why those cases are particularly different.

>>> The CASCADE case is something else entirely, and we need to be very
>>> careful about its locking behavior. Also, in the CASCADE case I can
>>> perfectly agree that we're not talking about a ddl_something event any
>>> more, and that in 9.4 we will be able to provide a sql_drop generic
> event that will know about that.
>>
>> I've felt from the beginning that we really need that to be able to do
>> anything worthwhile with this feature, and I said so last year around
>> this time. Meanwhile, absent that, if we put something in here that
>> only handles a subset of cases, the result will be DDL replication
>> that randomly breaks.
>
> So we have two proposals here:
>
> - Have cascading drops call back into process utility with a new
> context value of PROCESS_UTILITY_CASCADE and its parse node, wherein
> you only stuff the ObjectAddress, and provide event triggers support
> for this new cascade command;
>
> - Implement a new event called "sql_drop" where you provide the same
> amount of information as in a ddl_command event (same lookups,
> etc), but without any parse tree nor, I suppose, the command string
> that the user would have to type to drop just that object.
>
> You objected to the first on modularity violation grounds, and on the
> second on development cycle timing grounds. And now you're saying that
> because we don't have a practical solution it's apparently dead, but
> which one is?
>
> Please help me decipher your process of thoughts and conclusions.

OK, so I object to the first one on modularity grounds (as did Tom).
I don't object to the second one at all except that, AFAIK, nobody's
written the code yet. Maybe I'm misunderstanding something.

Actually, I think we could probably use a similar set of
infrastructure *either* to implement sql_drop OR to provide the
information to ddl_command_end. What I'm thinking about is that we
might have something sort of like the after-trigger queue that we use for
ordinary triggers. Instead of calling a trigger as the drops happen,
we queue up a bunch of events that say "xyz got dropped". And then,
we can either fire something like ddl_command_end (and pass in the
whole list as an array or a table) or we can call something like
sql_drop (n times, passing details of one object per invocation). In
either approach, the triggers fire *after* the drops rather than in
the middle of them - it's just a question of whether you want one call
for all the drops or one call for each drop.
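
The queue-then-fire model described above can be sketched in a few lines. This is a hedged simulation, not backend code: drops are recorded as they happen, and triggers fire only afterwards, either once with the whole list (ddl_command_end style) or once per object (sql_drop style). All names are invented for the illustration.

```python
# Simulation of "queue drop events, fire triggers after the drops".
# DroppedObject, DropEventQueue, etc. are illustrative names only.

from dataclasses import dataclass

@dataclass
class DroppedObject:
    object_id: int
    schema: str
    name: str

class DropEventQueue:
    def __init__(self):
        self._events = []

    def record(self, object_id, schema, name):
        # Called as each object is dropped; no trigger runs yet.
        self._events.append(DroppedObject(object_id, schema, name))

    def fire_command_end(self, trigger):
        # One call for all the drops, passed as a single list.
        trigger(list(self._events))

    def fire_sql_drop(self, trigger):
        # One call per dropped object.
        for event in self._events:
            trigger(event)
```

Either way the triggers see a consistent, completed set of drops; the two firing styles differ only in call granularity.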

>> I'm not sure I can, but I think that the design I'm proposing gives a
>> lot of flexibility. If we create a pg_parse_tree type with no input
>> function and an output function that merely returns a dummy value, and
>> we pass a pg_parse_tree object to PL/pgsql event triggers, then it's a
>> small win even if it offers no accessors at all - because somebody
>> could put whatever accessor functions they want in a contrib module
>> without waiting for 9.4.
>
> That's not solving the problem we want to solve, though. You can already
> do what you say with the code that we have today, you just have to code
> the extension in C.

I don't agree, but I recognize there might be more than one valid opinion here.
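
The opaque-type idea from the quoted paragraph can be sketched as follows. The parse tree prints as a dummy token (like a type whose output function returns a fixed value), and all real access goes through accessor functions that a contrib module could extend. Every name here is hypothetical.

```python
# Sketch of an opaque pg_parse_tree value: internals hidden, access
# only through separately shipped accessors. Not a PostgreSQL API.

class PgParseTree:
    def __init__(self, internal):
        self._internal = internal     # layout free to change per release

    def __repr__(self):
        return "<pg_parse_tree>"      # dummy output; internals stay hidden

# An accessor shipped (or contrib-provided) alongside the type:
def pg_command_tag(tree):
    return tree._internal["tag"]
```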

>> The point is - we wouldn't expose the whole parse tree. Rather, we'd
>> expose a parse-tree as an opaque object, and give the user functions
>> that would expose specific bits of it. Those bits could grow over
>> time without needing any changes to PLs or existing trigger functions,
>> and they wouldn't change if the parse tree internals changed, and
>> users could extend the amount of information exposed via contrib
>> functions (at their own risk, of course).
>
> We already have that, barring the extension that grovels through the
> parse tree. The benefit from maintaining that code as an extension is
> relying on a stable enough API from the backend so that you can provide
> the same behaviour to users across major versions. And with pg_upgrade,
> the same storage format.
>
> Here I don't see how to reach those goals without anything new in core,
> given that the main requirement from core about the parse tree is to be
> able to whack it around in any release (even minor) without concerns
> about API or ABI stability.

Well, the point is that if you have a function that maps a parse tree
onto an object name, any API or ABI changes can be reflected in an
updated definition for that function. So suppose I have the command
"CREATE TABLE public.foo (a int)". And we have a call
pg_target_object_namespace(), which will return "public" given the
parse tree for the foregoing command. And we have a call
pg_target_object_name(), which will return "foo". We can whack around
the underlying parse tree representation all we want and still not
break anything - because any imaginable parse tree representation will
allow the object name and object namespace to be extracted. Were that
not possible it could scarcely be called a parse tree any longer.
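
A hedged illustration of that stability argument: two different internal parse-tree layouts, one pair of accessor functions. Callers keep working when the representation changes because the accessors absorb the difference. The layouts are invented, and pg_target_object_name/pg_target_object_namespace are the hypothetical calls named above, not real functions.

```python
# Accessors that hide the parse-tree representation from callers.
# Both layouts below stand in for CREATE TABLE public.foo (a int).

def pg_target_object_namespace(tree):
    if "qualified_name" in tree:          # older layout: one tuple
        return tree["qualified_name"][0]
    return tree["namespace"]              # newer layout: split fields

def pg_target_object_name(tree):
    if "qualified_name" in tree:
        return tree["qualified_name"][1]
    return tree["relname"]

old_layout = {"qualified_name": ("public", "foo")}
new_layout = {"namespace": "public", "relname": "foo"}
```

Whatever the internals look like, any plausible representation lets the name and namespace be extracted, so the accessor signatures can stay fixed across releases.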

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
