Re: Event Triggers: adding information

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-23 04:17:27
Message-ID: CA+TgmoYimdGCiKFbKFbcZ_pJ5jT+sryxN40p24i8efq8j6BNgA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 21, 2013 at 12:27 PM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> - Context exposing and filtering
>
> My preference for that point is to include it and limit it to event
> triggers written in C, leaving the door open to add new events in
> the future, and allowing existing consumers of that framework see
> through implementation details should they need it.
>
> It's better than just a ProcessUtility_hook because you can see it
> in \dy, decide to disable it with ALTER EVENT TRIGGER (e.g. in a
> transcation to avoid recursive calling into itself), and it won't
> run in single user mode even when in shared_preload_libraries.
>
> Another idea is to protect it with a GUC.
>
> Yet another idea is to add a new property in the CREATE EVENT
> TRIGGER command so that user can request explicitely to have that
> and know the feature will break at next release and all the ones
> after that:
>
> CREATE EVENT TRIGGER foo ON ddl_command_start
> WHEN show_internals in ('Yes I know what I''m doing',
> 'Thanks')
> EXECUTE PROCEDURE …();
>
> That particular phrasing and option choice might be revisited before
> commit, though, I'm not wedded to it.

I'm not feeling very sanguine about any of this. I feel strongly that
we should try to do something that's more principled than just
throwing random junk into ProcessUtility.

> - Additional Information to PL/pgSQL
>
> We're talking about :
>
> - operation (CREATE, ALTER, DROP)
> - object type (TABLE, FUNCTION, TEXT SEARCH DICTIONARY, …)
> - object OID (when there's a single object target)
> - object shema name (when there's a single object target)
> - object name (when there's a single object target)

In ddl_command_start, I believe you can only expose the information
that is available in the parse tree. That is, if the user typed
CREATE TABLE foo, you can tell the event trigger that the user didn't
specify a schema and that they mentioned the name foo. You cannot
predict what schema foo will actually end up in at that point, and we
shouldn't try. I don't have a problem with exposing the information
we have at this point with the documented caveat that if you rely on
it to mean something it doesn't you get to keep both pieces.

But I wonder: wouldn't it be better to just expose the raw string the
user typed? I mean, in all the cases we really care about, the
information we can *reliably* expose at this point is going to be
pretty nearly the same as extracting the third space-delimited word
out of the command text and splitting it on a period. And you can do
that much easily even in PL/pgsql. You could also extract a whole lot
of OTHER potentially useful information from the command text that you
couldn't get if we just exposed the given schema name and object name.
For example, this would allow you to write an event trigger that lets
you enforce a policy that all column names must use Hungarian
notation. Note that this is a terrible idea and you shouldn't do it,
but you could if you wanted to. And, the trigger would be relatively
long and complex and you might have bugs, but that's still better than
the status quo where you are simply out of luck.

Now, in a ddl_command_end trigger, there's a lot more information that
you can usefully expose. In theory, if the command records "what it
did" somewhere, you can extract that information with as much
precision as it cared to record. However, I'm pretty skeptical of the
idea of exposing a single OID. I mean, if the "main" use case here is
replication, there are holes big enough to drive a truck through
there. It's not really better for other use cases, either: if you're
auditing or logging or whatever, you can drive the same truck through
the same hole. It seems to me that to get anything really useful
here, you need some more complex way of passing data, like oh say an
array of OIDs. Really, you're going to need something quite a bit
more complicated than that to describe something like ALTER TABLE, but
even for pretty simple cases a single OID doesn't cut it.

> - Additional Information to PL/*
>
> It's separated away because you did opt-out from reviewing that
> part, so I guess we will continue to make that a separate patch, and
> it will still need to happen. I've been said that I shouldn't be on
> the hook to provide for that, and I wonder, but anyway I did already
> implement it in a previous version so I can do it again as soon as
> we know what we expose.

I'm not sure what you mean that you shouldn't be on the hook to
provide that. I don't think that you're obliged to submit patches to
make this work for other PLs - indeed, I think it's too late for that
for 9.3 - but I don't think anyone else is required to do it, either.
It doesn't seem like it would be too hard to get the other PLs up to
the same level that PL/pgsql is at; I just didn't want to try to
familiarize myself with PL/perl, PL/python, and PL/tcl, none of which
I know the guts of and most of which I don't compile regularly, at the
same time I was trying to grok everything else in the patch.

> - Command String Normalisation
>
> I removed it already from the main patch because we needed to talk
> about it more, and it's kind of a big part in itself. Meanwhile it
> seems to me that an agreement has been reached and that I will be
> able to follow the consensus and resume working on that part.

I'm not sure we've reached a consensus here (what is it?), but let me
run an idea or two up the flagpole and see if anyone salutes. It
seems to me that there's some consensus that reverse-parsing may be a
reasonable thing to include in some form, but there are a lot of
unsolved problems there, including (1) that the parse tree might not
contain enough information for the reverse-parser to describe what the
command actually did, as opposed to what the user asked for, and (2)
there may be important decisions which were made at execution time and
which are needed for correct replication but which actually can't be
represented in the syntax at all (I am not sure what these would be:
but Tom seemed to think that such cases exist, so they need to be
understood and thought about).

If we're going to do reverse parsing, I would argue that we need to
put some effort into making that more flexible than the designs I've
seen so far. For example, I believe your concept of command string
normalization includes the idea that we'd fully schema-qualify the
object name. Like... instead of saying CREATE TABLE foo (a text),
we'd say CREATE TABLE public.foo (a text). That would guarantee that
when we replay the command on another machine, it ends up in the same
schema. But who is to say that's what the user wants? You can
imagine, for example, that a user might want to replicate schema
public on node A into schema node_a on node B. If we're back to
parsing the SQL at that point we are still ahead - but not very far
ahead. And, of course, the proper canonicalization is really CREATE
TABLE public.foo (a pg_catalog.text), because there could be another
type called text in some other schema on node B. That may seem like a
borderline case when the name of the type is "text" but if you
substitute "bar" for "text" it suddenly starts to seem totally
plausible that there could be more than one bar. Of course if you
substitute "hstore" for "text" but hstore is in a different schema in
the source and target databases then you've got a problem if you DO
schema-qualify things; and if the goal is just to pretty-print things
for the user's benefit, all of this may be overkill.

I recognize that it's probably impossible or unrealistic to get an
exactly-perfect solution to all of these problems, but I'm working my
way around to a proposal. Suppose we added a type pg_parse_tree, and
a bunch of accessor functions that operate on it. You could have
things like get_operation(pg_parse_tree) and
get_object_type(pg_parse_tree) and
get_array_of_object_names_and_schemas(pg_parse_tree), which would
largely obviate the need to pass 27 magic variables for all of the
things somebody might want - anything we decide we want to expose just
becomes another accessor function. Deparsing is also a function,
which means we don't incur the overhead of doing it unless the user
asks for it. And you might even be able to have mutators, so that you
can for example take a pg_parse_tree, make a new one by changing the
schema name that's mentioned, and then deparse that. I'm sure Tom is
scared now, if he's reading this, but maybe it's got some potential.

I certainly don't think this solves every problem. In particular, we
run the risk of exposing every internal detail of the parse tree,
something Tom and I have both vigorously resisted in every discussion
of event triggers since time immemorial. On the other hand, the
discussion of this topic suggests that the main thing that people want
out of event triggers is precisely to be able to get at information
from the parse tree, so we're kind of damned if we do and damned if we
don't. At least something like this provides a reasonable abstraction
boundary that allows the information to be exposed in a controlled
way. For example, if we add a function that tells you what object
name the user typed, it doesn't need to expose the fact that the parse
tree names that field differently in different DDL constructs, which
certainly increases our odds of not going insane after a release or
two of maintaining this code.

The other question is what to do about the information that's gathered
as the command is running that you afterwards wish to provide to the
event trigger - for example, for CREATE TABLE foo (), you want to
reveal the actually-selected creation schema. One idea is to have DDL
commands annotate the parse tree with these sorts of details, but the
disadvantage of this is that it might not be what everybody wants for
all purposes; and you still have the question of what to do with
information that can't be effectively represented in the parse tree at
all. I'm inclined to think a better approach might be to pass around
some kind of "container" data structure to which the DDL command can
glue in whatever bits of information it wants to include, and then
dump the result out as a 2-D text array or a JSON object or something.
I'm waving my hands at this problem a bit because of course one
important use case is to be able to do a reverse-parse that factors
this information into account, but we could do that, I think, by
allowing this information to be passed to the deparse API along with
the pg_parse_tree object. I'm waving my hands here, but maybe not any
harder than any other design thus far proposed, and a fuller
explanation would make this interminably long email even longer, so
I'll stop here for now.

> - ddl_event_trace
>
> Now is the time to kill it (for 9.3) or allow it in.
>
> It means another cache lookup at DDL Event time (so 2 cache lookups
> per DDL in 9.3); and allow users who just want to have the
> information not to have to care about when it is available.

Unless we get some of the other stuff this doesn't have much value
anyway, and depending on how other parts of the design shake out this
might seem less relevant by the time we get done.

> - Non DDL related Events
>
> Extra bonus items, a table_rewrite event. Unlikely for 9.3 at best,
> unless someone else wants to have at it, maybe?

I'd like to take a crack at adding support for some kind of really
cool event for 9.4 - I suspect this one is not what I'd pick as a
first target, just because it seems hard. But I'll mull it over.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2013-01-23 04:25:19 Re: Prepared statements fail after schema changes with surprising error
Previous Message Tom Lane 2013-01-23 04:02:18 Re: Patch: clean up addRangeTableEntryForFunction