Re: Event Triggers: adding information

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2012-12-29 12:48:36
Message-ID: CA+TgmobrEg1jWH6K0tHpb0O1=NvTV-+g9g=MCeLF03w5M39x9Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 25, 2012 at 10:34 AM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> Thank you for this partial commit, and to Simon and Andres for filling
> in the gaps. I should mention that the missing header parts were all in
> my patch, and that hacking on the headers is proving surprisingly tricky.

Apologies to all about the missed bits. I believe that I had them in
my version of the patch as well, but I think the changes ended up
unstaged in my repository rather than folded into the patch. I'm
usually smarter than that, but, obviously, not this time.

>> I was thinking that we might need to go beyond what pg_regress can do
>> in this case. For example, one idea would be to install an audit
>> trigger that records all the DDL that happens during the regression
>> tests. Then, afterwards, replay that DDL into a new database. Then
>> do a schema-only dump of the old and new databases and diff the dump
>> files. That's a little wacky by current community standards but FWIW
>> EDB has a bunch of internal tests that we run to check our proprietary
>> stuff; some of them work along these lines and it's pretty effective
>> at shaking out bugs.
>
> Are you in a position to contribute those parts to the community?
> Implementing them again and then having to support two different variants
> of them does not look like the best use of either of our time.

If I thought there were some useful code, I would try to see if we
could contribute it, but I'm pretty sure there isn't. We have a bunch
of pg_dump tests, but their intended purpose is to verify that our
proprietary stuff dumps and reloads properly, which is not what we
need here. The reason I mentioned it was just to make the point that
it is possible to have a large suite of tests that goes beyond what is
possible with pg_regress, and that this can be a useful way of finding
out whether you've broken things. In our case, we actually cheat
pretty hard by using things like \! pg_dump in the .sql files, so
technically it ends up going through pg_regress after all. I'm not
quite sure what to think about that approach from a
community point of view. It certainly ends up reducing the number of
new things that you need to invent in order to do new kinds of
testing, but it's still not as flexible as I would like - for example,
I don't see a way to do the exact kind of testing we need here in a
cross-platform way using that infrastructure.
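
To make that concrete, here's roughly what the replay-and-diff step
could look like as a standalone script - just a sketch, with the
ddl_audit_log table, database names, and file names all invented for
illustration:

use strict;
use warnings;
use File::Compare;

# Create a fresh database and replay into it the DDL recorded by the
# (invented) audit trigger during the main regression run.
system('createdb', 'regression_replay') == 0
    or die "createdb failed";
system('psql -X -Atc "SELECT command FROM ddl_audit_log ORDER BY id"'
       . ' regression | psql -X -q regression_replay') == 0
    or die "replaying recorded DDL failed";

# Schema-only dumps of the original and replayed databases should match.
system('pg_dump', '--schema-only', '-f', 'orig.sql', 'regression') == 0
    or die "pg_dump of regression failed";
system('pg_dump', '--schema-only', '-f', 'replay.sql', 'regression_replay') == 0
    or die "pg_dump of regression_replay failed";

compare('orig.sql', 'replay.sql') == 0
    or die "schema-only dumps differ";
print "schema-only dumps match\n";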

I'm tempted to propose that we define a Perl-based testing API for
"additional regression tests" that can't be handled using pg_regress.
Like: we put a bunch of Perl modules in a certain directory, and then
write a script that goes through and does

require $module;
$module->regress;

...on each one, based on some schedule file. If a test passes or
fails, it calls some API that we provide to report the result. That
way, whatever tests we write for this effort can potentially become
the core of a larger test suite, and by basing it on Perl we add no
new tool dependencies and can (hopefully) run on both Linux and
Windows. I'm open to other ideas, of course.
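
For concreteness, the driver could be as simple as something like this
- again just a sketch, with the directory layout, schedule format, and
the convention that regress() returns success or failure all
placeholders:

use strict;
use warnings;

my $testdir  = 'src/test/extra';       # invented location
my $schedule = "$testdir/schedule";    # one module name per line

open my $fh, '<', $schedule or die "cannot open $schedule: $!";

my ($pass, $fail) = (0, 0);
while (my $module = <$fh>)
{
    chomp $module;
    next if $module =~ /^\s*(?:#|$)/;  # skip comments and blank lines

    # Assumes each file foo.pm declares "package foo;" and provides a
    # regress() method; the real thing would report details through
    # whatever API we provide rather than just returning true or false.
    require "./$testdir/$module.pm";
    if (eval { $module->regress })
    {
        $pass++;
        print "$module ... ok\n";
    }
    else
    {
        $fail++;
        print "$module ... FAILED\n";
    }
}
close $fh;

print "$pass passed, $fail failed\n";
exit($fail ? 1 : 0);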

> The case where we should certainly prefer looking at the catalog caches
> is when we want to know the actual schema where the object has been
> created. The current patch is deriving that information mostly from the
> parse tree, using the first entry of the search_path when the schema is
> not given in the command. It's ok because we are still in the same
> transaction and no command has been able to run in between the user
> command and our lookup, but I could easily get convinced to look up the
> catalogs instead, even more so once we have the OID of the object easily
> available in all places.

I haven't looked at the code, but it seems to me that there have got
to be edge cases where that will fail. For one thing, we only do
pretty minimal locking on namespaces, so I'm not at all sure that
someone couldn't (for example) rename the old namespace out of the
way, thus causing one search_path lookup to find the first schema in
the search_path and the other lookup to find the second schema in the
search_path. Frankly, I think we have some hazards of that type in
the DDL code as it stands, so it's not like nobody's ever made that
mistake before, but it would be nice not to propagate it.

Really, the only safe way to do these things is to have any given name
lookup done in one and only one place. We're some distance from there
and I don't know that this patch should be required to carry
the burden of nailing down all the loose ends in that area, but I
think it's reasonable to insist that it shouldn't do anything which
makes it impossible for someone else to nail down those loose ends,
which is still fairly high on my personal to-do list.

The other danger here is - what exactly do you mean by "no command has
been able to run between the user command and our lookup"? Because
you can do stupid things with functions like set_config(). This would
only be safe if no user-provided expressions can possibly get
evaluated between point A and point B, and that strikes me as the sort
of thing that could easily be false unless this is all done VERY close
to the start of processing.

A belated Merry Christmas to you as well.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
