Re: WIP fix proposal for bug #6123

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Florian Pflug" <fgp(at)phlo(dot)org>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP fix proposal for bug #6123
Date: 2011-08-02 15:03:15
Message-ID: 4E37CB63020000250003F98D@gw.wicourts.gov
Lists: pgsql-hackers

Florian Pflug <fgp(at)phlo(dot)org> wrote:
> On Aug1, 2011, at 20:02 , Kevin Grittner wrote:

>> I consider the trigger behavior addressed by this patch to be a
>> bug, and serious enough to merit inclusion of a fix in the 9.1
>> release, even at this late stage.
>
> I'm sorry but I disagree, on multiple grounds.

Thanks for offering your thoughts!

> First, I'm not sure this is even a bug. To me, it seems that
> postgres currently resolves an inherently ambiguous situation in
> one possible way, while you expect it to pick another. It might be
> that the behaviour that you suggest is more in line with people's
> expectations (More on that below), but that still doesn't make the
> existing behaviour wrong.

That point would be a hard sell for me. To silently turn a BEFORE
trigger into an INSTEAD OF trigger invites data corruption, in the
sense that business rules that you thought were unconditionally
enforced in triggers can be broken with no obvious hint that it
happened.
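
To show what I mean, here is a stripped-down sketch; the table and
trigger names are made up, and the real triggers are of course far
more involved than this:

create table documents (
    id          int primary key,
    delete_mark boolean not null default false
);

create table audit_log (
    doc_id    int,
    action    text,
    logged_at timestamptz default now()
);

create function documents_bd() returns trigger as $$
begin
    -- the business rule we believed was unconditionally enforced
    insert into audit_log (doc_id, action) values (old.id, 'DELETE');
    -- an (in real life, indirect) update of the row being deleted
    update documents set delete_mark = true where id = old.id;
    return old;    -- we did not ask to cancel the delete
end;
$$ language plpgsql;

create trigger documents_bd_trig
    before delete on documents
    for each row execute procedure documents_bd();

insert into documents (id) values (1);

delete from documents where id = 1;

On an unpatched server I would expect that last statement to report
DELETE 0: the audit row claims the document was deleted, but the
document quietly survives.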

> Secondly, even if one considers the current behaviour to be wrong,
> that still doesn't prove that your proposed behaviour is any
> better

Granted. I tried to phrase my post to not preclude other solutions.

> There might very well be situations in which the current behaviour
> is preferable over the behaviour you suggest, and maybe these
> situations turn out to be much more common than anyone anticipates
> right now.

My imagination is not up to the task of putting any plausible
situations forward where current behavior is a good thing. Of
course I realize that's not proof of anything, but it leaves me
skeptical.

> But if we apply the patch now, the chance of that being discovered
> before 9.1 is released is virtually zero. And after the release
> there's no going back, so we might end up changing the behaviour
> to the worse.

Perhaps throwing an error on the DELETE case as well as the UPDATE
case would address this concern. That would have saved weeks of
staff time here when we started seeing the mysterious behavior which
currently exists. The only way we finally got to the bottom of it
was to step through execution in gdb; those users not equipped to do
so might have bigger problems than we did.

We would probably still be patching the version we use in our shop
to actually *do* the delete if no error is thrown and no BEFORE
trigger returns NULL, but we would have been able to get there a lot
faster.

> And lastly, I believe that your ultimate goal of guaranteeing that
> a row is actually deleted (updated) after a BEFORE DELETE (BEFORE
> UPDATE) trigger has fired is actually insurmountable anyway, at
> least for isolation level READ COMMITTED.

Hmm. That's a very potent point. I had not considered READ
COMMITTED because we have never, ever used it under PostgreSQL.
Obviously, any solution has to work for those who *do* use it, too;
and the behavior between different isolation levels can't be too
convoluted. So, as you say, the *ultimate* goal may be
unattainable.

> Having said all that, I don't doubt that the current behaviour is
> source of pain for you,

It has thrown off our production development and deployment schedule
by weeks, and management has considered shelving the whole concept
of using PostgreSQL triggers because of it. I've convinced them
that this is a surmountable obstacle, and am trying to get things
back on track.

> and I do believe that we ought to improve things - but not by
> replacing one unsatisfactory behaviour with another.

I hope you're not suggesting *that's* my goal! ;-)

> The root of the issue seems to be that you're trying to do things
> in a BEFORE trigger that really belong into an AFTER trigger.

I'll take that as a *collective* pronoun, since I haven't personally
written any of the thousands of triggers. There are 20 programmers
doing that. Part of the problem is that they sometimes put things
in a BEFORE trigger that belong in an AFTER trigger. I caught one
place they were doing an UPDATE of the target row in a BEFORE
trigger rather than setting the values in the NEW record. My patch
makes that kind of UPDATE throw an error, as I believe it should,
rather than silently leaving the data in a bad state.
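
The shape of what I caught was roughly this (hypothetical names; the
real code was buried in shared helper routines):

create table orders (
    id                int primary key,
    status            text not null,
    status_changed_at timestamptz
);

-- wrong: stamp the row by issuing a fresh UPDATE against the
-- table from inside its own BEFORE UPDATE trigger
create function orders_stamp_wrong() returns trigger as $$
begin
    if new.status is distinct from old.status then
        update orders set status_changed_at = now() where id = old.id;
    end if;
    return new;
end;
$$ language plpgsql;

create trigger orders_stamp
    before update on orders
    for each row execute procedure orders_stamp_wrong();

insert into orders values (1, 'OPEN', null);

-- unpatched, I would expect "UPDATE 0" here: the timestamp is
-- stamped, but the status change itself is silently dropped;
-- with the patch this draws an error instead
update orders set status = 'CLOSED' where id = 1;

-- right: set the value on NEW and let the triggering UPDATE
-- carry it to the table (the trigger should use this instead)
create function orders_stamp_right() returns trigger as $$
begin
    if new.status is distinct from old.status then
        new.status_changed_at := now();
    end if;
    return new;
end;
$$ language plpgsql;

Pre-patch, the first version stamps the timestamp but loses the
status change itself, with nothing but the row count to hint at it;
the second version does what the programmer actually intended.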

> If you could explain why using an AFTER trigger doesn't work for
> you, maybe we could improve them to be suitable for your purposes
> (and please forgive me if you already did that).

I did to some extent, but really perhaps the biggest issue (which I
don't think I've really covered earlier in the thread) is to not
silently corrupt the data. I would settle for throwing an error on
the DELETE case as well as the UPDATE case, because my data would
then be protected, and programmers would be forced to work around
the issue in a way that PostgreSQL can handle correctly.

Robert said that he has mainly run into this on BEFORE UPDATE
triggers, so perhaps he can elaborate on the usage patterns that run
into it there.

In my case it is mainly an issue with deletes (and the special case
of a "purge" process where deletes are not normally allowed) that do
a depth-first search for dependent records and delete from the
bottom up. In some cases there is redundant information in higher
level tables based on primary data in lower level tables -- in the
form of counts, sums, or status codes. If you delete a case, and
that "cascades" to an event history table with a trigger that
updates case status according to certain business rules, the delete
of the case is canceled because of the delete of a status-changing
event. That's very bad. If we at least threw an error instead, we
could identify the problem reliably and code around it somehow.
That might be by moving the delete of dependent records to a point
after the parent record has already been deleted, but that runs the
risk of triggering other validation failures based on business rules
in the triggers, because the triggers would then be seeing a state
we try very hard to prevent.
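
To make that concrete, here is a stripped-down sketch of the shape
of the problem; the table and trigger names are made up, and the
real business rules are far more complicated:

create table cases (
    id     int primary key,
    status text not null
);

create table case_events (
    case_id    int references cases,
    event_type text
);

-- depth-first delete of dependents: remove a case's events
-- before the case row itself goes away
create function cases_purge_children() returns trigger as $$
begin
    delete from case_events where case_id = old.id;
    return old;
end;
$$ language plpgsql;

create trigger cases_bd
    before delete on cases
    for each row execute procedure cases_purge_children();

-- business rule: removing a status-changing event recomputes
-- the redundant status column on the parent case
create function case_events_maintain_status() returns trigger as $$
begin
    if old.event_type = 'STATUS_CHANGE' then
        update cases set status = 'REOPENED' where id = old.case_id;
    end if;
    return old;
end;
$$ language plpgsql;

create trigger case_events_ad
    after delete on case_events
    for each row execute procedure case_events_maintain_status();

insert into cases values (1, 'CLOSED');
insert into case_events values (1, 'STATUS_CHANGE');

delete from cases where id = 1;

On an unpatched server I would expect that last DELETE to report
DELETE 0: the events are gone and the status has been rewritten, but
the case row quietly survives -- exactly the sort of incongruous
state that cost us weeks to track down.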

>> To recap: a trigger for UPDATE BEFORE EACH ROW or DELETE BEFORE
>> EACH ROW, which directly or indirectly causes update of the OLD
>> row in the trigger, can cause the triggering operation to be
>> silently ignored even though the trigger returns a non-NULL
>> record, and even though all triggered modifications are
>> persisted. The only clue that an incomplete and incongruous set
>> of operations has been performed is that the UPDATE or DELETE
>> count from the statement is reduced by the number of rows on
>> which this occurred: if an UPDATE would have affected 5 rows, but
>> one of them just fired triggers *as though* it had been updated
>> but was actually left untouched by the original UPDATE statement,
>> you would get "UPDATE 4" rather than "UPDATE 5".
>>
>> This patch causes DELETE to behave as most people would expect,
>> and throws an ERROR on the conflicting UPDATE case.
>
> I'm prepared to argue that whether or not people expect the
> behaviour that patch implements depends entirely on how you phrase
> the question. If you ask "do you expect a row to be actually
> deleted after BEFORE DELETE triggers have run", you'll undoubtedly
> hear "yes". But if you ask "do you expect a row to be deleted even
> though it didn't match the DELETE's WHERE condition" I bet you'll
> hear a very firm "no".

OK. I guess I've neglected that possibility based on my transaction
isolation bias and the fact that the types of changes being made in
the cascade of deletes wouldn't tend to create such a situation.
But stepping back, I see your point there.

> And equally firm do I expect to be the "no" if you ask "do you
> expect the row that is actually deleted and the contents of the
> variable OLD in the delete trigger to differ".

I'm less certain about that, especially if you phrase it in terms of
denormalization optimizations -- the redundant data in the form of
counts, sums, status codes, or sometimes just copied data from child
records stored in parents to support better performance on common
queries. That's the main issue for me.

> But the behaviour indicated in the second and third question *is*
> what happens with your patch applied. Here's an example
>
> create table t (id int);
> create or replace function on_t_delete() returns trigger as $$
> begin
>     raise notice 'deleting row %', old.id;
>     update t set id = -id where id = old.id;
>     return old;
> end;
> $$ language plpgsql volatile;
> create trigger t_delete
>     before delete on t
>     for each row execute procedure on_t_delete();
>
> insert into t (id) values (1);
> insert into t (id) values (2);
>
> delete from t where id > 0;
>
> Without your patch, the DELETE doesn't remove any rows, because
> they're updated in on_t_delete(). With your patch applied, the
> rows are removed - even though, due to the UPDATE in
> on_t_delete(), they *don't* match the DELETE's WHERE condition
> (id > 0) at the time they're deleted.

That's not the sort of situation we're seeing here, but I get your
point.

Would you feel comfortable with a patch which threw an error on the
DELETE case, as it does on the UPDATE case?

-Kevin
