Re: Some other odd buildfarm failures

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Some other odd buildfarm failures
Date: 2014-12-26 15:32:00
Message-ID: 31271.1419607920@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> Tom Lane wrote:
>> Still, I don't think this is a reasonable test design. We have
>> absolutely no idea what behaviors are being triggered in the other
>> tests, except that they are unrelated to what those tests think they
>> are testing.

> I can of course move it to a separate parallel test, but I don't think
> that should be really necessary.

I've not proven this rigorously, but it seems obvious in hindsight:
what's happening is that when the object_address test drops everything
with DROP CASCADE, other processes are sometimes just starting to execute
the event trigger when the DROP commits. When they go to look up the
trigger function, they don't find it, leading to "cache lookup failed for
function". The fact that the complained-of OID is slightly variable, but
always in the range of OIDs that would be getting assigned around this
point in a "make check" run, buttresses the theory.

I thought about changing the object_address test so that it explicitly
drops the event trigger first. But that would not be a fix, it would
just make the timing harder to hit (ie, a victim process would need to
lose control for longer at the critical point).

Since I remain of the opinion that a test called object_address has no
damn business causing global side-effects, I think there are two
reasonable fixes:

1. Remove the event trigger. This would slightly reduce the test's
coverage.

2. Run that whole test as a single transaction, so that the event trigger
is created and dropped in one transaction and is never seen as valid by
any concurrent test.

A long-term idea is to try to fix things so that there's sufficient
locking to make dropping an event trigger and immediately dropping its
trigger function safe. But I'm not sure that's either possible or a good
idea (the lock obtained by DROP would bring the entire database to a
standstill ...).

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-12-26 15:41:23 Re: BUG #12330: ACID is broken for unique constraints
Previous Message Kevin Grittner 2014-12-26 15:23:37 Re: BUG #12330: ACID is broken for unique constraints