Re: pika buildfarm member failure on isolationCheck tests

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>
Cc: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "R?mi Zara" <remi_zara(at)mac(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pika buildfarm member failure on isolationCheck tests
Date: 2011-06-20 14:40:44
Message-ID: 4DFF159C020000250003E95F@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dan Ports <drkp(at)csail(dot)mit(dot)edu> wrote:

> It looks the problem comes from the change a couple days ago that
> removed the SXACT_FLAG_ROLLED_BACK flag and changed the
> SxactIsRolledBack checks to SxactIsDoomed. That's the correct
> thing to do everywhere else, but gets us in trouble here. We
> shouldn't be ignoring doomed transactions in
> SetNewSxactGlobalXmin, because they're not eligible for cleanup
> until the associated backend aborts the transaction and calls
> ReleasePredicateLocks.
>
> However, it isn't as simple as just removing the SxactIsDoomed
> check from SetNewSxactGlobalXmin. ReleasePredicateLocks calls
> SetNewSxactGlobalXmin *before* it releases the aborted
> transaction's SerializableXact (it pretty much has to, because we
> must drop and reacquire SerializableXactHashLock in between). But
> we don't want the aborted transaction included in the
> SxactGlobalXmin computation.
>
> It seems like we do need that SXACT_FLAG_ROLLED_BACK after all,
> even though it's only set for this brief interval. We need to be
> able to distinguish a transaction that's just been marked for
> death (doomed) from one that's already called
> ReleasePredicateLocks.

I just looked this over and I agree. It looks to me like we can set
this flag in ReleasePredicateLocks (along with the doomed flag) and
test it only in SetNewSxactGlobalXmin (instead of the doomed flag).
Fundamentally, the problem is that while it's enough to know that a
transaction *will be* canceled in order to ignore it during
detection of rw-conflicts and dangerous structures, it's not enough
when calculating the new serializable xmin -- we need to know that a
transaction actually *has been* canceled to ignore it there.

Essentially, the fix is to revert a very small part of
http://git.postgresql.org/gitweb?p=postgresql.git;a=commit;h=264a6b127a918800d9f8bac80b5f4a8a8799d0f1

Either of us could easily fix this, but since you have the test
environment which can reproduce the problem, it's probably best that
you do it.

Since Heikki took the trouble to put the flags in a nice logical
order, we should probably put this right after the doomed flag.

-Kevin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Florian Weimer 2011-06-20 14:57:40 Re: POSIX question
Previous Message Cédric Villemain 2011-06-20 14:39:40 Re: Re: patch review : Add ability to constrain backend temporary file space