Re: [PATCH] Transaction traceability - txid_status(bigint)

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Subject: Re: [PATCH] Transaction traceability - txid_status(bigint)
Date: 2016-08-29 07:53:34
Message-ID: CAMsr+YFRpVzboAW+zOnia2_hSFv4aKFboxP91C2i8MbdTW0upw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On 29 August 2016 at 11:45, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
> On 2016-08-29 11:25:39 +0800, Craig Ringer wrote:
>> ERROR: could not access status of transaction 778793573
>> DETAIL: could not open file "pg_clog/02E6": No such file or directory
>>
>> What I'd really like is to be able to ask transam.c to handle the
>> xid_in_recent_past logic, treating an attempt to read an xid from
>> beyond the clog truncation threshold as a soft error indicating
>> unknown xact state. But that involves delving into slru.c, and I
>> really, really don't want to touch that for what should be a simple
>> and pretty noninvasive utility function.
>
> Can't you "just" check this against ShmemVariableCache->oldestXid while
> holding appropriate locks?

Hm. Yeah, I should've thought of that. Thank you.

>> A PG_TRY to trap ERRCODE_UNDEFINED_FILE seems like it'd be sufficient,
>> except for two issues:
>
> It seems like a bad idea to PG_CATCH and not re-throw an error. That
> generally is quite error prone. At the very least locking and such gets
> a lot more complicated (as you noticed below).

Yeah, and as I remember from the "fun" of trying to write apply errors
to tables in BDR. It wasn't my first choice.

>> * TransactionIdGetStatus() releases the CLogControlLock taken by
>> SimpleLruReadPage_ReadOnly() on normal exit but not after an exception
>> thrown from SlruReportIOError(). It seems appropriate for
>> SimpleLruReadPage() to release the LWLock before calling
>> SlruReportIOError(), so I've done that as a separate patch (now 0001).
>
> We normally prefer to handle this via the "bulk" releases in the error
> handlers. It's otherwise hard to write code that handles these
> situations reliably. It's different for spinlocks, but those normally
> protect far smaller regions of code.

Fair enough. It's not a complex path, but there are a _lot_ of
callers, and while I can't really imagine any of them relying on the
CLogControLock being held on error it's not something I was keen to
change. I thought complicating the clog with a soft-error interface
was worse and didn't come up with a better approach.

Said better approach attached in revised series. Thanks.

My only real complaint with doing this is that it's a bit more
conservative. But in practice clog truncation probably won't follow
that far behind oldestXmin so except in fairly contrived circumstances
it won't hurt. Apps that need guarantees about how old an xid they can
get status on can hold down xmin with a replication slot, a dummy
prepared xact, or whatever. If we find that becomes a common need that
should be made simpler then appropriate API to allow apps to hold down
clog truncation w/o blocking vacuuming can be added down the track.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Introduce-txid_status-bigint-to-get-status-of-an-xac.patch text/x-patch 16.0 KB
0002-Add-txid_convert_if_recent-to-get-the-32-bit-xid-fro.patch text/x-patch 12.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yury Zhuravlev 2016-08-29 08:03:11 Re: Why is a newly created index contains the invalid LSN?
Previous Message Michael Paquier 2016-08-29 07:51:09 Re: pg_dump with tables created in schemas created by extensions