Re: Usage of epoch in txid_current

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Usage of epoch in txid_current
Date: 2018-07-09 23:35:59
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

On Fri, Dec 8, 2017 at 3:36 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Wed, Dec 6, 2017 at 11:26 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>>> Either way, it is not clear to me how we will keep it
>>> updated after recovery. Right now, the mechanism is quite simple, at
>>> the beginning of a recovery we take the value of nextXid from
>>> checkpoint record and then if any xlog record indicates xid that
>>> follows nextXid, we advance it. Here, the point to note is that we
>>> take the xid from the WAL record (which means that it assumes xids are
>>> non-contiguous or some xids are consumed without being logged) and
>>> increment it. Unless we plan to change something in that logic (like
>>> storing 64-bit xids in WAL records), it is not clear to me how to make
>>> it work. OTOH, recovering value of epoch which increments only at
>>> wraparound seems fairly straightforward as described in my previous
>>> email.
>> I think it should be fairly simple if simply add the 64bit xid to the
>> existing clog extension WAL records.
> IIUC, you mean to say that we should log the 64bit xid value in
> CLOG_ZEROPAGE record while extending clog and that too we can do only
> at wraparound. Now, maybe doing it every time also doesn't hurt, but
> I think doing it at wraparound should be sufficient.

Can you please elaborate on why clog redo in particular would need to
use 64 bit xids? Is the 64 bit xid not derivable from the 32 bit xid
in the WAL + the current value of a new 64 bit next xid?

> Just to be clear, I am not planning to pursue writing a patch for this
> at the moment. So, if anybody else is interested or if Andres wants
> to write it, I can help in the review.

I played around with this idea yesterday. Experiment-grade patch
attached. Approach:

1. Introduce a new type BigTransactionId (better names welcome).
2. Change ShmemVariableCache->nextXid to ShmemVariableCache->nextBigXid.
3. Change checkpoints to use nextBigXid too.
4. Change ReadNewTransactionId() to ReadNextBigTransactionId().
5. Remove GetNextXidAndEpoch() as it's now redundant.
6. Everywhere that was reading ShmemVariableCache->nextXid or calling
ReadNewTransactionId() but actually needs an xid now uses an explicit
conversion macro XidFromBigTransactionId().
7. Everywhere that was writing ShmemVariableCache->nextXid but only
has an xid now goes through a new function
AdvanceNextBigTransactionIdPast(xid), to handle recovery (since WAL
records have xids in them as mentioned by Amit).

I think it's probably a good idea to make it very explicit when moving
between big and small transaction IDs, hence the including of the word
'big' in variable and function names and the use of a function-like
macro (rather than implicit conversion, which C doesn't give me a good
way to prevent). Otherwise there is a class of bug that is hidden for
the first 2^32 transactions.

The logic in CreateCheckPoint() that assumed that there could be only
one wraparound per checkpoint is gone (the problem reported in this
thread). Note that AdvanceNextBigTransactionIdPast() contains logic
that infers an epoch, which is in some ways similar to what the
removed code was doing, but in this case I think all callers must have
an xid from the same or next epoch.

I'm probably missing a few details... this is only a first swing at
the problem. It does pass check-world and various xid wraparound
tests I came up with. Clearly big xids could spread to more places
than I show here. Do you see problems with this approach or have
better ideas?

Thomas Munro

Attachment Content-Type Size
0001-Track-the-next-xid-using-64-bits.patch application/octet-stream 43.1 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-07-09 23:40:14 small development tip: Consider using the gold linker
Previous Message Tom Lane 2018-07-09 23:29:42 Re: [HACKERS] Possibly too stringent Assert() in b-tree code