Re: Usage of epoch in txid_current

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Usage of epoch in txid_current
Date: 2017-12-06 12:09:09
Message-ID: CAA4eK1+A7nP4N8UT6_d=s0yuSq9Md8Z2cL3h-4LVM6YZmspVBA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 5, 2017 at 11:15 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2017-12-05 16:21:27 +0530, Amit Kapila wrote:
>
>> Okay, it is quite strange that we haven't discovered this problem till
>> now. I think we should do something to fix it. One idea is that we
>> track epoch change in shared memory (probably in the same data
>> structure (VariableCacheData) where we track nextXid). We need to
>> increment it when the xid wraps around during xid allocation (in
>> GetNewTransactionId). Also, we need to make it persistent, which
>> means we need to log it in the checkpoint xlog record and write
>> a separate xlog record for the epoch change.
>
> I think it makes a fair bit of sense to not do the current crufty
> tracking of xid epochs. I don't really know how we got there, but it doesn't
> make terribly much sense. Don't think we need additional WAL logging
> though - we should be able to piggyback this onto the already existing
> clog logging.
>
> I kinda wonder if we shouldn't just track nextXid as a 64-bit integer
> internally, instead of bothering with tracking the epoch
> separately. Then we can "just" truncate it in the cases where it's
> stored in space constrained places etc.
>
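
To make the two options in the quoted text concrete, here is a minimal C sketch of a 32-bit nextXid with a separately tracked epoch that is bumped on wraparound during allocation, next to a single 64-bit counter that is truncated to 32 bits when an xid is handed out. All type and function names (EpochCounter, FullCounter, epoch_get_new_xid, full_get_new_xid, FIRST_NORMAL_XID) are hypothetical and are not PostgreSQL's actual code. The sketch assumes, as PostgreSQL does, that xids 0 to 2 are reserved and normal xids start at 3.

```c
#include <inttypes.h>
#include <stdint.h>

/* Hypothetical names, for illustration only; not PostgreSQL's structs. */
typedef uint32_t TransactionId;
#define FIRST_NORMAL_XID ((TransactionId) 3)    /* xids 0..2 are reserved */

/* Variant 1: 32-bit nextXid plus a separately tracked epoch. */
typedef struct
{
    TransactionId nextXid;
    uint32_t      epoch;
} EpochCounter;

/* Variant 2: one 64-bit counter; high 32 bits = epoch, low 32 bits = xid. */
typedef struct
{
    uint64_t nextFullXid;
} FullCounter;

/* Hand out nextXid, then advance it, bumping the epoch on wraparound. */
static TransactionId
epoch_get_new_xid(EpochCounter *c)
{
    TransactionId xid = c->nextXid;

    c->nextXid++;
    if (c->nextXid < FIRST_NORMAL_XID)
    {
        /* The 32-bit value wrapped to 0: skip reserved xids, new epoch. */
        c->nextXid = FIRST_NORMAL_XID;
        c->epoch++;
    }
    return xid;
}

/* Same behavior with a 64-bit counter, truncated only when handed out. */
static TransactionId
full_get_new_xid(FullCounter *c)
{
    TransactionId xid = (TransactionId) c->nextFullXid;

    c->nextFullXid++;
    if ((TransactionId) c->nextFullXid < FIRST_NORMAL_XID)
    {
        /* Low 32 bits just wrapped to 0; the carry already bumped the
         * epoch in the high bits, so only skip the reserved values. */
        c->nextFullXid += FIRST_NORMAL_XID;
    }
    return xid;
}

/* Epoch of the 64-bit variant is simply its high 32 bits. */
static uint32_t
full_epoch(const FullCounter *c)
{
    return (uint32_t) (c->nextFullXid >> 32);
}
```

Starting both counters at 0xFFFFFFF0 and allocating 40 xids, they hand out identical 32-bit xids and end in the same state (epoch 1, nextXid 27); the 64-bit variant just gets the epoch for free from its high bits.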

We use ShmemVariableCache->nextXid in many places, so always
converting/truncating it to a 32-bit number before use seems slightly
awkward; we could instead keep a separate 64-bit nextBigXid alongside
it. Either way, it is not clear to me how we would keep it updated
after recovery. Right now, the mechanism is quite simple: at the
beginning of recovery we take the value of nextXid from the checkpoint
record, and then, if any xlog record carries an xid that follows
nextXid, we advance it. The point to note here is that we take the xid
from the WAL record and increment it (which means the logic assumes
xids in WAL can be non-contiguous, i.e. some xids are consumed without
being logged). Unless we plan to change something in that logic (like
storing 64-bit xids in WAL records), it is not clear to me how to make
it work. OTOH, recovering the value of the epoch, which increments only
at wraparound, seems fairly straightforward, as described in my
previous email.
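
The recovery rule described above, and why an epoch is easy to maintain alongside it, can be sketched as follows. This is a hypothetical illustration, not the actual redo code: the names (RecoveryXidState, xid_follows_or_equals, advance_next_xid_past) are made up, and it assumes nextXid and the epoch are both restored from the checkpoint record and that every xid seen in WAL is a normal xid within 2^31 of nextXid. The comparison is modulo 2^32, in the spirit of PostgreSQL's TransactionIdFollowsOrEquals.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, for illustration only; not PostgreSQL's redo code. */
typedef uint32_t TransactionId;
#define FIRST_NORMAL_XID ((TransactionId) 3)    /* xids 0..2 are reserved */

typedef struct
{
    TransactionId nextXid;      /* restored from the checkpoint record */
    uint32_t      epoch;        /* assumed to be restored from it too */
} RecoveryXidState;

/* Modulo-2^32 "a is b or later", valid while a and b are < 2^31 apart. */
static bool
xid_follows_or_equals(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) >= 0;
}

/* Called for each replayed WAL record that carries an xid. */
static void
advance_next_xid_past(RecoveryXidState *s, TransactionId xid)
{
    TransactionId old = s->nextXid;

    if (!xid_follows_or_equals(xid, old))
        return;                 /* older xid: nextXid already covers it */

    /* xids may be non-contiguous in WAL, so jump straight past this one. */
    s->nextXid = xid + 1;
    if (s->nextXid < FIRST_NORMAL_XID)
        s->nextXid = FIRST_NORMAL_XID;  /* skip reserved xids */

    /* Moving forward but landing numerically lower means we wrapped. */
    if (s->nextXid < old)
        s->epoch++;
}
```

Because WAL records carry only 32-bit xids, the sketch relies on each xid being close to nextXid: advancing nextXid to a numerically smaller value is the only signal it uses that a wraparound occurred.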

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
