Re: Challenges preventing us moving to 64 bit transaction id (XID)?

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Tianzhou Chen <tianzhouchen(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Challenges preventing us moving to 64 bit transaction id (XID)?
Date: 2017-06-07 07:47:22
Message-ID: 4ff4d6d3-7b3e-584a-8aea-e4c59ae95588@iki.fi
Lists: pgsql-hackers

On 06/06/2017 07:24 AM, Ashutosh Bapat wrote:
> On Tue, Jun 6, 2017 at 9:48 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>> On 6 June 2017 at 12:13, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>>
>>> What happens when the epoch is so low that the rest of the XID does
>>> not fit in 32bits of tuple header? Or such a case should never arise?
>>
>> Storing an epoch implies that rows can't have (xmin,xmax) different by
>> more than one epoch. So if you're updating/deleting an extremely old
>> tuple you'll presumably have to set xmin to FrozenTransactionId if it
>> isn't already, so you can set a new epoch and xmax.
>
> If the page has multiple such tuples, updating one tuple will mean
> updating headers of other tuples as well? This means that those tuples
> need to be locked for concurrent scans? Maybe not, since such tuples
> will be anyway visible to any concurrent scans and updating xmin/xmax
> doesn't change the visibility. But we might have to prevent multiple
> updates to the xmin/xmax because of concurrent updates on the same
> page.

"Store the epoch in the page header" is a slightly
simpler-to-visualize, but incorrect, version of what we actually need to
do. If you only store the epoch, then all the XIDs on a page need to
belong to the same epoch, which causes trouble when the current epoch
changes. Just after the epoch changes, you cannot necessarily freeze all
the tuples from the previous epoch, because they would not yet be
visible to everyone.

The full picture is that we need to store one 64-bit XID "base" value in
the page header, and all the xmin/xmax values in the tuple headers are
offsets relative to that base. With that, you effectively have 64-bit
XIDs, as long as the *difference* between any two XIDs on a page is not
greater than 2^32. That can be guaranteed, as long as we don't allow a
transaction to be in-progress for more than 2^32 XIDs. That seems like a
reasonable limitation.
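
To make the arithmetic concrete, here is a minimal sketch of the base-plus-offset encoding described above. It is only an illustration, not PostgreSQL code: the names XID_OFFSET_LIMIT, to_offset and to_full_xid are hypothetical, and special XID values are ignored.

```python
# Hypothetical model of "64-bit base in the page header, 32-bit offsets in
# the tuple headers". None of these names exist in PostgreSQL.

XID_OFFSET_LIMIT = 2**32  # a tuple header field holds offsets 0 .. 2^32 - 1


def to_offset(base_xid: int, xid: int) -> int:
    """Encode a 64-bit XID as a 32-bit offset relative to the page's base."""
    offset = xid - base_xid
    if not 0 <= offset < XID_OFFSET_LIMIT:
        # The XID is too far from the base; the page would need a new base.
        raise ValueError("XID does not fit in a 32-bit offset from this base")
    return offset


def to_full_xid(base_xid: int, offset: int) -> int:
    """Recover the 64-bit XID from the page base and a stored offset."""
    return base_xid + offset


# Example: a page whose base lies well beyond the 32-bit range.
base = 5 * 2**32 + 100
xid = base + 12345
stored = to_offset(base, xid)          # small enough for a 32-bit field
recovered = to_full_xid(base, stored)  # equals xid again
```

Any two XIDs stored on the same page necessarily differ by less than 2^32 here, which is the property the scheme relies on.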

But yes, when the "current XID - base XID in page header" becomes
greater than 2^32, and you need to update a tuple on that page, you need
to first freeze the page, update the base XID on the page header to a
more recent value, and update the XID offsets on every tuple on the page
accordingly. And to do that, you need to hold a lock on the page. If you
don't move any tuples around at the same time, but just update the XID
fields, an exclusive lock on the page is enough, i.e. you don't need to
take a super-exclusive or vacuum lock. In any case, it happens so
infrequently that it should not become a serious burden.
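
The rebase step could look roughly like the sketch below. It is a simplified model under stated assumptions, not PostgreSQL code: Page, HeapTuple, needs_rebase and rebase_page are hypothetical names, a frozen xmin is modelled as None, the caller is assumed to hold the exclusive page lock, and the caller is assumed to pick a new base no newer than the oldest XID that any snapshot still needs to check.

```python
from dataclasses import dataclass, field
from typing import List, Optional

XID_OFFSET_LIMIT = 2**32  # offsets must fit in 32 bits


@dataclass
class HeapTuple:
    xmin: Optional[int]  # offset from the page base; None models "frozen"
    xmax: Optional[int]  # offset from the page base; None models "not set"


@dataclass
class Page:
    base_xid: int  # 64-bit base kept in the page header
    tuples: List[HeapTuple] = field(default_factory=list)


def needs_rebase(page: Page, current_xid: int) -> bool:
    """True when current_xid no longer fits as an offset from the page base."""
    return current_xid - page.base_xid >= XID_OFFSET_LIMIT


def rebase_page(page: Page, new_base: int) -> None:
    """Freeze old tuples and re-express all offsets relative to new_base.

    Assumption: the caller holds an exclusive lock on the page, and every
    XID older than new_base is already visible to all transactions.
    """
    shift = new_base - page.base_xid
    if shift < 0:
        raise ValueError("the base may only move forward")
    # Validate first, so the page is never left half-updated.
    for tup in page.tuples:
        if tup.xmax is not None and tup.xmax < shift:
            # Such a tuple would need pruning, which this sketch leaves out.
            raise ValueError("xmax older than the new base")
    for tup in page.tuples:
        if tup.xmin is not None:
            # An xmin older than the new base is frozen; others are shifted.
            tup.xmin = tup.xmin - shift if tup.xmin >= shift else None
        if tup.xmax is not None:
            tup.xmax -= shift
    page.base_xid = new_base


page = Page(base_xid=1000, tuples=[HeapTuple(10, None), HeapTuple(5000, 6000)])
current_xid = 1000 + 2**32 + 7
if needs_rebase(page, current_xid):
    rebase_page(page, new_base=5000)
```

Only the XID fields change and no tuple moves, which matches the case where an ordinary exclusive lock is sufficient.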

- Heikki
