Re: Challenges preventing us moving to 64 bit transaction id (XID)?

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tianzhou Chen <tianzhouchen(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Challenges preventing us moving to 64 bit transaction id (XID)?
Date: 2017-06-22 15:36:43
Lists: pgsql-hackers

On Wed, Jun 7, 2017 at 11:33 AM, Alexander Korotkov <
a(dot)korotkov(at)postgrespro(dot)ru> wrote:

> On Tue, Jun 6, 2017 at 4:05 PM, Peter Eisentraut <
> peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
>> On 6/6/17 08:29, Bruce Momjian wrote:
>> > On Tue, Jun 6, 2017 at 06:00:54PM +0800, Craig Ringer wrote:
>> >> Tom's point is, I think, that we'll want to stay pg_upgrade
>> >> compatible. So when we see a pg10 tuple and want to add a new page
>> >> with a new page header that has an epoch, but the whole page is full
>> >> so there isn't 32 bits left to move tuples "down" the page, what do we
>> >> do?
>> >
>> > I guess I am missing something. If you see an old page version number,
>> > you know none of the tuples are from running transactions so you can
>> > just freeze them all, after consulting the pg_clog. What am I missing?
>> > If the page is full, why are you trying to add to the page?
>> The problem is if you want to delete from such a page. Then you need to
>> update the tuple's xmax and stick the new xid epoch somewhere.
>> We had an unconference session at PGCon about this. These issues were
>> all discussed and some ideas were thrown around. We can expect a patch
>> to appear soon, I think.
> Right. I'm now working on splitting my large patch for 64-bit xids into a
> patchset.
> I'm planning to post the patchset at the beginning of next week.

Work on this patch took longer than I expected. It is still not in very good
shape, but I decided to publish it anyway so as not to stall progress in
this area.
I also tried to split this patch into several parts, but I only managed to
separate out a few small pieces; most of the changes remain in a single big
diff.
Long story short, the patchset is attached.

0001-64bit-guc-relopt-1.patch
This patch implements 64-bit GUCs and relation options, which are used by
the subsequent patches.

0002-heap-page-special-1.patch
Putting the xid and multixact bases into PageHeaderData would take an extra
16 bytes on index pages too, which would be a waste of space for indexes.
This is why I decided to put the bases into the special area of heap pages.
This patch adds a special area to heap pages containing the prune xid and a
magic number. The magic number differs between regular heap pages and
sequence pages.

0003-64bit-xid-1.patch
This is the major patch. It redefines TransactionId as a 64-bit integer and
defines a 32-bit ShortTransactionID, which is used for t_xmin and t_xmax.
Transaction ID comparison becomes a plain linear comparison instead of a
circular one. Base values for xids and multixact ids are stored in the heap
page special area. SLRUs also become 64-bit and non-circular. To be able to
calculate xmin/xmax without accessing the heap page, the base values are
copied into HeapTuple. Correspondingly, HeapTupleHeader(Get|Set)(Xmin|Xmax)
become HeapTuple(Get|Set)(Xmin|Xmax), which require a HeapTuple rather than
just a HeapTupleHeader. heap_page_prepare_for_xid() is used to ensure that a
given xid fits within a particular page's base. If it doesn't fit, the
page's base is shifted, which could require freezing that single page. The
WAL format is changed in order to prevent unaligned access to TransactionId.
The *_age GUCs and relation options are changed to 64-bit. The forced
"autovacuum to prevent wraparound" is removed, but freezing is still needed
to truncate SLRUs.
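To make the base/short-xid scheme more concrete, here is a self-contained C
sketch, assuming that a tuple stores ShortTransactionID = xid - base and that
special xids (below FirstNormalTransactionId = 3) are stored unshifted.
PageXidState and page_prepare_for_xid() are hypothetical stand-ins: the real
heap_page_prepare_for_xid() works on actual pages and rewrites tuple headers,
while this sketch only tracks the smallest and largest normal short xid on
the page. It also contrasts the stock modulo-2^32 comparison with the linear
64-bit one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t TransactionId;      /* full 64-bit xid */
typedef uint32_t ShortTransactionID; /* value stored in t_xmin / t_xmax */

#define FirstNormalTransactionId ((TransactionId) 3)

/* Stock 32-bit circular comparison for normal xids (modulo 2^32). */
static bool
xid32_precedes_circular(uint32_t id1, uint32_t id2)
{
    int32_t diff = (int32_t) (id1 - id2);
    return diff < 0;
}

/* With 64-bit xids the comparison is a plain integer comparison. */
static bool
xid64_precedes(TransactionId id1, TransactionId id2)
{
    return id1 < id2;
}

/* Hypothetical per-page state: the base kept in the special area, plus the
 * range of normal (>= 3) short xids currently stored on the page. */
typedef struct PageXidState
{
    TransactionId      xid_base;
    ShortTransactionID min_short;
    ShortTransactionID max_short;
    bool               has_xids;
} PageXidState;

/* Special xids (invalid, bootstrap, frozen) are assumed to be stored unshifted. */
static TransactionId
short_to_full(const PageXidState *page, ShortTransactionID s)
{
    if (s < FirstNormalTransactionId)
        return s;
    return page->xid_base + s;
}

/* Make xid representable as (xid - xid_base) in 32 bits, shifting the base
 * if needed. Returns false when the page would have to be frozen first. */
static bool
page_prepare_for_xid(PageXidState *page, TransactionId xid)
{
    assert(xid >= FirstNormalTransactionId);

    if (xid >= page->xid_base + FirstNormalTransactionId &&
        xid - page->xid_base <= UINT32_MAX)
        return true;                    /* already fits the current base */

    if (!page->has_xids)
    {
        page->xid_base = xid - FirstNormalTransactionId;
        return true;                    /* nothing to rewrite */
    }

    TransactionId old_min = short_to_full(page, page->min_short);
    TransactionId old_max = short_to_full(page, page->max_short);
    TransactionId lowest = xid < old_min ? xid : old_min;
    TransactionId highest = xid > old_max ? xid : old_max;

    /* Highest base that keeps every xid at or above base + 3. */
    TransactionId new_base = lowest - FirstNormalTransactionId;
    if (highest - new_base > UINT32_MAX)
        return false;                   /* span too wide: freeze old xids first */

    /* The real code would rewrite every tuple's xmin/xmax here. */
    page->min_short = (ShortTransactionID) (old_min - new_base);
    page->max_short = (ShortTransactionID) (old_max - new_base);
    page->xid_base = new_base;
    return true;
}
```

For example, on a page with base 0 holding short xids 100..200, preparing
for xid 2^32 + 50 moves the base to 97 (the short xids become 3..103),
whereas xid 2^32 + 200 cannot be accommodated without freezing, because the
page's xids would no longer fit in a single 32-bit window above one base.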

0004-base-values-for-testing-1.patch
This patch is used to test that calculations using 64-bit bases and short
32-bit xid values are correct. It provides initdb options for the initial
xid, multixact id and multixact offset values. The regression tests
initialize the cluster with large (greater than 2^32) values.
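The reason for starting the regression cluster above 2^32 can be illustrated
with a small C sketch. While every xid is below 2^32, a page base of 0 is
valid and the stored 32-bit value equals the full xid, so code that forgets
to add the base still gives correct results. Once xids exceed 2^32, no
decoder that ignores the base can be right, since a widened 32-bit value is
at most 2^32 - 1. The helper names and the choice of the lowest valid base
are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t TransactionId;
typedef uint32_t ShortTransactionID;

/* Lowest base that can hold xid as a 32-bit offset; 0 whenever xid < 2^32. */
static TransactionId
lowest_valid_base(TransactionId xid)
{
    return xid > UINT32_MAX ? xid - UINT32_MAX : 0;
}

/* Hypothetical bug under test: widening the stored value without adding the base. */
static TransactionId
decode_forgetting_base(TransactionId base, ShortTransactionID s)
{
    (void) base;
    return (TransactionId) s;
}

/* Returns true if some xid in [first_xid, first_xid + count) is decoded
 * wrongly by the buggy decoder, i.e. a test run starting there catches it. */
static bool
base_bug_detected(TransactionId first_xid, uint32_t count)
{
    for (uint32_t i = 0; i < count; i++)
    {
        TransactionId xid = first_xid + i;
        TransactionId base = lowest_valid_base(xid);
        ShortTransactionID s = (ShortTransactionID) (xid - base);

        assert(base + s == xid);        /* correct decoding always round-trips */
        if (decode_forgetting_base(base, s) != xid)
            return true;
    }
    return false;
}
```

Starting at the default first normal xid (3), the bug goes unnoticed;
starting just above 2^32, it is caught on the first xid.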

There are a lot of open items, but I would like to point out some of them:
* WAL becomes significantly larger due to storing 8-byte xids instead of
4-byte xids. We probably need to use the base approach in WAL too.
* As discussed at the developer unconference, we need to write a special
background worker that would ensure that each heap page has room for the
bases. This background worker should finish its work before the database
can be pg_upgraded. Alternatively, we could find a way to store the bases
in the existing page header.
* BTPageOpaqueData contains a TransactionId in its special area.
BTPageOpaqueData should be changed to some pg_upgrade-compatible format.
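For the first open item, one possible reading of "base approach in WAL" is
to store one 64-bit base per record plus a 32-bit offset for each xid the
record carries. The C sketch below is a hypothetical illustration of that
idea, not something the patchset implements; it falls back to full xids when
the xids span more than 2^32 - 1. With n xids the payload is 8 + 4n bytes
instead of 8n (ignoring alignment), so it only pays off for records carrying
three or more xids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t TransactionId;

/* Encode xids[0..n) as one 64-bit base plus 32-bit deltas. Returns false if
 * the xids cannot share a base; the caller then logs full 8-byte xids. */
static bool
encode_xids_with_base(const TransactionId *xids, size_t n,
                      TransactionId *base_out, uint32_t *deltas_out)
{
    assert(n == 0 || (xids != NULL && deltas_out != NULL));
    if (n == 0)
    {
        *base_out = 0;
        return true;
    }

    TransactionId lo = xids[0];
    TransactionId hi = xids[0];
    for (size_t i = 1; i < n; i++)
    {
        if (xids[i] < lo)
            lo = xids[i];
        if (xids[i] > hi)
            hi = xids[i];
    }
    if (hi - lo > UINT32_MAX)
        return false;

    for (size_t i = 0; i < n; i++)
        deltas_out[i] = (uint32_t) (xids[i] - lo);
    *base_out = lo;
    return true;
}

static TransactionId
decode_xid(TransactionId base, uint32_t delta)
{
    return base + delta;
}

/* Payload sizes in bytes, ignoring alignment padding. */
static size_t
plain_payload(size_t n)
{
    return n * sizeof(TransactionId);
}

static size_t
based_payload(size_t n)
{
    return sizeof(TransactionId) + n * sizeof(uint32_t);
}
```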

Alexander Korotkov
Postgres Professional:
The Russian Postgres Company

Attachment Content-Type Size
0001-64bit-guc-relopt-1.patch application/octet-stream 24.9 KB
0002-heap-page-special-1.patch application/octet-stream 20.3 KB
0003-64bit-xid-1.patch application/octet-stream 600.6 KB
0004-base-values-for-testing-1.patch application/octet-stream 18.3 KB
