Re: 64-bit XIDs again

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 64-bit XIDs again
Date: 2015-07-31 13:50:43
Message-ID: CANP8+jKDQUrt1J6ZJY9s3v_pRd4B34AtohRWxB5QBSnEgFSVnA@mail.gmail.com
Lists: pgsql-hackers

On 31 July 2015 at 11:00, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
wrote:

>
> If a user upgrades a database cluster with pg_upgrade, he would stop the old
> postmaster, run pg_upgrade, and start the new postmaster. That means we start
> from a point when there are no running transactions. Thus, tuples in the old
> format are of two kinds: visible to everybody and invisible to
> everybody. When we update or delete an old tuple of the first kind, we
> actually don't need to store its xmin anymore. We can store a 64bit xmax in
> the place of xmin/xmax.
>
> So, in order to switch to 64bit xmin/xmax, we have to take both free bits
> from t_infomask2 in order to implement it. They should indicate one of 3
> possible tuple formats:
> 1) Old format: both xmin/xmax are 32bit
> 2) Intermediate format: xmax is 64bit, xmin is frozen.
> 3) New format: both xmin/xmax are 64bit.
>
> But we can use the same idea to implement an epoch in the heap page header
> as well. If the new page header doesn't fit on the page, then we don't have
> to insert anything into this page; we just need to set xmax and flags on
> existing tuples. Then we can use two of the formats listed above, #1 and #2,
> and take one free bit from t_infomask2 for format indication.
>

I think we can do it by treating the page-level epoch as a means of
compression, rather than as a barrier, which is how I first saw it.

New Page Format
New Page format has a page-level epoch.

The first tuple inserted onto a block sets the page epoch. For later inserts,
we check whether the current epoch matches the page epoch. If it doesn't,
we try to freeze the page. If all tuples on the page can be frozen, we can
then reset the page-level epoch as part of our insert. If we can't freeze
all tuples on the page, we extend the relation to allow us to add a
new page with the current epoch on it. (We can't easily track which blocks
have which epoch).

If an Update or Delete sees a tuple from a prior epoch, we will try to
freeze the tuple. If we can, then we reuse xmin as the xmax's epoch. If we
can't, we need a more complex mechanism to handle that case. I don't think
it is necessary to invent that in the first release; we will just assume
that freezing is possible.

Current Pages
Current pages don't have an epoch, so we store a base epoch in the
controlfile so we remember how to interpret them.

We don't create any new pages with this page format. For later inserts, we
check whether the current epoch matches the page epoch (for these pages, the
base epoch). If it doesn't, we check whether it's possible to rewrite the
whole page to the new format, freezing as we go. If that is not possible, we
extend the relation to allow us to add a new page with the current epoch on
it. (We can't easily track which blocks have which epoch).

If an Update or Delete sees a tuple from a prior epoch, we will try to
freeze the tuple. If we can, then we reuse xmin as the xmax's epoch.

I don't think we need any new tuple formats to do this.

This means we have
* changes to allow new bufpage format
* changes in hio.c for page selection
* changes to allow xmin to be reused when freeze bit set

Very little additional path length in the common case.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
