Freezing without write I/O

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Freezing without write I/O
Date: 2013-05-30 13:33:50
Message-ID: 51A7553E.5070601@vmware.com
Lists: pgsql-hackers

Since we're tossing around ideas about freezing, let me write down the
idea I've been pondering and discussing with various people for years. I
don't think I invented this myself; apologies to whoever did for not
giving credit.

The reason we have to freeze is that otherwise our 32-bit XIDs wrap
around and become ambiguous. The obvious solution is to extend XIDs to
64 bits, but that would waste a lot of space. The trick is to add a field
to the page header indicating the 'epoch' of the XID, while keeping the
XIDs in the tuple headers 32 bits wide (*).

The other reason we freeze is to truncate the clog. But with 64-bit
XIDs, we wouldn't actually need to change old XIDs on disk to FrozenXid.
Instead, we could implicitly treat anything older than relfrozenxid as
frozen.
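
As a rough sketch of that implicit rule, assuming XIDs have already been
widened to 64 bits so that a plain comparison is unambiguous (the
function name is made up for illustration and is not existing
PostgreSQL code):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: with 64-bit XIDs there is no wraparound, so
 * "older than relfrozenxid" is a plain integer comparison. A tuple
 * whose XID passes this test would be treated as frozen without
 * rewriting the XID on disk and without consulting the clog.
 */
static bool
xid_is_implicitly_frozen(uint64_t xid, uint64_t relfrozenxid)
{
    return xid < relfrozenxid;
}
```

The comparison is strict because relfrozenxid itself is not "older
than" relfrozenxid.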

That's the basic idea. Vacuum freeze would still need to remove dead
tuples, but it would not need to dirty pages that contain no dead tuples.

Since we're not storing 64-bit wide XIDs on every tuple, we'd still need
to replace the XIDs with FrozenXid whenever the difference between the
smallest and largest XID on a page exceeds 2^31. But that would only
happen when you're updating the page, in which case the page is dirtied
anyway, so it wouldn't cause any extra I/O.
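
The check could look roughly like this. It is only a sketch: it assumes
the XIDs on the page have already been expanded to their logical 64-bit
values, and the function and macro names are invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Largest spread of XIDs that 32-bit tuple XIDs can represent on one page. */
#define MAX_PAGE_XID_SPAN (UINT64_C(1) << 31)

/*
 * Hypothetical check: returns true if the difference between the
 * smallest and largest XID on the page exceeds 2^31, meaning the
 * oldest XIDs would have to be replaced with FrozenXid before the
 * page is written.
 */
static bool
page_xid_span_too_wide(const uint64_t *xids, size_t nxids)
{
    uint64_t min_xid;
    uint64_t max_xid;
    size_t   i;

    if (nxids == 0)
        return false;

    min_xid = max_xid = xids[0];
    for (i = 1; i < nxids; i++)
    {
        if (xids[i] < min_xid)
            min_xid = xids[i];
        if (xids[i] > max_xid)
            max_xid = xids[i];
    }

    return (max_xid - min_xid) > MAX_PAGE_XID_SPAN;
}
```

It would run only on the update path, where the page is being modified
anyway.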

This would also be the first step in allowing the clog to grow larger
than 2 billion transactions, eliminating the need for anti-wraparound
freezing altogether. You'd still want to truncate the clog eventually,
but it would be nice to not be pressed against the wall with "run vacuum
freeze now, or the system will shut down".

(*) "Adding an epoch" is inaccurate, but I like to use that as my mental
model. If you just add a 32-bit epoch field, then you cannot have xids
from different epochs on the page, which would be a problem. In reality,
you would store one 64-bit XID value in the page header, and use that as
the "reference point" for all the 32-bit XIDs on the tuples. See the
existing convert_txid() function for how that works. Another method is
to store the 32-bit xid values in tuple headers as offsets from the
per-page 64-bit value, but then you'd always need to have the 64-bit
value at hand when interpreting the XIDs, even if they're all recent.
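
To make the reference-point method concrete, here is a minimal sketch.
It is not the existing function; the name is made up, it ignores the
special (non-normal) XIDs, and it assumes every tuple XID lies within
2^31 of the page's reference value.

```c
#include <stdint.h>

/*
 * Hypothetical helper: expand a 32-bit tuple XID to 64 bits using the
 * 64-bit reference XID stored in the page header.
 *
 * The tuple XID is interpreted as the value nearest to the reference
 * whose low 32 bits match it. The uint32 -> int32 cast relies on the
 * usual two's-complement behaviour, which is implementation-defined in
 * ISO C but holds on the compilers PostgreSQL supports.
 */
static uint64_t
expand_tuple_xid(uint64_t page_ref_xid, uint32_t tuple_xid)
{
    /* Signed distance from the reference's low 32 bits, modulo 2^32. */
    int32_t delta = (int32_t) (tuple_xid - (uint32_t) page_ref_xid);

    /* Adding a negative delta wraps modulo 2^64, giving ref - |delta|. */
    return page_ref_xid + (uint64_t) (int64_t) delta;
}
```

For example, with a reference of 0x500000010, a tuple XID of 0xFFFFFFF0
expands to 0x4FFFFFFF0, that is, a value from the previous "epoch",
which a plain per-page epoch field could not express.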

- Heikki
