Re: Transaction ID wraparound: problem and proposed solution

From: ncm(at)zembu(dot)com (Nathan Myers)
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Transaction ID wraparound: problem and proposed solution
Date: 2001-01-20 08:29:24
Message-ID: 20010120002924.A2797@store.zembu.com
Lists: pgsql-hackers

I think the XID wraparound matter might be handled a bit more simply.

Given a global variable X holding the earliest XID value in use at
some event (e.g. startup), you can compare two XIDs x and y, using
unsigned arithmetic, with just (x-X < y-X). The subtraction wraps
modulo 4G, so each side is simply the distance forward from X. This
has the further advantage that old transaction IDs need be "frozen"
only every 4G transactions, rather than Tom's suggested 256M or 512M
transactions.
"Freezing", in this scheme, means to set all older XIDs to equal the
chosen X, rather than setting them to some constant reserved value.
No special cases are required for the comparison, even for folded
values; it is (x-X < y-X) for all valid x and y.
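
A minimal sketch of that comparison in C, for concreteness. The type
name Xid and the function name xid_precedes are made up for this
example and are not existing backend identifiers; X is passed as a
parameter here rather than kept in a global.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the 4-byte transaction ID type. */
typedef uint32_t Xid;

/*
 * Return true if x logically precedes y, where X is the earliest XID
 * in use.  Unsigned subtraction wraps modulo 2^32, so (x - X) is the
 * distance forward from X to x even when x has wrapped past zero.
 * The casts keep the result 32 bits wide on any platform.
 */
static bool
xid_precedes(Xid x, Xid y, Xid X)
{
    return (Xid) (x - X) < (Xid) (y - X);
}
```

With X = 0xFFFFFFF0, an XID of 0xFFFFFFF5 sits 5 past X and an XID of
3 (already wrapped) sits 19 past X, so the first compares as older.
A tuple frozen to X itself has distance 0 and so is never newer than
any other valid XID.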

I don't know the role of the "bootstrap" XID, or how it must be
fitted into the above.

Nathan Myers
ncm(at)zembu(dot)com

------------------------------------------------------------
> We've expended a lot of worry and discussion in the past about what
> happens if the OID generator wraps around. However, there is another
> 4-byte counter in the system: the transaction ID (XID) generator.
> While OID wraparound is survivable, if XIDs wrap around then we really
> do have a Ragnarok scenario. The tuple validity checks do ordered
> comparisons on XIDs, and will consider tuples with xmin > current xact
> to be invalid. Result: after wraparound, your whole database would
> instantly vanish from view.
>
> The first thought that comes to mind is that XIDs should be promoted to
> eight bytes. However there are several practical problems with this:
> * portability --- I don't believe long long int exists on all the
> platforms we support.
> * performance --- except on true 64-bit platforms, widening Datum to
> eight bytes would be a system-wide performance hit, which is a tad
> unpleasant to fix a scenario that's not yet been reported from the
> field.
> * disk space --- letting pg_log grow without bound isn't a pleasant
> prospect either.
>
> I believe it is possible to fix these problems without widening XID,
> by redefining XIDs in a way that allows for wraparound. Here's my
> plan:
>
> 1. Allow XIDs to range from 0 to WRAPLIMIT-1 (WRAPLIMIT is not
> necessarily 4G, see discussion below). Ordered comparisons on XIDs
> are no longer simply "x < y", but need to be expressed as a macro.
> We consider x < y if (y - x) % WRAPLIMIT < WRAPLIMIT/2.
> This comparison will work as long as the range of interesting XIDs
> never exceeds WRAPLIMIT/2. Essentially, we envision the actual value
> of XID as being the low-order bits of a logical XID that always
> increases, and we assume that no extant XID is more than WRAPLIMIT/2
> transactions old, so we needn't keep track of the high-order bits.
>
> 2. To keep the system from having to deal with XIDs that are more than
> WRAPLIMIT/2 transactions old, VACUUM should "freeze" known-good old
> tuples. To do this, we'll reserve a special XID, say 1, that is always
> considered committed and is always less than any ordinary XID. (So the
> ordered-comparison macro is really a little more complicated than I said
> above. Note that there is already a reserved XID just like this in the
> system, the "bootstrap" XID. We could simply use the bootstrap XID, but
> it seems better to make another one.) When VACUUM finds a tuple that
> is committed good and has xmin < XmaxRecent (the oldest XID that might
> be considered uncommitted by any open transaction), it will replace that
> tuple's xmin by the special always-good XID. Therefore, as long as
> VACUUM is run on all tables in the installation more often than once per
> WRAPLIMIT/2 transactions, there will be no tuples with ordinary XIDs
> older than WRAPLIMIT/2.
>
> 3. At wraparound, the XID counter has to be advanced to skip over the
> InvalidXID value (zero) and the reserved XIDs, so that no real transaction
> is generated with those XIDs. No biggie here.
>
> 4. With the wraparound behavior, pg_log will have a bounded size: it
> will never exceed WRAPLIMIT*2 bits = WRAPLIMIT/4 bytes. Since we will
> recycle pg_log entries every WRAPLIMIT xacts, during transaction start
> the xact manager will have to take care to actively clear its pg_log
> entry to zeroes (I'm not sure if it does that already, or just assumes
> that new pg_log entries will start out zero). As long as that happens
> before the xact makes any data changes, it's OK to recycle the entry.
> Note we are assuming that no tuples will remain in the database with
> xmin or xmax equal to that XID from a prior cycle of the universe.
>
> This scheme allows us to survive XID wraparound at the cost of slight
> additional complexity in ordered comparisons of XIDs (which is not a
> really performance-critical task AFAIK), and at the cost that the
> original insertion XIDs of all but recent tuples will be lost by
> VACUUM. The system doesn't particularly care about that, but old XIDs
> do sometimes come in handy for debugging purposes. A possible
> compromise is to overwrite only XIDs that are older than, say,
> WRAPLIMIT/4 instead of doing so as soon as possible. This would mean
> the required VACUUM frequency is every WRAPLIMIT/4 xacts instead of
> every WRAPLIMIT/2 xacts.
>
> We have a straightforward tradeoff between the maximum size of pg_log
> (WRAPLIMIT/4 bytes) and the required frequency of VACUUM (at least
> every WRAPLIMIT/2 or WRAPLIMIT/4 transactions). This could be made
> configurable in config.h for those who're intent on customization,
> but I'd be inclined to set the default value at WRAPLIMIT = 1G.
>
> Comments? Vadim, is any of this about to be superseded by WAL?
> If not, I'd like to fix it for 7.1.
>
> regards, tom lane
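
For comparison, the quoted scheme's ordered comparison and pg_log
bound might be sketched as below. This is only an illustration under
stated assumptions: FROZEN_XID, xid_precedes_mod and pg_log_max_bytes
are hypothetical names, WRAPLIMIT is taken to be a power of two that
divides 4G (so unsigned wraparound and the modulo agree), and the
explicit "diff != 0" test is added here so that an XID does not
compare as less than itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the 4-byte transaction ID type. */
typedef uint32_t Xid;

#define WRAPLIMIT   ((Xid) 1 << 30)  /* 1G, the default proposed above */
#define FROZEN_XID  ((Xid) 1)        /* hypothetical reserved always-good XID */

/*
 * Return true if x logically precedes y under the modular rule
 * (y - x) % WRAPLIMIT < WRAPLIMIT/2.  The reserved frozen XID is
 * treated as older than every ordinary XID.
 */
static bool
xid_precedes_mod(Xid x, Xid y)
{
    Xid diff;

    if (x == FROZEN_XID)
        return y != FROZEN_XID;
    if (y == FROZEN_XID)
        return false;

    /* Valid only while WRAPLIMIT divides 2^32 (assumption above). */
    diff = (Xid) (y - x) % WRAPLIMIT;
    return diff != 0 && diff < WRAPLIMIT / 2;
}

/* Upper bound on pg_log size: two status bits per transaction. */
static uint64_t
pg_log_max_bytes(uint64_t wraplimit)
{
    return wraplimit * 2 / 8;
}
```

With WRAPLIMIT = 1G the bound works out to 256M bytes, and an XID of
WRAPLIMIT-5 compares as older than a post-wraparound XID of 3.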
