Re: Transaction ID wraparound: problem and proposed solution

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Transaction ID wraparound: problem and proposed solution
Date: 2001-01-20 05:00:09
Message-ID: 200101200500.AAA05265@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


I have added this email thread to TODO.detail.

> We've expended a lot of worry and discussion in the past about what
> happens if the OID generator wraps around. However, there is another
> 4-byte counter in the system: the transaction ID (XID) generator.
> While OID wraparound is survivable, if XIDs wrap around then we really
> do have a Ragnarok scenario. The tuple validity checks do ordered
> comparisons on XIDs, and will consider tuples with xmin > current xact
> to be invalid. Result: after wraparound, your whole database would
> instantly vanish from view.
>
> The first thought that comes to mind is that XIDs should be promoted to
> eight bytes. However there are several practical problems with this:
> * portability --- I don't believe long long int exists on all the
> platforms we support.
> * performance --- except on true 64-bit platforms, widening Datum to
> eight bytes would be a system-wide performance hit, which is a tad
> unpleasant to fix a scenario that's not yet been reported from the
> field.
> * disk space --- letting pg_log grow without bound isn't a pleasant
> prospect either.
>
> I believe it is possible to fix these problems without widening XID,
> by redefining XIDs in a way that allows for wraparound. Here's my
> plan:
>
> 1. Allow XIDs to range from 0 to WRAPLIMIT-1 (WRAPLIMIT is not
> necessarily 4G, see discussion below). Ordered comparisons on XIDs
> are no longer simply "x < y", but need to be expressed as a macro.
> We consider x < y if (y - x) % WRAPLIMIT < WRAPLIMIT/2.
> This comparison will work as long as the range of interesting XIDs
> never exceeds WRAPLIMIT/2. Essentially, we envision the actual value
> of XID as being the low-order bits of a logical XID that always
> increases, and we assume that no extant XID is more than WRAPLIMIT/2
> transactions old, so we needn't keep track of the high-order bits.
>
> 2. To keep the system from having to deal with XIDs that are more than
> WRAPLIMIT/2 transactions old, VACUUM should "freeze" known-good old
> tuples. To do this, we'll reserve a special XID, say 1, that is always
> considered committed and is always less than any ordinary XID. (So the
> ordered-comparison macro is really a little more complicated than I said
> above. Note that there is already a reserved XID just like this in the
> system, the "bootstrap" XID. We could simply use the bootstrap XID, but
> it seems better to make another one.) When VACUUM finds a tuple that
> is committed good and has xmin < XmaxRecent (the oldest XID that might
> be considered uncommitted by any open transaction), it will replace that
> tuple's xmin by the special always-good XID. Therefore, as long as
> VACUUM is run on all tables in the installation more often than once per
> WRAPLIMIT/2 transactions, there will be no tuples with ordinary XIDs
> older than WRAPLIMIT/2.
>
> 3. At wraparound, the XID counter has to be advanced to skip over the
> InvalidXID value (zero) and the reserved XIDs, so that no real transaction
> is generated with those XIDs. No biggie here.
>
> 4. With the wraparound behavior, pg_log will have a bounded size: it
> will never exceed WRAPLIMIT*2 bits = WRAPLIMIT/4 bytes. Since we will
> recycle pg_log entries every WRAPLIMIT xacts, during transaction start
> the xact manager will have to take care to actively clear its pg_log
> entry to zeroes (I'm not sure if it does that already, or just assumes
> that new pg_log entries will start out zero). As long as that happens
> before the xact makes any data changes, it's OK to recycle the entry.
> Note we are assuming that no tuples will remain in the database with
> xmin or xmax equal to that XID from a prior cycle of the universe.
>
> This scheme allows us to survive XID wraparound at the cost of slight
> additional complexity in ordered comparisons of XIDs (which is not a
> really performance-critical task AFAIK), and at the cost that the
> original insertion XIDs of all but recent tuples will be lost by
> VACUUM. The system doesn't particularly care about that, but old XIDs
> do sometimes come in handy for debugging purposes. A possible
> compromise is to overwrite only XIDs that are older than, say,
> WRAPLIMIT/4 instead of doing so as soon as possible. This would mean
> the required VACUUM frequency is every WRAPLIMIT/4 xacts instead of
> every WRAPLIMIT/2 xacts.
>
> We have a straightforward tradeoff between the maximum size of pg_log
> (WRAPLIMIT/4 bytes) and the required frequency of VACUUM (at least
> every WRAPLIMIT/2 or WRAPLIMIT/4 transactions). This could be made
> configurable in config.h for those who're intent on customization,
> but I'd be inclined to set the default value at WRAPLIMIT = 1G.
>
> Comments? Vadim, is any of this about to be superseded by WAL?
> If not, I'd like to fix it for 7.1.
>
> regards, tom lane
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ian Lance Taylor 2001-01-20 05:39:59 Re: AW: AW: AW: Re: tinterval - operator problems on AIX
Previous Message Bruce Momjian 2001-01-20 04:50:55 Re: Leaking definitions to user programs