Re: Freezing without write I/O

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Freezing without write I/O
Date: 2013-05-31 03:02:40
Message-ID: CA+TgmoaG1+2CNQe5aYpMKukPbdT0krK=L2fuZ6A-0FeeuCFmkw@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 30, 2013 at 2:39 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Random thought: Could you compute the reference XID based on the page
> LSN? That would eliminate the storage overhead.

After mulling this over a bit, I think this is definitely possible.
We begin a new "half-epoch" every 2 billion transactions. We remember
the LSN at which the current half-epoch began and the LSN at which the
previous half-epoch began. When a new half-epoch begins, the first
backend that wants to stamp a tuple with an XID from the new
half-epoch must first emit a "new half-epoch" WAL record, which
becomes the starting LSN for the new half-epoch.
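To make the bookkeeping concrete, here is a minimal sketch of how a page LSN might be mapped to a half-epoch once the boundary LSNs are remembered. The function name, the list representation, and the choice to count an LSN equal to a boundary as belonging to the new half-epoch are all assumptions made for illustration, not part of the proposal.

```python
from bisect import bisect_right

def half_epoch_of_lsn(boundaries, lsn):
    """Return the index of the half-epoch that was current at `lsn`.

    `boundaries` is a sorted list of LSNs, one per "new half-epoch" WAL
    record (hypothetical representation). Half-epoch 0 is everything
    before the first boundary; an LSN equal to a boundary is treated as
    already belonging to the new half-epoch (an assumption).
    """
    return bisect_right(boundaries, lsn)

# Example with made-up LSN values.
boundaries = [1000, 2000]
print(half_epoch_of_lsn(boundaries, 1500))
```

With only the last two boundaries remembered, the same lookup would distinguish "current", "previous", and "older than previous", which is all the scheme described here appears to need.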

We define a new page-level bit, something like PD_RECENTLY_FROZEN.
When this bit is set, it means there are no unfrozen tuples on the
page with XIDs that predate the current half-epoch. Whenever we know
this to be true, we set the bit. If a page modification moves the page
LSN across more than one half-epoch boundary at once, we freeze the page
and set the bit. If it moves the page LSN across exactly one half-epoch
boundary, then (1) if
the bit is set, we clear it and (2) if the bit is not set, we freeze
the page and set the bit. The advantage of this is that we avoid an
epidemic of freezing right after a half-epoch change. Immediately
after a half-epoch change, many pages will mix tuples from the current
and previous half-epoch, but relatively few pages will mix tuples from
the current half-epoch with tuples from two or more half-epochs back.
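The rule for the page-level bit can be written down as a small decision function. This is only a sketch under stated assumptions: the function name and return shape are invented here, boundaries are modeled as a sorted list of LSNs, and the "no boundary crossed" case (leave everything alone) is my reading rather than something spelled out above. The separate rule that the bit may be set whenever the condition is known to hold is not modeled.

```python
from bisect import bisect_right

def on_page_lsn_advance(boundaries, old_lsn, new_lsn, recently_frozen):
    """Decide what to do when a page's LSN moves from old_lsn to new_lsn.

    Returns (freeze_page, new_bit), where new_bit is the resulting value
    of the hypothetical PD_RECENTLY_FROZEN flag.
    """
    # Number of half-epoch boundaries between the old and new page LSN.
    crossed = bisect_right(boundaries, new_lsn) - bisect_right(boundaries, old_lsn)

    if crossed == 0:
        # Assumption: nothing changes if no boundary was crossed.
        return (False, recently_frozen)
    if crossed == 1 and recently_frozen:
        # Case (1): the page was clean as of the previous half-epoch,
        # so it can now hold only current and previous half-epoch XIDs.
        return (False, False)
    # Either more than one boundary was crossed, or exactly one with the
    # bit clear (case 2): freeze the page and set the bit.
    return (True, True)

boundaries = [1000, 2000]
print(on_page_lsn_advance(boundaries, 100, 1500, True))
print(on_page_lsn_advance(boundaries, 100, 1500, False))
```

The first call crosses one boundary with the bit set, so no freezing is needed; the second crosses one boundary with the bit clear, so the page is frozen.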

As things stand today, we really only need to remember the last two
half-epoch boundaries; they could be stored, for example, in the
control file. But if we someday generalize CLOG to allow indefinite
retention as you suggest, we could instead remember all half-epoch
boundaries that have ever occurred; just maintain a file someplace
with 8 bytes of data for every 2 billion XIDs consumed over the
lifetime of the cluster. In fact, we might want to do it that way
anyhow, just to keep our options open, and perhaps for forensics.
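For a sense of how small such a file would be, here is a rough sketch of one possible layout: a flat sequence of 8-byte LSNs, one per half-epoch boundary. The little-endian unsigned encoding and the function names are assumptions chosen for the example; the text above only fixes the size at 8 bytes per boundary.

```python
import struct

BOUNDARY_FMT = "<Q"  # assumed: one little-endian unsigned 64-bit LSN
BOUNDARY_SIZE = struct.calcsize(BOUNDARY_FMT)  # 8 bytes per boundary

def pack_boundaries(lsns):
    """Serialize boundary LSNs, oldest first, into the flat file format."""
    return b"".join(struct.pack(BOUNDARY_FMT, lsn) for lsn in lsns)

def unpack_boundaries(data):
    """Parse the flat file contents back into a list of boundary LSNs."""
    if len(data) % BOUNDARY_SIZE != 0:
        raise ValueError("boundary data is not a multiple of 8 bytes")
    return [lsn for (lsn,) in struct.iter_unpack(BOUNDARY_FMT, data)]

data = pack_boundaries([1000, 2000])  # made-up LSN values
print(len(data), unpack_boundaries(data))
```

Even a cluster that had gone through a thousand half-epochs would need only 8000 bytes in this layout.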

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
