Re: Deriving Recovery Snapshots

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Deriving Recovery Snapshots
Date: 2008-10-22 14:18:29
Message-ID: 48FF3635.4000109@enterprisedb.com
Lists: pgsql-hackers

Simon Riggs wrote:
> On Wed, 2008-10-22 at 12:29 +0300, Heikki Linnakangas wrote:
>
>> How about:
>>
>> 1. Keep all transactions and subtransactions in UnobservedXids.
>> 2. If it fills up, remove from it all entries that the startup
>> process knows to be subtransactions and whose parents it knows, and
>> update subtrans. Mark the array as overflowed.
>>
>> To take a snapshot, a backend simply copies the UnobservedXids array
>> and the flag. If it hasn't overflowed, a transaction is considered to
>> be in progress if it's in the array. If it has overflowed, and the xid
>> is not in the array, check subtrans.
>
> We can't check subtrans. We do not have any record of what the parent is
> for an unobserved transaction id. So the complete list of unobserved
> xids *must* be added to the snapshot. If that makes snapshot overflow,
> we have a big problem: we would be forced to say "sorry snapshot cannot
> be issued at this time, please wait". Ugh!

That's why we still need the occasional WAL logging in
AssignTransactionId(): to log the parent-child relationships of the
subtransactions.

>> For the startup process to know about the parent-child relationships,
>> we'll need something like WAL changes you suggested. I'm not too
>> thrilled about adding a new field to all WAL records. Seems simpler to
>> just rely on the new WAL records on AssignTransactionId(), and we can
>> only do it, say, every 100 subtransactions, if we make the
>> UnobservedXids array big enough (100*max_connections).
>
> Yes, we can make the UnobservedXids array bigger, but only to the point
> where it will all fit within a snapshot.

The list of xids in a snapshot is just a palloc'd array, in
backend-local memory, so we can easily make it as large as we need to.

> Every new subxid needs to specify its parent's xid. We must supply that
> information somehow: either via an XLOG_XACT_ASSIGNMENT, or as I have
> done in most cases, tuck that into the wasted space on the xlrec.
> Writing a WAL record every 100 subtransactions will not work: we need to
> write to subtrans *before* that xid appears anywhere on disk, so that
> visibility tests can determine the status of the transaction.

I don't follow. The subxid doesn't need to be in subtrans before it
appears on disk, AFAICS. It can be stored in UnobservedXids at first,
and when the array overflows, we can update subtrans and remove the
entries from UnobservedXids. A snapshot taken before the overflow will
have the subxid in its copy of UnobservedXids, and one taken after the
overflow will find it in subtrans.
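
A minimal sketch in C of the lookup described above, not actual
PostgreSQL code. The names Xid, RecoverySnapshot, xid_in_progress and
demo_parent_of are all made up for illustration, the array is kept tiny,
and walking up through every ancestor on overflow is an assumption about
how the subtrans check would work:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;           /* stand-in for TransactionId */

#define SNAP_MAX_XIDS 8         /* tiny on purpose, illustration only */

/* A backend's private copy of UnobservedXids plus the overflow flag. */
typedef struct RecoverySnapshot
{
    Xid     xids[SNAP_MAX_XIDS];
    size_t  nxids;
    bool    overflowed;
} RecoverySnapshot;

/* Hypothetical subtrans lookup: parent xid, or 0 if none is recorded. */
typedef Xid (*ParentLookup) (Xid xid);

/* Toy subtrans contents for the example: 101 and 102 are children of 100. */
static Xid
demo_parent_of(Xid xid)
{
    return (xid == 101 || xid == 102) ? 100 : 0;
}

static bool
snapshot_contains(const RecoverySnapshot *snap, Xid xid)
{
    for (size_t i = 0; i < snap->nxids; i++)
        if (snap->xids[i] == xid)
            return true;
    return false;
}

static bool
xid_in_progress(const RecoverySnapshot *snap, Xid xid, ParentLookup parent_of)
{
    assert(snap->nxids <= SNAP_MAX_XIDS);

    /* Present in the copied array: in progress, overflowed or not. */
    if (snapshot_contains(snap, xid))
        return true;

    /* Not overflowed: the array is complete, so absence is conclusive. */
    if (!snap->overflowed)
        return false;

    /* Overflowed: known subxids were moved to subtrans, so check ancestors. */
    for (Xid p = parent_of(xid); p != 0; p = parent_of(p))
        if (snapshot_contains(snap, p))
            return true;
    return false;
}
```

With a snapshot holding {100, 200}, xid 101 is reported as not in
progress while the flag is clear, and as in progress once the flag is
set, because its parent 100 is still in the array.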

Suppose UnobservedXids is large enough to hold, say, 100 *
max_connections xids. If we write a WAL record containing the
parent-child relationships every 100 assigned subtransactions within a
top-level transaction, then the top-level transactions, plus those
subtransactions whose parent we don't know, will always fit into
UnobservedXids.
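
The sizing argument can be checked with a small counting sketch in C.
This is only an illustration under stated assumptions: the function name
peak_entries_per_xact is hypothetical, and it assumes the assignment
record is written together with every log_interval-th subxid and covers
all subxids assigned since the previous record. The exact bound shifts
by one or so if the records are ordered differently.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Peak number of UnobservedXids slots one top-level transaction can
 * occupy while it assigns nsubxacts subtransactions, if the parent-child
 * relationships are WAL-logged every log_interval assignments.
 */
static size_t
peak_entries_per_xact(size_t nsubxacts, size_t log_interval)
{
    size_t  pending = 0;    /* subxids whose parent is not yet known */
    size_t  peak = 1;       /* the top-level xid always takes one slot */

    assert(log_interval > 0);

    for (size_t i = 1; i <= nsubxacts; i++)
    {
        if (i % log_interval == 0)
            pending = 0;    /* record logged: entries can move to subtrans */
        else
            pending++;

        if (1 + pending > peak)
            peak = 1 + pending;
    }
    return peak;
}
```

Under those assumptions, a transaction with 1000 subtransactions and a
logging interval of 100 never needs more than 100 slots, so
max_connections such transactions need at most 100 * max_connections.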

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
