Hot standby, overflowed snapshots, testing

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Hot standby, overflowed snapshots, testing
Date: 2009-11-13 20:19:38
Message-ID: 4AFDBF5A.7030902@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

In GetSnapshotData(), we set subcount to -1 if the snapshot was overflowed:

> subcount = GetKnownAssignedTransactions(snapshot->subxip,
> &xmin, xmax, &overflow);
>
> /*
> * See if we have removed any subxids from KnownAssignedXids that
> * we might need to see. If so, mark snapshot overflowed.
> */
> if (overflow)
> subcount = -1; /* overflowed */

In XidInMVCCSnapshot we do this:

> /*
> * In recovery we store all xids in the subxact array because it
> * is by far the bigger array, and we mostly don't know which xids
> * are top-level and which are subxacts. The xip array is empty.
> *
> * We start by searching subtrans, if we overflowed.
> */
> if (snapshot->subxcnt < 0)
> {
> /* overflowed, so convert xid to top-level */
> xid = SubTransGetTopmostTransaction(xid);
>
> /*
> * If xid was indeed a subxact, we might now have an xid < xmin, so
> * recheck to avoid an array scan. No point in rechecking xmax.
> */
> if (TransactionIdPrecedes(xid, snapshot->xmin))
> return false;
> }
>
> /*
> * We now have either a top-level xid higher than xmin or an
> * indeterminate xid. We don't know whether it's top level or subxact
> * but it doesn't matter. If it's present, the xid is visible.
> */
> for (j = 0; j < snapshot->subxcnt; j++)
> {
> if (TransactionIdEquals(xid, snapshot->subxip[j]))
> return true;
> }

Note that if subxcnt is -1 to mark that the snapshot is overflowed, the
for-loop will do nothing. IOW, if the snapshot is overflowed,
XidInMVCCSnapshot always returns false.

This seems pretty straightforward to fix, we'll just need separate flag
to mark whether the subxid array has overflowed, but I'm bringing this
up as a separate email because this is the 2nd bug in XidInMVCCSnapshot
already, the first one being the silly one that the if-condition was
backwards. It seems that no-one still has done any testing of the
subxact stuff and visibility, and that is pretty scary.

I got the impression earlier that you had some test environment set up
to test hot standby. Can you share any details of what test cases you've
run?

I think we're going to have a good number of volunteers to test Hot
Standby, but it would be useful to define some specific test cases to
exercise all the hairy recovery transaction tracking and snapshot
related things. Or at least provide a list of the difficult parts so
that people know to what to test. If people know roughly what the tricky
areas are, we'll get better coverage than if people just kick the tires.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2009-11-13 20:30:33 Re: Missing feature in plpgsql EXECUTE ... USING support
Previous Message Greg Smith 2009-11-13 19:52:25 CommitFest 2009-11: Almost done with initial assignments