Re: Hot standby, slot ids and stuff

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot standby, slot ids and stuff
Date: 2009-01-10 09:40:56
Message-ID: 1231580456.18005.608.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On Thu, 2009-01-08 at 20:30 +0200, Heikki Linnakangas wrote:

> Since I've been whining about that for some time, I figured I have to
> put my money where my mouth is, so here's a patch based on v6a that
> eliminates the concept of slotids, as well as the new xl_info2 field in
> XLogRecord. This seems much simpler to me. I haven't given it much
> testing, but seems to work. There's a whole bunch of comments marked
> with XXX that need resolving, though.

There's a confusion in the patch between top level xid and parent xid.
The xl field is named parentxid but actually contains top level. That
distinction is important because the code now uses the top level xid to
locate the recovery proc, formerly the role of the slotid.

This leads to an error when we SubTransSetParent(child_xid, top_xid);
since this assumes that the top_xid is the parent, which it is not.
Mostly you wouldn't notice unless you were looking up the subtrans
status for an xid that had committed but was the child of an aborted
subtransaction, with the top level xid having > 64 subtransactions. It's
possible the confusion leads to other bugs in UnobservedXid processing,
but I didn't look too hard at that.

AFAICS we need both parent and top xids. Or put another way, we need the
parent xid and other info that allows us to uniquely determine the proc
we need to update. Now the "other info..." could be top xid or it could
also be slotid, which then avoids later zig-zagging to look up the proc.

I'm wasn't looking for ways to reintroduce slotid, but it seems more
logical to keep slotid in light of the above. However, you will probably
view this as intransigence, so I will await your input.

I'm very happy that GetStandbyInfoForTransaction() and all the XLR2
flags have bitten the dust and will sleep for eternity.

For xl_rel_lock you add a field called xid and then ask
/* xid of the *parent* transaction. XXX why parent? */.
You've done this because it replaced slotid. But looking at that, I
think the 6a patch had a bug there: a subtransaction abort record would
release locks for the whole top level xact. So we need to pass both top
level xid (or slotid) and xid for each lock, then release using the
actual xid only.

You also ask: Shouldn't we call StartupSUBTRANS() and the other startup
functions like we do below, before letting anyone in?

My answer is that the current role of StartupSUBTRANS and friends is not
appropriate at that point, since they zero out those structures. I left
those routines in place thinking "startup" meant "moving to normal
running". If we did have a "startupsubtrans" at the point you note, it
would currently be empty: we don't keep track of the latest page during
recovery. Perhaps we should, but then we'd need to do the equivalent of
ExtendSubtrans etc, which it seemed easier to avoid.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2009-01-10 10:16:56 Synch Rep v5
Previous Message Pavel Stehule 2009-01-10 08:51:12 Re: [HACKERS] BUG #4516: FOUND variable does not work after RETURN QUERY