Re: Hot standby, recovery procs

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot standby, recovery procs
Date: 2009-02-24 23:41:35
Message-ID: 1235518895.16176.285.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On Tue, 2009-02-24 at 22:29 +0200, Heikki Linnakangas wrote:

> Oh, right... But we have the same problem with the subtransactions,
> don't we? This block:
>
> > /*
> > * If our state information is later for this proc, then
> > * overwrite it. It's possible for a commit and possibly
> > * a new transaction record to have arrived in WAL in between
> > * us doing GetRunningTransactionData() and grabbing the
> > * WALInsertLock, so we musn't assume we always know best.
> > */
> > if (XLByteLT(proc->lsn, lsn))
> > {
> > TransactionId *subxip = (TransactionId *) &(xlrec->xrun[xlrec->xcnt]);
> >
> > proc->lsn = lsn;
> > /* proc-> pid stays 0 for Recovery Procs */
> >
> > proc->subxids.nxids = rxact[xid_index].nsubxids;
> > proc->subxids.overflowed = rxact[xid_index].overflowed;
> >
> > memcpy(proc->subxids.xids, subxip,
> > rxact[xid_index].nsubxids * sizeof(TransactionId));
> >
> > /* Remove subtransactions from UnobservedXids also */
> > if (unobserved)
> > {
> > for (index = 0; index < rxact[xid_index].nsubxids; index++)
> > UnobservedTransactionsRemoveXid(subxip[index + rxact[xid_index].subx_offset], false);
> > }
> > }
>
> overwrites subxids array, and will resurrect any already aborted
> subtransaction.
>
> Isn't XLByteLT(proc->lsn, lsn) always true, because 'lsn' is the lsn of
> the WAL record we're redoing, so there can't be any procs with an LSN
> higher than that?

I'm wondering whether we need those circumstances at all.

The main role of ProcArrayUpdateRecoveryTransactions() is two-fold
* initialise snapshot when there isn't one
* reduce possibility of FATAL errors that don't write abort records

Neither of those needs us to update the subxid cache, so we'd be better
off avoiding that altogether in the common case. So we should be able to
ignore the lsn and race conditions altogether.

It might even be more helpful to explicitly separate those twin roles so
the code is clearer.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Frank Featherlight 2009-02-25 00:01:14 Re: Service not starting: Error 1053
Previous Message Simon Riggs 2009-02-24 22:58:45 Re: Synchronous replication & Hot standby patches