Re: Hot Standby: too many KnownAssignedXids

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Joachim Wieland <joe(at)mcknight(dot)de>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby: too many KnownAssignedXids
Date: 2010-12-02 11:25:02
Message-ID: 1291289102.2006.1060.camel@ebony
Lists: pgsql-hackers
On Thu, 2010-12-02 at 12:41 +0200, Heikki Linnakangas wrote:
> On 02.12.2010 11:02, Simon Riggs wrote:
> > The cause of the issue is that replay starts at one LSN and there is a
> > delay until the RunningXacts WAL record occurs. If there was no delay,
> > there would be no issue at all. In CreateCheckpoint() we start by
> > grabbing the WALInsertLock and later recording that pointer as part of
> > the checkpoint record. My proposal is to replace the "grab the lock"
> > code with the insert of the RunningXacts WAL record (when wal_level
> > set), so that recovery always starts with that record type.
> 
> Oh, interesting idea. But AFAICS closing the gap between acquiring the 
> running-xacts snapshot and writing it to the log is sufficient, I don't 
> see what moving the running-xacts record buys us. Does it allow some 
> further simplifications somewhere?

Your patch is quite long and you do a lot more than just alter the
locking. I don't think we need those changes at all and especially would
not wish to backpatch that.

Earlier on this thread, we discussed:

On Wed, 2010-11-24 at 15:19 +0000, Simon Riggs wrote: 
> On Wed, 2010-11-24 at 12:48 +0200, Heikki Linnakangas wrote:
> > When recovery starts, we fetch the oldestActiveXid from the checkpoint
> > record. Let's say that it's 100. We then start replaying WAL records 
> > from the Redo pointer, and the first record (heap insert in your case)
> > contains an Xid that's much larger than 100, say 10000. We call 
> > RecordKnownAssignedXids() to make note that all xids between that
> > range are in-progress, but there isn't enough room in the array for
> > that.
> 
> Agreed.

The current code fails because of the gap between the redo pointer and
the XLOG_RUNNING_XACTS WAL record. If there is no gap, there is no
problem.
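
To make the failure mode concrete, here is a toy model in C. It is only a sketch under stated assumptions, not the real KnownAssignedXids code: the struct, the function names and the array size of 64 are all invented for illustration, and xid wraparound is ignored. It shows that noting every xid between the last observed one and a much later one overflows a fixed-size array, while seeding the array from a running-xacts snapshot first leaves only a small range to fill.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: names and sizes are hypothetical, not PostgreSQL's. */
#define TOY_MAX_KNOWN_XIDS 64

typedef struct ToyKnownXids
{
    uint32_t xids[TOY_MAX_KNOWN_XIDS];
    size_t   count;
    uint32_t latest_observed;   /* highest xid accounted for so far */
} ToyKnownXids;

/* Start recovery knowing only the oldest active xid from the checkpoint. */
static void
toy_init(ToyKnownXids *k, uint32_t oldest_active_xid)
{
    k->count = 0;
    k->latest_observed = oldest_active_xid - 1;
}

/*
 * Note that every xid in (latest_observed, xid] may be in progress.
 * Returns false if the array has no room for the whole range.
 */
static bool
toy_record_known_assigned(ToyKnownXids *k, uint32_t xid)
{
    if (xid <= k->latest_observed)
        return true;            /* already accounted for */

    uint32_t needed = xid - k->latest_observed;

    if (needed > TOY_MAX_KNOWN_XIDS - k->count)
        return false;           /* "too many KnownAssignedXids" */

    for (uint32_t i = 0; i < needed; i++)
        k->xids[k->count++] = k->latest_observed + 1 + i;
    k->latest_observed = xid;
    return true;
}

/*
 * Replace the tracked set with a running-xacts snapshot: only the xids
 * that were really running, plus the latest xid the snapshot covers.
 */
static void
toy_apply_running_xacts(ToyKnownXids *k, const uint32_t *running,
                        size_t n, uint32_t latest_covered)
{
    assert(n <= TOY_MAX_KNOWN_XIDS);
    for (size_t i = 0; i < n; i++)
        k->xids[i] = running[i];
    k->count = n;
    k->latest_observed = latest_covered;
}
```

With oldestActiveXid = 100 and a first replayed xid of 10000, toy_record_known_assigned() has to note 9901 xids and fails. If the snapshot is applied before any other record is replayed, the same call only has to add the xids assigned after the snapshot was taken.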

So my preferred solution would be to:
* Log XLOG_RUNNING_XACTS while holding XidGenLock, as you suggest
* Move logging to occur at the Redo pointer

That is a much smaller patch with a smaller footprint.

-- 
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 

