On Thu, 2010-12-02 at 10:39 +0200, Heikki Linnakangas wrote:
> On 01.12.2010 20:51, Heikki Linnakangas wrote:
> > Another approach would be to revisit the way the running-xacts snapshot
> > is taken. Currently, we first take a snapshot, and then WAL-log it.
> > There is a small window between the steps where backends can begin/end
> > transactions, and recovery has to deal with that. When this was
> > designed, there was long discussion on whether we should instead grab
> > WALInsertLock and ProcArrayLock at the same time, to ensure that the
> > running-xacts snapshot represents an up-to-date situation at the point
> > in WAL where it's inserted.
> > We didn't want to do that because both locks can be heavily contended.
> > But maybe we should after all. It would make the recovery code simpler.
> > If we want to get fancy, we wouldn't necessarily need to hold both locks
> > for the whole duration. We could first grab ProcArrayLock and construct
> > the snapshot. Then grab WALInsertLock and release ProcArrayLock, and
> > finally write the WAL record and release WALInsertLock. But that would
> > require small changes to XLogInsert.
> I took a look at that approach. We don't actually need to hold
> ProcArrayLock while the WAL record is written; we need to hold
> XidGenLock. I believe that's less severe than holding the ProcArrayLock
> as there's already precedent for writing a WAL record while holding
> that: we do that when we advance to a new clog page and write a
> zero-clog-page record.
> So this is what we should do IMHO.
Oh, thanks for looking at this. I've been looking at this as well and,
as we might expect, had a slightly different design.
First, your assessment of the locking above is better than mine. I agree
with your analysis, so we should do it that way. The locking issue was
the reason I hadn't patched this yet, so I'm glad you've improved this.
In terms of the rest of the patch, it seems we have different designs. I
think I have a much simpler, less invasive solution:
The cause of the issue is that replay starts at one LSN and there is a
delay until the RunningXacts WAL record occurs. If there were no delay,
there would be no issue at all. In CreateCheckpoint() we start by
grabbing the WALInsertLock and later recording the current WAL pointer
as part of the checkpoint record. My proposal is to replace the "grab
the lock" code with the insert of the RunningXacts WAL record (when
wal_level calls for it), so that recovery always starts with that
record type.
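To make the locking point concrete, here is a rough toy model in Python,
not PostgreSQL code. It only illustrates the idea discussed above: if
the lock guarding xid assignment is held from the moment the snapshot is
taken until the running-xacts record is in the log, no new xid can slip
in between. Everything in it is an assumption made for illustration: the
ToyServer class, its method names, the "assign" records written at xid
assignment, and the helper snapshot_matches_wal_position().

```python
import threading


class ToyServer:
    """Toy stand-in for a server; none of this mirrors real PostgreSQL code."""

    def __init__(self):
        self.xid_gen_lock = threading.Lock()     # plays the role of XidGenLock
        self.proc_array_lock = threading.Lock()  # plays the role of ProcArrayLock
        self.wal_insert_lock = threading.Lock()  # plays the role of WALInsertLock
        self.next_xid = 1
        self.running = set()
        self.wal = []

    def _append(self, record):
        with self.wal_insert_lock:
            self.wal.append(record)

    def begin(self):
        # Toy simplification: xid assignment is logged while the
        # xid-generation lock is held, so "assign" records appear in xid order.
        with self.xid_gen_lock:
            xid = self.next_xid
            self.next_xid += 1
            with self.proc_array_lock:
                self.running.add(xid)
            self._append(("assign", xid))
        return xid

    def commit(self, xid):
        # Log the commit first, then drop the xid from the running set.
        self._append(("commit", xid))
        with self.proc_array_lock:
            self.running.discard(xid)

    def log_running_xacts(self):
        # Hold the xid-generation lock until the record is in the log.
        # The proc-array lock is released before the log write.
        with self.xid_gen_lock:
            with self.proc_array_lock:
                snapshot = (self.next_xid, frozenset(self.running))
            self._append(("running_xacts",) + snapshot)


def snapshot_matches_wal_position(wal):
    """True if each snapshot's next_xid agrees with the assignments logged before it."""
    highest_assigned = 0
    for record in wal:
        if record[0] == "assign":
            highest_assigned = record[1]
        elif record[0] == "running_xacts" and record[1] != highest_assigned + 1:
            return False
    return True
```

In this model a replay that starts at a running_xacts record never sees
an xid assignment that the snapshot should already have known about. It
says nothing about commits racing with the snapshot, which the toy does
not try to order.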
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services