Re: Hot Standby: too many KnownAssignedXids

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby: too many KnownAssignedXids
Date: 2010-12-01 18:51:19
Message-ID: 4CF69927.2070202@enterprisedb.com
Lists: pgsql-hackers

On 24.11.2010 12:48, Heikki Linnakangas wrote:
> When recovery starts, we fetch the oldestActiveXid from the checkpoint
> record. Let's say that it's 100. We then start replaying WAL records
> from the Redo pointer, and the first record (heap insert in your case)
> contains an Xid that's much larger than 100, say 10000. We call
> RecordKnownAssignedXids() to make note that all xids between that range
> are in-progress, but there isn't enough room in the array for that.
>
> We normally get away with a smallish array because the array is trimmed
> at commit and abort records, and the special xid-assignment record to
> handle the case of a lot of subtransactions. We initialize the array
> from the running-xacts record that's written at a checkpoint. That
> mechanism fails in this case because the heap insert record is seen
> before the running-xacts record, causing all those xids in the range
> 100-10000 to be considered in-progress. The running-xacts record that
> comes later would prune them, but we don't have enough slots to hold
> them until that.
>
> Hmm. I'm not sure off the top of my head how to fix that. Perhaps stash
> the xids we see during WAL replay in private memory instead of putting
> them in the KnownAssignedXids array until we see the running-xacts record.

So, here's a patch using that approach.
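
To illustrate the idea (this is a toy Python model, not the attached patch), the sketch below replays a tiny made-up WAL stream against a bounded array. The class, the record format `(kind, payload)`, and the `replay()` helper are all hypothetical; the rule used to merge the stash with the running-xacts record is an assumption made for the example.

```python
class KnownAssignedXids:
    """Toy stand-in for the fixed-size shared array (not PostgreSQL code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.xids = set()

    def add(self, new_xids):
        merged = self.xids | set(new_xids)
        if len(merged) > self.capacity:
            raise OverflowError("too many KnownAssignedXids")
        self.xids = merged

    def remove(self, xid):
        self.xids.discard(xid)


def replay(records, oldest_active_xid, capacity, stash):
    """Replay records; return the sorted in-progress xids at the end.

    With stash=True, xids seen before the running-xacts record are kept in
    private (unbounded) memory instead of the bounded array.
    """
    known = KnownAssignedXids(capacity)
    next_expected = oldest_active_xid
    initialized = not stash      # without stashing, use the array at once
    seen, ended = set(), set()   # private memory used only while stashing

    for kind, payload in records:
        if kind == "xid":
            if not initialized:
                seen.add(payload)
            elif payload >= next_expected:
                # Everything from next_expected up to this xid is assumed
                # to be in progress.
                known.add(range(next_expected, payload + 1))
                next_expected = payload + 1
        elif kind == "commit":
            if not initialized:
                ended.add(payload)
            else:
                known.remove(payload)
        elif kind == "running_xacts":
            running, next_xid = payload
            # Assumed merge rule: snapshot plus stashed xids, minus those
            # that already ended before this record.
            known.xids = set()
            known.add((set(running) | seen) - ended)
            next_expected = max([next_xid] + [x + 1 for x in seen])
            initialized = True
    return sorted(known.xids)


records = [
    ("xid", 10000),                              # heap insert seen first
    ("running_xacts", ([9999, 10000], 10001)),   # arrives later
    ("commit", 9999),
    ("xid", 10001),
]
```

With `oldest_active_xid=100` and a 64-slot array, `replay(records, 100, 64, stash=False)` raises `OverflowError` on the first record, because the range 100-10000 does not fit. With `stash=True` the same stream replays without overflow.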

Another approach would be to revisit the way the running-xacts snapshot 
is taken. Currently, we first take a snapshot, and then WAL-log it. 
There is a small window between the steps where backends can begin/end 
transactions, and recovery has to deal with that. When this was 
designed, there was a long discussion on whether we should instead grab 
WALInsertLock and ProcArrayLock at the same time, to ensure that the 
running-xacts snapshot represents an up-to-date situation at the point 
in WAL where it's inserted.
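
The window can be shown with a small deterministic toy (Python, not PostgreSQL code). The function names, the `concurrent_activity` hook, and the xid values are made up for the illustration; the hook simply stands in for a backend that runs between the two steps.

```python
def log_running_xacts(proc_array, wal, concurrent_activity=None):
    """Take a snapshot of running xids, then WAL-log it as a second step."""
    snapshot = sorted(proc_array)               # step 1: take the snapshot
    if concurrent_activity:
        concurrent_activity(proc_array, wal)    # a backend sneaks in here
    wal.append(("running_xacts", snapshot))     # step 2: WAL-log it
    return snapshot


def backend_commits_and_begins(proc_array, wal):
    """Hypothetical backend activity inside the window."""
    proc_array.discard(101)
    wal.append(("commit", 101))
    proc_array.add(103)
    wal.append(("xid", 103))


proc_array = {101, 102}
wal = []
log_running_xacts(proc_array, wal, backend_commits_and_begins)
```

After this runs, the running-xacts record in `wal` still lists 101 and 102, although the commit of 101 and the first record of 103 precede it in the log. Recovery has to reconcile exactly this kind of mismatch.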

We didn't want to do that because both locks can be heavily contended. 
But maybe we should after all. It would make the recovery code simpler.

If we want to get fancy, we wouldn't necessarily need to hold both locks 
for the whole duration. We could first grab ProcArrayLock and construct 
the snapshot. Then grab WALInsertLock and release ProcArrayLock, and 
finally write the WAL record and release WALInsertLock. But that would 
require small changes to XLogInsert.
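
The lock hand-off described above can be sketched as follows. This is a toy using Python's `threading.Lock`, with hypothetical names; it only shows the acquire/release ordering and says nothing about how XLogInsert would actually need to change.

```python
import threading

# Toy stand-ins for the two PostgreSQL locks.
proc_array_lock = threading.Lock()
wal_insert_lock = threading.Lock()


def log_running_xacts_handoff(proc_array, wal):
    """Build the snapshot and log it with overlapping lock coverage."""
    proc_array_lock.acquire()
    try:
        snapshot = sorted(proc_array)   # construct under ProcArrayLock
        wal_insert_lock.acquire()       # grab WALInsertLock before letting go
    finally:
        proc_array_lock.release()       # ProcArrayLock is free again early
    try:
        wal.append(("running_xacts", snapshot))   # write the WAL record
    finally:
        wal_insert_lock.release()
    return snapshot


wal = []
log_running_xacts_handoff({101, 102}, wal)
```

In this toy there is no instant at which neither lock is held between building the snapshot and writing the record, so no other WAL record can be inserted ahead of the snapshot once it has been taken, while ProcArrayLock is held for a shorter time than if both locks were kept throughout.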

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment: knownassignedxids-fix-1.patch
Description: text/x-diff (10.8 KB)
