Re: Hot Backup with rsync fails at pg_clog if under load

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Backup with rsync fails at pg_clog if under load
Date: 2011-10-26 11:26:54
Message-ID: E89DA5E7-6905-4800-9C7D-11B48FC42E02@phlo.org
Lists: pgsql-hackers

On Oct 25, 2011, at 13:39, Florian Pflug wrote:
> On Oct 25, 2011, at 11:13, Simon Riggs wrote:
>> On Tue, Oct 25, 2011 at 8:03 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> We are starting recovery at the right place but we are initialising
>>> the clog and subtrans incorrectly. Precisely, the oldestActiveXid is
>>> being derived later than it should be, which can cause problems if
>>> this then means that whole pages are uninitialised in subtrans. The bug
>>> only shows up if you do enough transactions (2048 is always enough) to
>>> move to the next subtrans page between the redo pointer and the
>>> checkpoint record while at the same time we do not have a long-running
>>> transaction that spans those two points. That's enough to happen
>>> reasonably frequently on busy systems and yet rare enough to have
>>> slipped through testing.
>>>
>>> We must either
>>>
>>> 1. During CreateCheckpoint() we should derive oldestActiveXid before
>>> we derive the redo location
>
>> (1) looks the best way forwards in all cases.
>
> Let me see if I understand this.
>
> The problem seems to be that we currently derive oldestActiveXid at the end of
> the checkpoint, just before writing the checkpoint record. Since we use
> oldestActiveXid to initialize SUBTRANS, this is wrong. Records written before
> that checkpoint record (but after the REDO location, of course) may very well
> contain XIDs earlier than that wrongly derived oldestActiveXID, and if we attempt
> to touch these XIDs' SUBTRANS state, we error out.
>
> Your patch seems sensible, because the checkpoint "logically" occurs at the
> REDO location, not at the checkpoint record's location, so we ought to log an
> oldestActiveXID corresponding to that location.
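
A toy Python model may make the failure mode above concrete. It is a sketch under stated assumptions, not PostgreSQL code: it assumes the default 8 kB BLCKSZ and 4-byte TransactionIds (hence 2048 XIDs per SUBTRANS page, matching the "2048 is always enough" remark), ignores XID wraparound, and the helper names (xid_to_page, startup_subtrans, uninitialised_xids) are hypothetical. startup_subtrans() is meant to mirror, roughly, what StartupSUBTRANS() does: zero every page from oldestActiveXid's page up to nextXid's page.

```python
# Toy model of SUBTRANS initialisation at recovery start (a sketch, not
# PostgreSQL code). Assumptions: default 8 kB BLCKSZ, 4-byte XIDs, no XID
# wraparound. All function names here are hypothetical.
BLCKSZ = 8192
XACTS_PER_PAGE = BLCKSZ // 4  # 2048 XIDs per SUBTRANS page


def xid_to_page(xid: int) -> int:
    """SUBTRANS page holding the parent-XID entry for xid."""
    return xid // XACTS_PER_PAGE


def startup_subtrans(oldest_active_xid: int, next_xid: int) -> set:
    """Zero every page from oldest_active_xid's page through next_xid's page
    (roughly what StartupSUBTRANS() does); return the zeroed page numbers."""
    return set(range(xid_to_page(oldest_active_xid), xid_to_page(next_xid) + 1))


def uninitialised_xids(xids, zeroed_pages: set) -> list:
    """XIDs whose SUBTRANS page was never zeroed; touching one of these
    during replay is where recovery errors out."""
    return [x for x in xids if xid_to_page(x) not in zeroed_pages]


# No long-running transaction; XIDs 1000..3999 are assigned between the
# REDO pointer and the checkpoint record, so nextXid is 4000 at the end.
xids_after_redo = range(1000, 4000)
next_xid = 4000

# oldestActiveXid derived at the end of the checkpoint: nothing is running,
# so it equals nextXid.
late = startup_subtrans(oldest_active_xid=4000, next_xid=next_xid)
# oldestActiveXid derived before the REDO location (the proposed fix).
early = startup_subtrans(oldest_active_xid=1000, next_xid=next_xid)

print(len(uninitialised_xids(xids_after_redo, late)))   # 1048
print(len(uninitialised_xids(xids_after_redo, early)))  # 0
```

With the late-derived value, XIDs 1000 to 2047 sit on page 0, which is never zeroed; deriving oldestActiveXid before the REDO location covers both pages. Since xid and xid + 2048 always fall on adjacent pages, 2048 transactions between REDO and the checkpoint record always push a late-derived oldestActiveXid past the page holding the first post-REDO XIDs.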

Thinking about this some more (and tracing through the code), I realized that
things are a bit more complicated.

What we actually need to ensure, I think, is that the XID we pass to StartupSUBTRANS()
is earlier than any top-level XID in XLOG_XACT_ASSIGNMENT records. At first
glance, that implies we ought to use the nextXid at the *beginning* of the checkpoint
for SUBTRANS initialization. At second glance, however, that'd be wrong, because
backends emit XLOG_XACT_ASSIGNMENT only once every PGPROC_MAX_CACHED_SUBXIDS sub-XID
assignments. Thus, an XLOG_XACT_ASSIGNMENT written *after* the checkpoint has started
may contain sub-XIDs which were assigned *before* the checkpoint started.
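
The batching problem can be illustrated with another toy Python sketch. It is a simplified model, not PostgreSQL code: PGPROC_MAX_CACHED_SUBXIDS is 64 in proc.h as far as I know, the real flush logic in AssignTransactionId() has further conditions that are ignored here, and the Backend class, run_scenario() and all XID values are hypothetical.

```python
from typing import List, Tuple

# Toy model of sub-XID assignment batching (a sketch, not PostgreSQL code).
# PGPROC_MAX_CACHED_SUBXIDS is 64 in proc.h; the real flush logic in
# AssignTransactionId() has further conditions that are ignored here.
PGPROC_MAX_CACHED_SUBXIDS = 64

# A WAL record: (record type, top-level XID, sub-XIDs).
Record = Tuple[str, int, List[int]]


class Backend:
    """One top-level transaction that keeps assigning sub-XIDs and writes a
    single XLOG_XACT_ASSIGNMENT per PGPROC_MAX_CACHED_SUBXIDS of them."""

    def __init__(self, top_xid: int, wal: List[Record]) -> None:
        self.top_xid = top_xid
        self.unreported: List[int] = []
        self.wal = wal

    def assign_subxid(self, xid: int) -> None:
        self.unreported.append(xid)
        if len(self.unreported) >= PGPROC_MAX_CACHED_SUBXIDS:
            self.wal.append(("XLOG_XACT_ASSIGNMENT", self.top_xid, list(self.unreported)))
            self.unreported.clear()


def run_scenario() -> Tuple[int, List[Record]]:
    """Return nextXid at checkpoint start and the WAL written after REDO."""
    wal: List[Record] = []
    backend = Backend(top_xid=2020, wal=wal)
    next_xid = 2021
    # 40 sub-XIDs are assigned before the checkpoint starts; none logged yet.
    for _ in range(40):
        backend.assign_subxid(next_xid)
        next_xid += 1
    next_xid_at_start = next_xid  # 2061
    redo = len(wal)  # REDO pointer = current end of WAL
    # 24 more sub-XIDs fill the cache, so the record is written after REDO.
    for _ in range(24):
        backend.assign_subxid(next_xid)
        next_xid += 1
    return next_xid_at_start, wal[redo:]


next_xid_at_start, after_redo = run_scenario()
smallest = min(min(subs) for _, _, subs in after_redo)
print(next_xid_at_start, smallest)  # 2061 2021
```

The only assignment record written after REDO carries sub-XID 2021, older than the nextXid of 2061 observed when the checkpoint started; with 2048 XIDs per SUBTRANS page the two even fall on different pages (0 and 1).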

Using oldestActiveXID works around that because we guarantee that sub-XIDs are always
larger than their parent XIDs and because only active transactions can produce
XLOG_XACT_ASSIGNMENT records.
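
The argument above can be spelled out as a small check. Again a hypothetical Python sketch, not PostgreSQL code: records are modelled as (top-level XID, sub-XIDs) pairs, oldest_active_xid() is modelled loosely on GetOldestActiveTransactionId() (returning nextXid when nothing is running), and the scenario data is invented for illustration.

```python
from typing import Iterable, List, Tuple

# An XLOG_XACT_ASSIGNMENT record, modelled as (top-level XID, sub-XIDs).
# Toy sketch, not PostgreSQL code; function names are hypothetical.
Assignment = Tuple[int, List[int]]


def oldest_active_xid(running_top_xids: Iterable[int], next_xid: int) -> int:
    """Smallest running top-level XID, or next_xid if nothing is running
    (modelled loosely on GetOldestActiveTransactionId())."""
    return min(running_top_xids, default=next_xid)


def sub_xids_exceed_parent(rec: Assignment) -> bool:
    """The invariant the argument relies on: every sub-XID > its parent."""
    top, subs = rec
    return all(s > top for s in subs)


def covers_all(start_xid: int, records: List[Assignment]) -> bool:
    """True if no XID in any record is older than start_xid."""
    return all(top >= start_xid and all(s >= start_xid for s in subs)
               for top, subs in records)


# State at the REDO location: two transactions running, nextXid = 2200.
running_at_redo = {2000, 2100}
next_xid_at_redo = 2200
start = oldest_active_xid(running_at_redo, next_xid_at_redo)  # 2000

# Assignment records written after REDO. The first two come from transactions
# already running at REDO (some of their sub-XIDs predate REDO); the third
# comes from a transaction that began after REDO.
records_after_redo: List[Assignment] = [
    (2000, list(range(2001, 2065))),
    (2100, list(range(2101, 2165))),
    (2250, list(range(2251, 2315))),
]

print(covers_all(start, records_after_redo))             # True
print(covers_all(next_xid_at_redo, records_after_redo))  # False
```

Every record comes either from a transaction running at REDO (top-level XID >= oldestActiveXID) or from one started later (top-level XID >= nextXid >= oldestActiveXID), and sub-XIDs exceed their parents, so oldestActiveXID covers every record while nextXid at REDO does not.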

So your patch is fine, but I think the reasoning about why oldestActiveXID is
the correct value for StartupSUBTRANS deserves an explanation somewhere.

best regards,
Florian Pflug
