Re: Restartable Recovery

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Andreas Seltenreich <andreas+pg(at)gate450(dot)dyndns(dot)org>, Marko Kreen <markokr(at)gmail(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Restartable Recovery
Date: 2006-07-16 15:57:55
Message-ID: 1153065476.2654.247.camel@localhost.localdomain
Lists: pgsql-hackers pgsql-patches

On Sun, 2006-07-16 at 10:51 -0400, Tom Lane wrote:
> Andreas Seltenreich <andreas+pg(at)gate450(dot)dyndns(dot)org> writes:
> > Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> >> [2. text/x-patch; restartableRecovery.patch]
>
> > Hmm, wouldn't you have to reboot the resource managers at each
> > checkpoint? I'm afraid otherwise things like postponed page splits
> > could get lost on restart from a later checkpoint.
>
> Ouch. That's a bit nasty. You can't just apply a postponed split at
> checkpoint time, because the WAL record could easily be somewhere after
> the checkpoint, leading to duplicate insertions. Right offhand I don't
> see how to make this work :-(

Yes, ouch. So much for gung-ho code sprints; thanks Andreas.

To do this we would need another rmgr-specific routine, called at each
recovery checkpoint, that writes the current state of the incomplete
multi-WAL actions to disk in some manner.
During the startup routines we would check for any pre-existing state
files and use those to initialise the incomplete action cache. Cleanup
would then discard all state files.
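A minimal sketch of the state-file idea, in C. Everything here is hypothetical: the IncompleteAction struct, the fixed-size cache and the function names are illustrations rather than existing rmgr APIs, and the file format is simply the raw struct array. It assumes POSIX rename() semantics, where renaming over an existing file replaces it atomically.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical record of a multi-WAL action (for example a btree page
 * split) whose completing WAL record has not been replayed yet.
 */
typedef struct IncompleteAction
{
    uint32_t rel_node;    /* relation the action belongs to */
    uint32_t left_block;  /* page that was split */
    uint32_t right_block; /* new right sibling */
} IncompleteAction;

#define MAX_INCOMPLETE 64

/* In-memory incomplete-action cache kept during replay. */
static IncompleteAction cache[MAX_INCOMPLETE];
static int ncache = 0;

/* Remember an action when its first WAL record is replayed. */
static int
add_incomplete_action(uint32_t rel, uint32_t left, uint32_t right)
{
    if (ncache >= MAX_INCOMPLETE)
        return -1;
    cache[ncache].rel_node = rel;
    cache[ncache].left_block = left;
    cache[ncache].right_block = right;
    ncache++;
    return 0;
}

/*
 * Recovery checkpoint hook: write the cache to a temporary file, then
 * rename it over the state file so a crash never leaves a half-written
 * state file behind. (A real implementation would also fsync.)
 */
static int
save_incomplete_actions(const char *path)
{
    char tmp[512];
    FILE *f;
    int len = snprintf(tmp, sizeof(tmp), "%s.tmp", path);

    if (len < 0 || len >= (int) sizeof(tmp))
        return -1;
    f = fopen(tmp, "wb");
    if (f == NULL)
        return -1;
    if (fwrite(&ncache, sizeof(ncache), 1, f) != 1 ||
        fwrite(cache, sizeof(IncompleteAction), (size_t) ncache, f) != (size_t) ncache)
    {
        fclose(f);
        remove(tmp);
        return -1;
    }
    if (fclose(f) != 0)
    {
        remove(tmp);
        return -1;
    }
    return rename(tmp, path) == 0 ? 0 : -1;
}

/*
 * Startup: initialise the cache from a pre-existing state file, if any.
 * Returns the number of actions loaded, 0 if there is no state file,
 * or -1 if the file is unreadable or corrupt.
 */
static int
load_incomplete_actions(const char *path)
{
    FILE *f = fopen(path, "rb");
    int n;

    ncache = 0;
    if (f == NULL)
        return 0;
    if (fread(&n, sizeof(n), 1, f) != 1 || n < 0 || n > MAX_INCOMPLETE ||
        fread(cache, sizeof(IncompleteAction), (size_t) n, f) != (size_t) n)
    {
        fclose(f);
        return -1;
    }
    fclose(f);
    ncache = n;
    return n;
}

/* End of recovery: the state file is no longer needed. */
static void
discard_incomplete_actions(const char *path)
{
    remove(path);
}
```

In this sketch startup would call load_incomplete_actions() before replay begins, each recovery checkpoint would call save_incomplete_actions(), and end-of-recovery cleanup would call discard_incomplete_actions().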

That stops us forgetting actions, but it doesn't help if repeating an
action causes problems. We would at least know that we are in a
potential double-action zone and could give different kinds of errors
or handling.
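One way to picture the double-action zone, as a sketch only: assume (hypothetically) that each recovery attempt records the furthest WAL position it replayed. After restarting from a recovery checkpoint, records after that checkpoint but no later than the previous attempt's furthest point may already have been applied once. WalPos, the recorded position and the DupHandling enum are assumptions for illustration, not existing PostgreSQL names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical WAL position; larger values are later in the log. */
typedef uint64_t WalPos;

/*
 * A record is in the potential double-action zone if it follows the
 * recovery checkpoint we restarted from but is no later than the
 * furthest point the previous recovery attempt (assumed) replayed.
 */
static bool
in_double_action_zone(WalPos rec, WalPos restart_ckpt, WalPos prev_reached)
{
    return rec > restart_ckpt && rec <= prev_reached;
}

typedef enum
{
    DUP_IS_ERROR,     /* a duplicate here points to real corruption */
    DUP_IS_TOLERATED  /* a duplicate here may just be a repeated action */
} DupHandling;

/* Choose how to treat a duplicate found while replaying record `rec`. */
static DupHandling
duplicate_handling(WalPos rec, WalPos restart_ckpt, WalPos prev_reached)
{
    return in_double_action_zone(rec, restart_ckpt, prev_reached)
        ? DUP_IS_TOLERATED
        : DUP_IS_ERROR;
}
```

On a first recovery attempt there is no previous position, so passing prev_reached equal to restart_ckpt makes the zone empty and every duplicate a hard error.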

Or we could simply mark any index as incomplete-needs-rebuild if it had
a page split during the overlap between the last known good recovery
checkpoint and the following one. But that makes recovery time
unpredictable, and in the worst case we might have been better off
restarting recovery from scratch anyway.
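A sketch of the index-marking alternative, with hypothetical names throughout (RelId, note_page_split() and the rest are not real PostgreSQL functions). It only records which indexes would need a rebuild once recovery finishes; the rebuild itself is out of scope, and a full list is reported to the caller rather than silently dropping an index.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t RelId; /* hypothetical index identifier */

#define MAX_MARKED 64

/* Indexes to be marked incomplete-needs-rebuild after recovery. */
static RelId needs_rebuild[MAX_MARKED];
static int nmarked = 0;

/*
 * True from a restart at the last known good recovery checkpoint until
 * replay reaches the following recovery checkpoint.
 */
static bool in_overlap = false;

/* Only a restarted recovery has an overlap period. */
static void
start_recovery(bool restarted)
{
    nmarked = 0;
    in_overlap = restarted;
}

static bool
is_marked(RelId rel)
{
    for (int i = 0; i < nmarked; i++)
        if (needs_rebuild[i] == rel)
            return true;
    return false;
}

/*
 * Called when replaying a page split of index `rel`. Returns false if
 * the index should be marked but the list is full, in which case the
 * caller must fall back (for example, restart recovery from scratch).
 */
static bool
note_page_split(RelId rel)
{
    if (!in_overlap || is_marked(rel))
        return true;
    if (nmarked >= MAX_MARKED)
        return false;
    needs_rebuild[nmarked++] = rel;
    return true;
}

/* Called when replay reaches the next recovery checkpoint. */
static void
reached_recovery_checkpoint(void)
{
    in_overlap = false;
}
```

Splits seen after the following checkpoint are covered by that checkpoint's own restart point, which is why marking stops there.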

Given the time available for 8.2, neither one is a quick fix.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
