Re: Hot standby, recovery infra

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot standby, recovery infra
Date: 2009-02-05 19:54:50
Message-ID: 498B440A.1030101@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Ok, here's another version. Major changes since last patch:

- Startup checkpoint is now again performed after the recovery is
finished, before allowing (read-write) connections. This is because we
couldn't solve the problem of re-entering recovery after a crash before
the first online checkpoint.

- minSafeStartPoint is gone, and its functionality has been folded into
minRecoveryPoint. It was really the same semantics. There might have
been some debugging value in keeping the backup stop time around, but
it's in the backup label file in the base backup anyway.

- minRecoveryPoint is now updated in XLogFlush, instead of when a file
is restored from archive.

- log_restartpoints is gone. Use log_checkpoints in postgresql.conf
instead

Outstanding issues:

- If bgwriter is performing a restartpoint when recovery ends, the
startup checkpoint will be queued up behind the restartpoint. And since
it uses the same smoothing logic as checkpoints, it can take quite some
time for that to finish. The original patch had some code to hurry up
the restartpoint by signaling the bgwriter if
LWLockConditionalAcquire(CheckPointLock) fails, but there's a race
condition with that if a restartpoint starts right after that check. We
could let the bgwriter do the checkpoint too, and wait for it, but
bgwriter might not be running yet, and we'd have to allow bgwriter to
write WAL while disallowing it for all other processes, which seems
quite complex. Seems like we need something like the
LWLockConditionalAcquire approach, but built into CreateCheckPoint to
eliminate the race condition

- If you perform a fast shutdown while startup process is waiting for
the restore command, startup process sometimes throws a FATAL error
which leads escalates into an immediate shutdown. That leads to
different messages in the logs, and skipping of the shutdown
restartpoint that we now otherwise perform.

- It's not clear to me if the rest of the xlog flushing related
functions, XLogBackgroundFlush, XLogNeedsFlush and XLogAsyncCommitFlush,
need to work during recovery, and what they should do.

I'll continue working on those outstanding items.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
recovery-infra-41d3bcb.patch text/x-diff 63.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2009-02-05 20:16:09 Re: Hot standby, recovery infra
Previous Message Tom Lane 2009-02-05 17:37:13 Re: Fixing Grittner's planner issues