Re: [9.3 bug] disk space in pg_xlog increases during archive recovery

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [9.3 bug] disk space in pg_xlog increases during archive recovery
Date: 2014-02-01 20:44:14
Message-ID: 20140201204414.GA5930@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2014-01-21 23:37:43 +0200, Heikki Linnakangas wrote:
> On 01/21/2014 07:31 PM, Fujii Masao wrote:
> >On Fri, Dec 20, 2013 at 9:21 PM, MauMau <maumau307(at)gmail(dot)com> wrote:
> >>From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
> >>
> >>>! if (source == XLOG_FROM_ARCHIVE && StandbyModeRequested)
> >>>
> >>>Even when standby_mode is not enabled, we can use cascade replication and
> >>>it needs the accumulated WAL files. So I think that
> >>>AllowCascadeReplication()
> >>>should be added into this condition.
> >>>
> >>>! snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
> >>>! XLogFilePath(xlogpath, ThisTimeLineID, endLogSegNo);
> >>>!
> >>>! if (restoredFromArchive)
> >>>
> >>>Don't we need to check !StandbyModeRequested and
> >>>!AllowCascadeReplication()
> >>>here?
> >>
> >>Oh, you are correct. Okay, done.
> >
> >Thanks! The patch looks good to me. Attached is the updated version of
> >the patch. I added the comments.
>
> Sorry for reacting so slowly, but I'm not sure I like this patch. It's a
> quite useful property that all the WAL files that are needed for recovery
> are copied into pg_xlog, even when restoring from archive, even when not
> doing cascading replication. It guarantees that you can restart the standby,
> even if the connection to the archive is lost for some reason. I
> intentionally changed the behavior for archive recovery too, when it was
> introduced for cascading replication. Also, I think it's good that the
> behavior does not depend on whether cascading replication is enabled - it's
> a quite subtle difference.
>
> So, IMHO this is not a bug, it's a feature.

Very much seconded. With the old behaviour it's possible to get into the
situation that you cannot get your standby productive anymore if the
archive server died. Since the archive server is often the primary
that's imo unacceptable.

I'd even go as far as saying the previous behaviour is a bug.

> To solve the original problem of running out of disk space in archive
> recovery, I wonder if we should perform restartpoints more aggressively. We
> intentionally don't trigger restatpoings by checkpoint_segments, only
> checkpoint_timeout, but I wonder if there should be an option for that.
> MauMau, did you try simply reducing checkpoint_timeout, while doing
> recovery?

Hm, don't we actually do cause trigger restartpoints based on checkpoint
segments?

static int
XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,
XLogRecPtr targetRecPtr, char *readBuf, TimeLineID *readTLI)
{
...

if (readFile >= 0 && !XLByteInSeg(targetPagePtr, readSegNo))
{
/*
* Request a restartpoint if we've replayed too much xlog since the
* last one.
*/
if (StandbyModeRequested && bgwriterLaunched)
{
if (XLogCheckpointNeeded(readSegNo))
{
(void) GetRedoRecPtr();
if (XLogCheckpointNeeded(readSegNo))
RequestCheckpoint(CHECKPOINT_CAUSE_XLOG);
}
}
...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2014-02-01 20:49:17 Re: [9.3 bug] disk space in pg_xlog increases during archive recovery
Previous Message Andrew Dunstan 2014-02-01 20:22:45 Re: install libpq.dll in bin directory on Windows / Cygwin