Re: Extra XLOG in Checkpoint for StandbySnapshot

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: 'Simon Riggs' <simon(at)2ndQuadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Extra XLOG in Checkpoint for StandbySnapshot
Date: 2013-01-09 12:19:19
Message-ID: 20130109121919.GD5759@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2013-01-09 15:06:04 +0530, Amit Kapila wrote:
> On Wednesday, January 09, 2013 2:28 PM Andres Freund wrote:
> > On 2013-01-09 14:04:32 +0530, Amit Kapila wrote:
> > > On Tuesday, January 08, 2013 8:57 PM Andres Freund wrote:
> > > > On 2013-01-08 20:33:28 +0530, Amit Kapila wrote:
> > > > > On Tuesday, January 08, 2013 8:01 PM Andres Freund wrote:
> > > > > > On 2013-01-08 19:51:39 +0530, Amit Kapila wrote:
> > > > > > > On Monday, January 07, 2013 7:15 PM Andres Freund wrote:
> > > > > > > > On 2013-01-07 19:03:35 +0530, Amit Kapila wrote:
> > > > > > > > > On Monday, January 07, 2013 6:30 PM Simon Riggs wrote:
> > > > > > > > > > On 7 January 2013 12:39, Amit Kapila
> > > > <amit(dot)kapila(at)huawei(dot)com>
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > >
> > > > > > > > The information that no transactions are currently running
> > > > allows
> > > > > > you
> > > > > > > > to
> > > > > > > > build a recovery snapshot, without that information the
> > standby
> > > > > > won't
> > > > > > > > start answering queries. Now that doesn't matter if all
> > > > standbys
> > > > > > > > already
> > > > > > > > have built a snapshot, but the primary cannot know that.
> > > > > > >
> > > > > > > Can't we make sure that checkpoint operation doesn't happen
> > for
> > > > below
> > > > > > conds.
> > > > > > > a. nothing has happened during or after last checkpoint
> > > > > > > OR
> > > > > > > b. nothing except snapshotstanby WAL has happened
> > > > > > >
> > > > > > > Currently it is done for point a.
> > > > > > >
> > > > > > > > Having to issue a checkpoint while ensuring transactions
> > are
> > > > > > running
> > > > > > > > just to get a standby up doesn't seem like a good idea to
> > me :)
> > > > > > >
> > > > > > > Simon:
> > > > > > > > If you make the correct test, I'd be more inclined to
> > accept
> > > > the
> > > > > > premise.
> > > > > > >
> > > > > > > Not sure, what exact you are expecting from test?
> > > > > > > The test is do any one operation on system and then keep the
> > > > system
> > > > > > idle.
> > > > > > > Now at each checkpoint interval, it logs WAL for
> > SnapshotStandby.
> > > > > >
> > > > > > I can't really follow what you want to do here. The snapshot is
> > > > only
> > > > > > logged if a checkpoint is performed anyway? As recovery starts
> > at
> > > > (the
> > > > > > logical) checkpoint's location we need to log a snapshot
> > exactly
> > > > > > there. If you want to avoid activity when the system is idle
> > you
> > > > need
> > > > > > to
> > > > > > prevent checkpoints from occurring itself.
> > > > >
> > > > > Even if the checkpoint is scheduled, it doesn't perform actual
> > > > operation if
> > > > > there's nothing logged between
> > > > > current and previous checkpoint due to below check in
> > > > CreateCheckPoint()
> > > > > function.
> > > > > if (curInsert == ControlFile->checkPoint +
> > > > > MAXALIGN(SizeOfXLogRecord +
> > > > sizeof(CheckPoint)) &&
> > > > > ControlFile->checkPoint ==
> > > > > ControlFile->checkPointCopy.redo)
> > > > >
> > > > > But if we set the wal_level as hot_standby, it will log snapshot,
> > now
> > > > next
> > > > > time again when function CreateCheckPoint()
> > > > > will get called due to scheduled checkpoint, the above check will
> > > > fail and
> > > > > it will again log snapshot, so this will continue, even if the
> > system
> > > > is
> > > > > totally idle.
> > > > > I understand that it doesn't cause any problem, but I think it is
> > > > better if
> > > > > the repeated log of snapshot in this scenario can be avoided.
> > > >
> > > > ISTM in that case you "just" need a way to cope with the
> > additionally
> > > > logged record in the above piece of code. Not logging seems to be
> > the
> > > > entirely wrong way to go at this.
> > >
> > > I think one of the ways code can be modified is as below:
> > >
> > > + /*size of running transactions log when there is no
> > > active transation*/
> > > + if (!shutdown && XLogStandbyInfoActive())
> > > + {
> > > + runningXactXLog =
> > > MAXALIGN(MinSizeOfXactRunningXacts) + SizeOfXLogRecord;
> > > + }
> > >
> > > ! if (curInsert == ControlFile->checkPoint +
> > > ! MAXALIGN(SizeOfXLogRecord +
> > sizeof(CheckPoint)) &&
> > > ! ControlFile->checkPoint ==
> > > ControlFile->checkPointCopy.redo)
> > >
> > > ! if (curInsert == ControlFile->checkPoint +
> > > ! MAXALIGN(SizeOfXLogRecord +
> > sizeof(CheckPoint)) &&
> > > ! ControlFile->checkPoint ==
> > > ControlFile->checkPointCopy.redo + runningXactXLog)
> > >
> > > Second condition is checking the last checkpoint WAL position with
> > the
> > > current one.
> > > Since ControlFile->checkPointCopy.redo holds the value before
> > "running
> > > Xact" WAL was inserted
> > > and ControlFile->checkPoint holds the value after "running Xact" WAL
> > got
> > > inserted, so if no new WAL was inserted apart from "running Xacts"
> > and
> > > "Checkpoint" WAL, then this condition will be true.
> > >
> >
> > I don't think thats safe, there could have been another record inserted
> > that happens to be MinSizeOfXactRunningXacts big and we would still
> > skip the checkpoint.
>
> I think such can happen only for when first time checkpoint is triggered,
> and even then the first part of the check (curInsert ==
> ControlFile->checkPoint + MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint))
> will fail.
>
> Value to runningXactXLog will be assigned only if wal_level is hot_stanby.
> In that case if checkpoint is getting scheduled for 2nd or consecutive time,
> it will include WAL for "running Xact" along with WAL for any other data.
> So now even if the other data is of size MinSizeOfXactRunningXacts, the
> check should fail and skip the checkpoint.

Hm. The locking around checkpoints probably prevents the case I was
worried about in combination with the wal_level not changing while
running.

> Also why such cannot happen for Checkpoint record, because there is almost
> similar check for Checkpoint record in the same if check?

Because we compare the address of the checkpoint record and count the
size from that. That misses some cases (when wrapping to a new xlog
page) but it won't give false positives.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2013-01-09 12:34:12 Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs
Previous Message Simon Riggs 2013-01-09 12:14:34 Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]