Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby
Date: 2016-04-04 06:22:22
Message-ID: CAB7nPqTkoUqhmRD2UySeWUh+WumGhmRCTLcD8Eknxe9vv-y0nA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Mon, Apr 4, 2016 at 6:41 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> I'm not very excited about this patch. Too much code for so little benefit
> and fragile too.
>
> I'm not even sure what definition of "meaningful progress" is useful. If we
> commit this, a similar bug could be filed for many similar WAL record types.

Up to now, "progress" means the addition of WAL records by backends,
"meaningful" refers to records that impact the decision of skipping
checkpoints or not.

> Since we zero out WAL files specifically to allow them to be compressed in
> the archive, this patch saves a few bytes in the archive. Seems like we
> could fake up some WAL files if needed for restore with an external tool, if
> we really care.

This assumes that most deployments compress the segments when
archiving them for simplicity. I would believe that the reverse is
true, most of archive command processes do not compress them. And I am
pretty sure that some cloud deployments have as well pricing models
depending on the number of accesses done on an instance.

> Since we agree it wouldn't be backpatched, its pretty much saying we don't
> care.

Seeing the flow of this thread, that's not completely true. Andres has
been arguing for no backpatch, Amit and I not, because the checkpoint
skip logic is incorrect. The logic to decide is a checkpoint should be
skipped or not is still broken on back-branches for wal_level =
hot_standby. That's related to the former complain reported here.

> A promising approach would be to just skip logging the RUNNING_XACTS record
> if there are no xacts running, which is the case we care about here (no
> progress => no xacts). HS startup can only happen at a checkpoint, so if
> there is no WAL file there is no checkpoint, so we don't need the
> RUNNING_XACTS record to be written. That's a short patch.

I am not sure if that's actually simpler... Do you mean here that we
should save the latest transaction ID at the moment a snapshot is
taken and skip the next snapshot if TXID has not progressed? Say in
LogStandbySnapshot() when a snapshot is taken, we allocate the last
XID with ReadNewTransactionId() in XlogCtlData, then compare it the
next time a standby snapshot is taken. With such a logic, you need to
do something like that:
- Save the last transaction XID when RUNNING_XACTS has been logged in shmem
- Save in shmem as well if the previous RUNNING_XACTS has overflowed
its subxids or not. If it has overflowed, we should log RUNNING_XACTS
next time LogStandbySnapshot is logged to prevent the case of pending
snapshots at recovery.

AFAIK, standby snapshots with no xacts running are still useful at
recovery for two things to accelerate a hot standby initialization:
- If the previous snapshot overflowed its subxids, the next empty
snapshot can help out clear the situation. That's the case where the
bgwriter snapshot is helpful actually, because there is no need to
wait for the next checkpoint record to be replayed to initialize a hot
standby.
- An empty snapshot after checkpoint is mandatory, even if it is
empty, no? Knowing that, the next checkpoint cannot be skipped because
the logic mentioned above in xlog.c to decide if a checkpoint should
be skipped or not ignores the fact that a standby snapshot has been
logged. It bases its logic on the sole presence of a checkpoint
record, so this will never work. I mean this check of course:
if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY |
CHECKPOINT_FORCE)) == 0)
{
if (prevPtr == ControlFile->checkPointCopy.redo &&
prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE)
{
WALInsertLockRelease();
LWLockRelease(CheckpointLock);
END_CRIT_SECTION();
return;
}
}
The approach introducing the concept of WAL progress addresses the
problem of unnecessary checkpoints and of unnecessary standby
snapshots. If we take the problem only from the angle of RUNNING_XACTS
the current logic used to check if a checkpoint should be skipped or
not is not addressed.
--
Michael

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Schuch, Mathias (Mathias) 2016-04-04 08:05:34 Re: BUG #14050: "could not reserve shared memory region" in postgresql log
Previous Message ondrej 2016-04-03 22:08:15 BUG #14061: Cannot update RPM from PGDG - unsigned package

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2016-04-04 06:24:50 Re: pgsql: Avoid archiving XLOG_RUNNING_XACTS on idle server
Previous Message Andres Freund 2016-04-04 06:21:59 Re: pgbench more operators & functions