Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
Date: 2018-03-29 16:26:11
Message-ID: CANP8+j+6n-8+RnTiNJyR1uJ_VReUX4gn6FyKxbroSu27W+kADw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 29 March 2018 at 01:50, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Mar 27, 2018 at 11:53 PM, Pavan Deolasee
> <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>> Yeah, we should not do that. The patch surely does not intend to replay any
>> more WAL than what we do today. The idea is to just use a different
>> mechanism to find the prior checkpoint. But we should surely find the latest
>> prior checkpoint. In the rare scenario that Tom showed, we should just throw
>> an error and fix the patch if it's not doing that already.
>
> It's not clear to me that there is any reasonable fix short of giving
> up on this approach altogether. But I might be missing something.

(Sorry to come back to this late)

Yes, a few things have been been missed in this discussion.

Michael's point is invalid. The earlier discussion said that *replay*
of WAL earlier than the last checkpoint could lead to corruption, but
this doesn't mean the records themselves are in some way damaged
structurally, only that their content can lead to inconsistency in the
database. The xl_curr proposal does not replay WAL records it only
searches thru them to locate the last checkpoint.

Tom also said "Well, the point of checkpoints is that WAL data before
the last one should no longer matter anymore, isn't it?". WAL Files
prior to the one containing the last checkpoint can be discarded, but
checkpoints themselves do nothing to invalidate the WAL records before
the checkpoint.

Tom's point that a corruption in the WAL file could prevent pg_rewind
from locating a record is valid, but misses something. If the WAL file
contained corruption AFTER the last checkpoint this would already in
the current code prevent pg_rewind from moving backwards to find the
last checkpoint. So whether you go forwards or backwards it is
possible to be blocked by corrupt WAL records.

Most checkpoints are far enough apart that there is only one
checkpoint record in any WAL file and it could be anywhere in the
file, so the expected number of records to be searched by forwards or
backwards scans is equal. And the probability that corruption occurs
before or after the record is also equal.

Given the above points, I don't see any reason to believe that the
forwards search mechanism proposed is faulty, decreases robustness or
is even slower.

Overall, the loss of links between WAL records is not a loss of
robustness. XLOG_SWITCH is a costly operation and mostly unusable
these days, so shouldn't be a reason to shy away from this patch. We
don't jump around in WAL in any other way, so sequential records are
just fine.

So IMHO the proposed approach is still very much viable.

I know the approach is new and surprising but I thought about it a lot
before proposing it and I couldn't see any holes; still can't. Please
give this some thought so we can get comfortable with this idea and
increase performance as a result. Thanks.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2018-03-29 16:30:08 Re: SET TRANSACTION in PL/pgSQL
Previous Message Catalin Iacob 2018-03-29 16:20:00 Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS