Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
Date: 2018-03-27 14:51:35
Message-ID: CABOikdNsH=bNecE7=hYdGcGTTUvGoWUcD_V1cuRGdr5=GOBNNw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 27, 2018 at 7:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

>
>
> I had not noticed that in the kerfuffle over the more extreme version,
> but I think this is a different route to breaking the same guarantees.
> The point of the xl_prev links is, essentially, to allow verification
> that we are reading a coherent series of WAL records, ie that record B
> which appears to follow record A actually is supposed to follow it.
> This throws away that cross-check just as effectively as the other patch
> did, leaving us reliant solely on the per-record CRCs. CRCs are good,
> but they do have a false-positive rate.
>
> An example of a case where xl_prev makes one feel a lot more comfortable
> is the WAL-segment-switch option. The fact that the first record of the
> next WAL file links back to the XLOG_SWITCH record is evidence that
> ignoring multiple megabytes of WAL was indeed the intended thing to do.
> An xl_curr field cannot provide any evidence as to whether WAL records
> were missed.
>

TBH I still don't see why this does not provide the same guarantee that the
current code provides, but given the concerns expressed by others, I am not
gonna pursue beyond a point. But one last time :-)

The current code uses xl_prev to cross-verify the record B, read after
record A, indeed follows A and has a valid back-link to A. This deals with
problems where B might actually be an old WAL record, carried over from a
stale WAL file.

Now if we store xl_curr, we can definitely guarantee that B is ahead of A
because B->xl_curr will be greater than A->xl_curr (otherwise we bail out).
So that deals with the problem of stale WAL records. In addition, we also
know where A ends (we can deduce that even for XLOG_SWITCH records knowing
where the next record will start after the switch) and hence we know where
B should start. So we read at B and also confirm that B->xl_curr matches
B's position. If it does not, we declare end-of-WAL and bail out. So where
is the problem?

> > 2. Does the new logic in pg_rewind to search backward for a checkpoint
> > work reliably, and will it be slow?
>
> If you have to search backwards, this breaks it. Full stop.
>

We don't really need to fetch the previous record. We really need to find
the last checkpoint prior to a given LSN. That can be done by reading WAL
segments forward. It can be a little slow, but hopefully not a whole lot.

A <- B <- C <- CHKPT <- D <- E <- F <- G

So today, if we want to find last checkpoint prio to G, we go through the
back-links until we find the first checkpoint record. In the proposed code,
we read forward the current WAL segment, remember the last CHKPT record
seen and once we see G, we know we have found the prior checkpoint. If the
checkpoint does not exist in the current WAL, we read forward the previous
WAL and return the last checkpoint record in that WAL and so on. So in the
worst case, we might read a WAL segment extra before we find the checkpoint
record. That's not ideal but not too bad given that only pg_rewind needs
this and that too only once.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksander Alekseev 2018-03-27 14:53:37 Re: Re: Cast jsonb to numeric, int, float, bool
Previous Message Marina Polyakova 2018-03-27 14:49:02 Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors