Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
Date: 2018-03-27 13:58:12
Message-ID: 3031.1522159092@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Taking a look at this version, I think the key thing we have to decide
> is whether we're comfortable with this:

> --- a/src/include/access/xlogrecord.h
> +++ b/src/include/access/xlogrecord.h
> @@ -42,7 +42,7 @@ typedef struct XLogRecord
> {
> uint32 xl_tot_len; /* total len of entire record */
> TransactionId xl_xid; /* xact id */
> - XLogRecPtr xl_prev; /* ptr to previous record in log */
> + XLogRecPtr xl_curr; /* ptr to this record in log */
> uint8 xl_info; /* flag bits, see below */
> RmgrId xl_rmid; /* resource manager for this record */
> /* 2 bytes of padding here, initialize to zero */

> I don't see any comments in the patch explaining why this substitution
> is just as safe as what we had before, and I think it has only very
> briefly been commented upon by Pavan, who remarked that it provided
> similar protection to what we have today. That's fair enough, but I
> think a little more analysis of this point would be good.

I had not noticed that in the kerfuffle over the more extreme version,
but I think this is a different route to breaking the same guarantees.
The point of the xl_prev links is, essentially, to allow verification
that we are reading a coherent series of WAL records, ie that record B
which appears to follow record A actually is supposed to follow it.
This throws away that cross-check just as effectively as the other patch
did, leaving us reliant solely on the per-record CRCs. CRCs are good,
but they do have a false-positive rate.

An example of a case where xl_prev makes one feel a lot more comfortable
is the WAL-segment-switch option. The fact that the first record of the
next WAL file links back to the XLOG_SWITCH record is evidence that
ignoring multiple megabytes of WAL was indeed the intended thing to do.
An xl_curr field cannot provide any evidence as to whether WAL records
were missed.

> 2. Does the new logic in pg_rewind to search backward for a checkpoint
> work reliably, and will it be slow?

If you have to search backwards, this breaks it. Full stop.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dean Rasheed 2018-03-27 13:58:40 Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Previous Message Jonathan S. Katz 2018-03-27 13:51:34 PostgreSQL 11 Release Management Team & Feature Freeze