Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
Date: 2018-04-03 13:33:54
Message-ID: CA+TgmoYafrYf0ZrFLUMh4nyZarMzZHzQtwzRTMbDK81fgEWBYQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 29, 2018 at 5:18 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
>> If each WAL record has xl_curr, then we know to which position the
>> record belongs (after verifying the checksum). And we do know the size
>> of each WAL record, so we should be able to deduce if two records are
>> immediately after each other.
>
> Per my point earlier, XLOG_SWITCH is sufficient to defeat that argument.
> Also consider a timeline fork. It's really hard to be sure which record
> in the old timeline is the direct ancestor of the first one in the new
> if you lack xl_prev:
>
> A1 -> B1 -> C1 -> D1
>    \
>     B2 -> C2 -> D2
>
> If you happened to get confused and think that C2 is the first in its
> timeline, diverging off the old line after B1 not A1, there would be
> nothing about C2 to disabuse you of your error.

But, as Simon correctly points out, if xl_prev is the only thing
that's saving us from disaster, that's rather fragile. All it's
cross-checking is the length of the previous WAL record, and that
could easily match by accident: there aren't *that* many bits of
entropy in the length of a WAL record. As he also points out, and I
think he's right about that too, if we really want a strong check that
the chain of WAL records is continuous and unbroken, we ought to be
including the CRC from the previous WAL record, not just the length.
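
To make the difference concrete, here is a rough sketch in C of the two
checks. Everything in it is hypothetical: SketchWalRecord is a toy
struct, not the real XLogRecord layout, prev_crc is a field that does
not exist today, and alignment padding is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy record header for illustration only; not PostgreSQL's XLogRecord. */
typedef struct SketchWalRecord
{
    uint64_t    start;      /* position of this record in the WAL stream */
    uint32_t    len;        /* total length of this record */
    uint64_t    prev;       /* start of the previous record, like xl_prev */
    uint32_t    prev_crc;   /* hypothetical: CRC of the previous record */
    uint32_t    crc;        /* CRC of this record */
} SketchWalRecord;

/*
 * Position-only check: cur must point back at prev, and prev must end
 * exactly where cur begins. This only constrains prev's start and length.
 */
static bool
chain_ok_by_position(const SketchWalRecord *prev, const SketchWalRecord *cur)
{
    return cur->prev == prev->start &&
        prev->start + prev->len == cur->start;
}

/*
 * Stronger check: additionally require that cur carries the CRC of the
 * record it claims to follow, so the contents of prev are pinned down too.
 */
static bool
chain_ok_by_crc(const SketchWalRecord *prev, const SketchWalRecord *cur)
{
    return chain_ok_by_position(prev, cur) &&
        cur->prev_crc == prev->crc;
}
```

With two records B1 and B2 that start at the same position and happen
to have the same length but different contents, a C2 written after B2
passes chain_ok_by_position() against B1 as well, while
chain_ok_by_crc() accepts only B2.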

Another thing that's bothering me is this: surely there could be (if
there isn't already) something *else* that tells us whether we've
switched timelines at the wrong place. I mean, if nothing else, we
could dictate that the first WAL record after a timeline switch must
be XLOG_TIMELINE_SWITCH or something like that, and then if you try to
change timelines and don't find that record (with the correct previous
TLI), you know something's messed up.
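
Roughly, the check I have in mind would look something like the sketch
below. The record type, the struct, and the function name are all made
up for illustration; none of them is meant to describe what the code
does today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record kinds; only the switch marker matters here. */
typedef enum SketchRecKind
{
    SKETCH_REC_OTHER,
    SKETCH_REC_TIMELINE_SWITCH
} SketchRecKind;

/* Toy view of the first record found on the new timeline. */
typedef struct SketchFirstRecord
{
    SketchRecKind kind;
    uint32_t    prev_tli;   /* timeline the new one claims to fork from */
} SketchFirstRecord;

/*
 * Accept a timeline change only if the first record on the new timeline
 * is the dedicated switch record and names the timeline we are leaving.
 */
static bool
timeline_switch_ok(const SketchFirstRecord *first, uint32_t old_tli)
{
    return first->kind == SKETCH_REC_TIMELINE_SWITCH &&
        first->prev_tli == old_tli;
}
```

In Tom's example, C2 would be an ordinary record rather than a switch
record, so a reader that mistook it for the start of the new timeline
would fail this check immediately.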

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
