Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
Date: 2018-03-29 20:09:22
Message-ID: CANP8+jJNCaNWtK2rfRKuq_QLRvGex87L7F4xOYWFr9KOfD_iPw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 29 March 2018 at 19:02, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>>> Also, I will say it once more: this change DOES decrease robustness.
>>> It's like blockchain without the chain aspect, or git commits without
>>> a parent pointer. We are not only interested in whether individual
>>> WAL records are valid, but whether they form a consistent series.
>>> Cross-checking xl_prev provides some measure of confidence about that;
>>> xl_curr offers none.
>>
>> Not sure.
>>
>> If each WAL record has xl_curr, then we know to which position the
>> record belongs (after verifying the checksum). And we do know the size
>> of each WAL record, so we should be able to deduce if two records are
>> immediately after each other. Which I think is enough to rebuild the
>> chain of WAL records.
>>
>> To defeat this, this would need to happen:
>>
>> a) the WAL record gets written to a different location
>> b) the xl_curr gets corrupted in sync with (a)
>> c) the WAL checksum gets corrupted in sync with (b)
>> d) the record overwrites existing record (same size/boundaries)
>>
>> That seems very much like xl_prev.
>
> I don't think so, because this ignores, for example, timeline
> switches, or multiple clusters accidentally sharing an archive
> directory.

I'm not hearing any actual technical problems.

> Given where we are
> in the release cycle, I'd prefer to defer that question to next cycle.

Accepted, on the basis of risk.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2018-03-29 20:11:04 Re: [HACKERS] why not parallel seq scan for slow functions
Previous Message Tomas Vondra 2018-03-29 20:03:01 Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()