Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
Date: 2018-03-30 09:57:48
Message-ID: CANP8+jLibcKeQp3k3Kg5EL9oyqH-_EZP8TTXS9n=NvtfAhDfMg@mail.gmail.com
Lists: pgsql-hackers

On 29 March 2018 at 23:16, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
>
> On 03/29/2018 11:18 PM, Tom Lane wrote:
>> Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
>>> If each WAL record has xl_curr, then we know to which position the
>>> record belongs (after verifying the checksum). And we do know the size
>>> of each WAL record, so we should be able to deduce if two records are
>>> immediately after each other.
>>
>> Per my point earlier, XLOG_SWITCH is sufficient to defeat that argument.
>
> But the SWITCH record will be the last record in the WAL segment (and if
> there happens to be a WAL record after it, it will have invalid xl_curr
> pointer). And the next valid record will be the first one in the next
> WAL segment. So why wouldn't that be enough information?
>
>> Also consider a timeline fork. It's really hard to be sure which record
>> in the old timeline is the direct ancestor of the first one in the new
>> if you lack xl_prev:
>>
>> A1 -> B1 -> C1 -> D1
>>    \
>>       B2 -> C2 -> D2
>>
>> If you happened to get confused and think that C2 is the first in its
>> timeline, diverging off the old line after B1 not A1, there would be
>> nothing about C2 to disabuse you of your error.
>
> Doesn't that mean the B1 and B2 have to be exactly the same size?
> Otherwise there would be a gap between B1/C2 or B2/C1, and the xl_curr
> would be enough to detect this.
>
> And how could xl_prev detect it? AFAIK XLogRecPtr does not include
> TimeLineID, so xl_prev would be the same for both B1 and B2.
>
> I admit WAL internals are not an area I'm particularly familiar with,
> though, so I may be missing something utterly obvious.

Timeline history files know the LSN at which they fork. I can imagine
losing the timeline branch point metadata because of corruption or
other disaster, but getting the LSN slightly wrong sounds strange. If
it was slightly wrong and yet still a valid LSN, xl_prev provides no
protection against that.
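
To make the argument concrete, here is a minimal sketch of the fork from
Tom's diagram. It is a toy model, not PostgreSQL code: LSNs are plain
integers, record lengths ignore alignment padding and page headers, and
the Rec class and both check functions are hypothetical names invented
for illustration. xl_curr is the field proposed in this thread, not an
existing WAL header field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rec:
    name: str
    tli: int      # timeline the record was written on (not part of any LSN)
    start: int    # LSN where the record begins (what xl_curr would store)
    length: int   # total record length in bytes (padding ignored here)
    xl_prev: int  # LSN of the preceding record on the same timeline

    @property
    def end(self) -> int:
        return self.start + self.length


def prev_check(candidate_parent: Rec, rec: Rec) -> bool:
    """xl_prev-style check: does rec point back at the candidate's start?"""
    return rec.xl_prev == candidate_parent.start


def curr_check(candidate_parent: Rec, rec: Rec) -> bool:
    """xl_curr-style check: does rec begin exactly where the candidate ends?"""
    return rec.start == candidate_parent.end


# Timeline 1: A1 -> B1 -> C1. Timeline 2 forks after A1: B2 -> C2.
A1 = Rec("A1", 1, 1000, 40, 960)
B1 = Rec("B1", 1, A1.end, 48, A1.start)
C1 = Rec("C1", 1, B1.end, 32, B1.start)
B2 = Rec("B2", 2, A1.end, 64, A1.start)   # same start LSN as B1, other size
C2 = Rec("C2", 2, B2.end, 32, B2.start)

# Wrongly assume C2 follows B1 (fork after B1 instead of after A1).
print(prev_check(B1, C2))  # True: xl_prev cannot tell B1 from B2
print(curr_check(B1, C2))  # False: the gap between B1's end and C2 shows
```

In this model the wrong guess passes the xl_prev check because B1 and B2
start at the same LSN, and the position check only catches it because B1
and B2 differ in length. If they were the same size, neither check would
notice.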

The timeline branch point is already marked by specific records in
WAL, so if you forgot the LSN completely you would search for those
and then you'd know. xl_prev wouldn't assist you in that search.
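
A rough sketch of that search, under stated assumptions: the records
have already been decoded into simple tuples, and the branch point shows
up as an end-of-recovery or shutdown-checkpoint record that carries both
the new and the previous timeline ID. DecodedRecord, BRANCH_KINDS and
find_branch_points are hypothetical names for illustration, and the
sample stream is made up, not taken from a real cluster.

```python
from typing import Iterable, List, NamedTuple, Optional


class DecodedRecord(NamedTuple):
    lsn: int
    kind: str                       # e.g. "END_OF_RECOVERY", "HEAP_INSERT"
    this_tli: Optional[int] = None  # timeline the record starts or continues
    prev_tli: Optional[int] = None  # timeline that was active before it


# Assumed record kinds that can carry a timeline change.
BRANCH_KINDS = {"CHECKPOINT_SHUTDOWN", "END_OF_RECOVERY"}


def find_branch_points(records: Iterable[DecodedRecord]) -> List[DecodedRecord]:
    """Return the records whose payload says the timeline changed."""
    return [
        r for r in records
        if r.kind in BRANCH_KINDS and r.this_tli != r.prev_tli
    ]


# Made-up stream: an ordinary shutdown checkpoint, then a promotion.
stream = [
    DecodedRecord(1000, "HEAP_INSERT"),
    DecodedRecord(1040, "CHECKPOINT_SHUTDOWN", this_tli=1, prev_tli=1),
    DecodedRecord(1120, "HEAP_INSERT"),
    DecodedRecord(1160, "END_OF_RECOVERY", this_tli=2, prev_tli=1),
    DecodedRecord(1200, "HEAP_INSERT"),
]

for rec in find_branch_points(stream):
    print(f"timeline {rec.prev_tli} -> {rec.this_tli} at LSN {rec.lsn}")
```

The search relies only on the payload of those records, so nothing in it
reads or needs xl_prev.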

I agree with the points about XLOG_SWITCH, and repeat the observation
that such records are no longer used for replication and, if they are
used alongside replication, they hurt performance. I think it would be
easy to add more detail to the WAL stream if needed.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
