Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
Date: 2018-01-13 15:40:01
Message-ID: CANP8+jJ1finWKnc7Q_i9K-nnT2KsSmKXfM5mvks2rXJr+dX4Aw@mail.gmail.com
Lists: pgsql-hackers

On 12 January 2018 at 15:45, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

>> I have some reservations about whether this makes the mechanism less
>> reliable.
>
> Yeah, it scares me too. The xl_prev field is our only way of detecting
> that we're looking at old WAL data when we cross a sector boundary.
> I have no faith that we can prevent old WAL data from reappearing in the
> file system across an OS crash, so I find Simon's assertion that we can
> dodge the problem through file manipulation to be simply unbelievable.

Not really sure what you mean by "file manipulation". Maybe the
proposal wasn't clear.

We need a way of detecting that we are looking at old WAL data. More
specifically, we need to know whether we are looking at a current file
or an older file. My main assertion here is that the detection only
needs to happen at file-level, not at record level, so it is OK to
lose some bits of information without changing our ability to protect
data - they were not being used productively.

Let's do the math to see if it is believable, or not.

The new two byte value is protected by CRC. The 2 byte value repeats
every 32768 WAL files. For a bit error in that value to make an old
value appear current, a rare set of circumstances would all need to hold.

1. We would need to suffer a bit error that did not get caught by the CRC.

2. An old WAL record would need to occur right on the boundary of the
last WAL record.

3. The bit error would need to occur within the 2 byte value. WAL
records are usually fairly long, so this has a probability of
< 1/16.

4. The bit error would need to change an old value to the current
value of the new 2 byte field. If the current value is N, and the
previous value is M, then a single bit error that takes M -> N can
only happen if N-M is a power of 2. The maximum probability of an
issue would occur when we reuse WAL every 3 files, so probability of
such a change would be 1/16. If the distance between M and N is not a
power of two then a single bit error cannot change M into N. So what
probability do we assign to the situation that M and N are exactly a
power of two apart?
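
To make step 4 concrete, here is a minimal Python sketch of when a
single bit error can turn an old value M into the current value N. It
is not taken from the patch: the function names are made up for
illustration, and it assumes the field is treated as a plain unsigned
16 bit integer. It also shows that a power-of-two distance is necessary
but not sufficient, since M and N must differ in exactly one bit
position.

```python
def single_bit_flip_possible(m: int, n: int, width: int = 16) -> bool:
    """Return True if flipping exactly one bit of m yields n."""
    mask = (1 << width) - 1
    diff = (m ^ n) & mask
    # Exactly one differing bit: diff is non-zero and a power of two.
    return diff != 0 and (diff & (diff - 1)) == 0


def values_one_flip_away(n: int, width: int = 16) -> list:
    """All old values that a single bit error could turn into n."""
    return [n ^ (1 << bit) for bit in range(width)]
```

For any current value N there are only 16 old values that a single bit
flip could turn into it. A pair such as M=1, N=2 is a power of two
apart yet differs in two bits, so it cannot be produced by one flip.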

So this failure requires a single undetected bit error, and even then
it would happen less than 1 time in 256 (1/16 x 1/16), arguably
much less. That probability is therefore at least 2 orders of
magnitude smaller than the chance that a single bit error occurs
and simply corrupts data: a mere rounding error in risk.
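
The arithmetic can be checked with a few lines of Python. This is only
a sketch: it takes the two 1/16 upper bounds from steps 3 and 4 as
given and assumes the two conditions are independent, which is an
assumption of this example rather than something established above.
The helper name is hypothetical.

```python
from fractions import Fraction

# Upper bounds quoted in steps 3 and 4 of the message.
P_ERROR_IN_FIELD = Fraction(1, 16)
P_OLD_BECOMES_CURRENT = Fraction(1, 16)


def combined_upper_bound(*bounds: Fraction) -> Fraction:
    """Multiply per-condition bounds, assuming the conditions are independent."""
    result = Fraction(1)
    for bound in bounds:
        result *= bound
    return result


BOUND = combined_upper_bound(P_ERROR_IN_FIELD, P_OLD_BECOMES_CURRENT)
```

BOUND comes out as 1/256, which is below 1/100, hence "at least 2
orders of magnitude" relative to the undetected bit error itself.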

I don't find that unbelievable at all.

If you still do, then I would borrow Andres' idea of using the page
header. If we copy the new 2 byte value into the page header, we can
use that to match against in the case of error. XLogPageHeaderData can
be extended by 2 bytes without increasing its size when using 8 byte
alignment. The new 2 byte value is the same anywhere in the file, so
that works quickly and easily. And it doesn't increase the size of the
header.
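
The size claim can be sanity-checked with a small layout calculation.
The sketch below is in Python rather than C so that it does not depend
on the compiler. The field list mirrors XLogPageHeaderData as I
understand it from access/xlog_internal.h (please verify against the
source tree), and the name xlp_new_value is a placeholder, not a name
from any patch. It assumes natural alignment for each field.

```python
def struct_size(fields, max_align=8):
    """Size of a C struct laid out with natural alignment, capped at max_align."""
    offset = 0
    struct_align = 1
    for _name, size in fields:
        align = min(size, max_align)
        # Pad up to the field's alignment, then place the field.
        offset = (offset + align - 1) // align * align
        offset += size
        struct_align = max(struct_align, align)
    # Tail padding so that arrays of the struct stay aligned.
    return (offset + struct_align - 1) // struct_align * struct_align


# (name, size in bytes); believed to match XLogPageHeaderData.
CURRENT_HEADER = [
    ("xlp_magic", 2),
    ("xlp_info", 2),
    ("xlp_tli", 4),
    ("xlp_pageaddr", 8),
    ("xlp_rem_len", 4),
]

# Hypothetical extra 2 byte field appended at the end.
EXTENDED_HEADER = CURRENT_HEADER + [("xlp_new_value", 2)]
```

With 8 byte alignment both layouts come to 24 bytes, because the
existing 20 bytes of fields are already followed by 4 bytes of padding.
With 4 byte alignment the same calculation gives 20 and 24 bytes, so
the "no size increase" property does depend on the alignment.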

So with that change it looks completely viable.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
