Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
Date: 2018-02-01 06:00:22
Message-ID: CABOikdNm=Nz49_nE90eYamGEX-ToFVsjTMt3Z6NuKS=qiEGwFA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 1, 2018 at 11:05 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
wrote:

> On Sat, Jan 13, 2018 at 03:40:01PM +0000, Simon Riggs wrote:
> > The new two byte value is protected by CRC. The 2 byte value repeats
> > every 32768 WAL files. Any bit error in that value that made it appear
> > to be a current value would need to have a rare set of circumstances.
>
> If you use the two lower bytes of the segment number, then this gets
> repeated every 64k segments, no? In terms of space this represents
> 500GB worth of WAL segments with a default segment size. Hence the more
> PostgreSQL scales, the more there is a risk of collision, and I am
> pretty sure that there are already deployments around allocating
> hundreds of gigs worth of space for the WAL partition. There are no
> problems of this class if using the 8-byte field xl_prev. It seems to
> me that we don't want to take any risks here.
>

IMHO we're missing a point here. When xl_prev is changed to a 2-byte value
(as the patch suggests), the risk only comes when an old WAL file is
recycled for some future WAL file and the old and the future WAL file's
segment numbers share the same low order 16-bits. Now as the patch stands,
we take precaution to never reuse a WAL file with conflicting low order
bits.

This particular piece of the patch deals with that:

@@ -3496,6 +3495,16 @@ InstallXLogFileSegment(XLogSegNo *segno, char
*tmppath,
return false;
}
(*segno)++;
+
+ /*
+ * If avoid_conflicting_walid is true, then we must not recycle
the
+ * WAL files so that the old and the recycled WAL segnos have
the
+ * same lower order 16-bits.
+ */
+ if (avoid_conflicting_walid &&
+ XLogSegNoToSegID(tmpsegno) == XLogSegNoToSegID(*segno))
+ (*segno)++;
+
XLogFilePath(path, ThisTimeLineID, *segno, wal_segment_size);
}
}

For example, old WAL file 000000010000000000000001 will NEVER be reused
as 000000010000000000010001. So even if there are any intact old WAL
records in the recycled file, they will be detected correctly during replay
since the SegID stored in the WAL record will not match the SegID as
derived from the WAL file segment number. The replay code does this for
every WAL record it sees.

There were some concerns about bit-flipping, which may inadvertently SegID
stored in the carried over WAL record so that it now matches with the
current WAL files' SegID, but TBH I don't see why we can't trust CRC to
detect that. Because if we can't, then there would be other problems during
WAL replay.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-02-01 06:21:58 Re: [HACKERS] REINDEX CONCURRENTLY 2.0
Previous Message Yugo Nagata 2018-02-01 05:57:40 Re: CURRENT OF causes an error when IndexOnlyScan is used