Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
Date: 2018-04-03 07:23:04
Message-ID: d37a0436-671b-ae7f-4e5f-501912102075@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 27/03/18 21:02, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Mar 27, 2018 at 11:41 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> This is ignoring the possibility of damaged data in between, ie
>>> A ... B ... CHKPT ... C ... a few zeroed pages ... D ... CHKPT ... E ... F
>
>> It's hard for me to believe that this case matters very much. If
>> you're trying to run pg_rewind on a system where the WAL segments
>> contain a few zeroed pages, you're probably going to be hosed anyway,
>> if not by this particular thing then by something else.
>
> Well, the point of checkpoints is that WAL data before the last one
> should no longer matter anymore, isn't it?
>
> But really this is just one illustration of the point, which is that
> changing the WAL header definition as proposed *does* cost us reliability.
> We can argue about whether better concurrency in WAL insertion is
> worth that price, but claiming that the price is zero is flat wrong.
>
> For me, the more important issue is that checking xl_prev not only
> validates that the record is where it is supposed to be, but that you
> arrived at it correctly. Replacing it with xl_curr would keep the
> guarantee of the former (in fact likely make it stronger); but we'd
> completely lose the latter.

Yeah, removing xl_prev would certainly remove a useful cross-check from
the format. Maybe it would be worth it, I don't think it's that
critical. And adding some extra information in the page or segment
headers might alleviate that.

But let's go back to why we're considering this. The idea was to
optimize this block:

> /*
> * The duration the spinlock needs to be held is minimized by minimizing
> * the calculations that have to be done while holding the lock. The
> * current tip of reserved WAL is kept in CurrBytePos, as a byte position
> * that only counts "usable" bytes in WAL, that is, it excludes all WAL
> * page headers. The mapping between "usable" byte positions and physical
> * positions (XLogRecPtrs) can be done outside the locked region, and
> * because the usable byte position doesn't include any headers, reserving
> * X bytes from WAL is almost as simple as "CurrBytePos += X".
> */
> SpinLockAcquire(&Insert->insertpos_lck);
>
> startbytepos = Insert->CurrBytePos;
> endbytepos = startbytepos + size;
> prevbytepos = Insert->PrevBytePos;
> Insert->CurrBytePos = endbytepos;
> Insert->PrevBytePos = startbytepos;
>
> SpinLockRelease(&Insert->insertpos_lck);

One trick that we could do is to replace that with a 128-bit atomic
compare-and-swap instruction. Modern 64-bit Intel systems have that,
it's called CMPXCHG16B. Don't know about other architectures. An atomic
fetch-and-add, as envisioned in the comment above, would presumably be
better, but I suspect that a compare-and-swap would be good enough to
move the bottleneck elsewhere again.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Satoshi Nagayasu 2018-04-03 09:07:02 Missing parse_merge.h?
Previous Message Andrew Dunstan 2018-04-03 06:14:37 Re: ALTER TABLE ADD COLUMN fast default