Re: [PoC] Non-volatile WAL buffer

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Takashi Menjo <takashi(dot)menjo(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Takashi Menjo <takashi(dot)menjou(dot)vg(at)hco(dot)ntt(dot)co(dot)jp>, "Deng, Gang" <gang(dot)deng(at)intel(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Non-volatile WAL buffer
Date: 2021-03-05 17:16:37
Message-ID: 572b5e28-ecec-eef8-bcfd-dd0e31b7e26e@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello Takashi-san,

On 3/5/21 9:08 AM, Takashi Menjo wrote:
> Hi Tomas,
>
> Thank you so much for your report. I have read it with great interest.
>
> Your conclusion sounds reasonable to me. My patchset you call "NTT /
> segments" got as good performance as "NTT / buffer" patchset. I have
> been worried that calling mmap/munmap for each WAL segment file could
> have a lot of overhead. Based on your performance tests, however, the
> overhead looks less than what I thought. In addition, "NTT / segments"
> patchset is more compatible to the current PG and more friendly to
> DBAs because that patchset uses WAL segment files and does not
> introduce any other new WAL-related file.
>

I agree. I was actually a bit surprised it performs this well, mostly in
line with the "NTT / buffer" patchset. I've seen significant issues with
our simple experimental patches, which however went away with larger WAL
segments. But the "NTT / segments" patch does not have that issue, so
either our patches were doing something wrong, or perhaps there was some
other issue (not sure why larger WAL segments would improve that).

Do these results match your benchmarks? Or are you seeing significantly
different behavior?

Do you have any thoughts regarding the impact of full-page writes? So
far all the benchmarks we did focused on small OLTP transactions on data
sets that fit into RAM. The assumption was that that's the workload that
would benefit from this, but maybe that's missing something important
about workloads producing much larger WAL records? Say, workloads
working with large BLOBs, bulk loads etc.

The other question is whether simply placing WAL on DAX (without any
code changes) is safe. If it's not, then all the "speedups" are computed
with respect to unsafe configuration and so are useless. And BTT should
be used instead, which would of course produce very different results.

> I also think that supporting both file I/O and mmap is better than
> supporting only mmap. I will continue my work on "NTT / segments"
> patchset to support both ways.
>

+1

> In the following, I will answer "Issues & Questions" you reported.
>
>
>> While testing the "NTT / segments" patch, I repeatedly managed to crash the cluster with errors like this:
>>
>> 2021-02-28 00:07:21.221 PST client backend [3737139] WARNING: creating logfile segment just before
>> mapping; path "pg_wal/00000001000000070000002F"
>> 2021-02-28 00:07:21.670 PST client backend [3737142] WARNING: creating logfile segment just before
>> mapping; path "pg_wal/000000010000000700000030"
>> ...
>> 2021-02-28 00:07:21.698 PST client backend [3737145] WARNING: creating logfile segment just before
>> mapping; path "pg_wal/000000010000000700000030"
>> 2021-02-28 00:07:21.698 PST client backend [3737130] PANIC: could not open file
>> "pg_wal/000000010000000700000030": No such file or directory
>>
>> I do believe this is a thinko in the 0008 patch, which does XLogFileInit in XLogFileMap. Notice there are multiple
>> "creating logfile" messages with the ..0030 segment, followed by the failure. AFAICS the XLogFileMap may be
>> called from multiple backends, so they may call XLogFileInit concurrently, likely triggering some sort of race
>> condition. It's fairly rare issue, though - I've only seen it twice from ~20 runs.
>
> Thank you for your report. I found that rather the patch 0009 has an
> issue, and that will also cause WAL loss. I should have set
> use_existent to true, or InstallXlogFileSegment and BasicOpenFile in
> XLogFileInit can be racy. I have misunderstood that use_existent can
> be true because I am creating a brand-new file with XLogFileInit.
>
> I will fix the issue.
>

OK, thanks for looking into this.

>
>> The other question I have is about WALInsertLockUpdateInsertingAt. 0003 removes this function, but leaves
>> behind some of the other bits working with insert locks and insertingAt. But it does not explain how it works without
>> WaitXLogInsertionsToFinish() - how does it ensure that when we commit something, all the preceding WAL is
>> "complete" (i.e. written by other backends etc.)?
>
> To wait for *all* the WALInsertLocks to be released, no matter each of
> them precedes or follows the current insertion.
>
> It would have worked functionally, but I rethink it is not good for
> performance because XLogFileMap in GetXLogBuffer (where
> WaitXLogInsertionsToFinish is removed) can block because it can
> eventually call write() in XLogFileInit.
>
> I will restore the WALInsertLockUpdateInsertingAt function and related
> code for mmap.
>

OK. I'm still not entirely sure I understand if the current version is
correct, but I'll wait for the reworked version.

kind regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2021-03-05 17:36:55 Re: Nicer error when connecting to standby with hot_standby=off
Previous Message Julien Rouhaud 2021-03-05 17:13:28 Re: n_mod_since_analyze isn't reset at table truncation