RE: [PoC] Non-volatile WAL buffer

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: 'Tomas Vondra' <tomas(dot)vondra(at)enterprisedb(dot)com>, Takashi Menjo <takashi(dot)menjo(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Takashi Menjo <takashi(dot)menjou(dot)vg(at)hco(dot)ntt(dot)co(dot)jp>, "Deng, Gang" <gang(dot)deng(at)intel(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: [PoC] Non-volatile WAL buffer
Date: 2020-11-24 06:34:09
Message-ID: TYAPR01MB2990BA17E8C3DA259B327EE3FEFB0@TYAPR01MB2990.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
> So I wonder if using PMEM for the WAL buffer is the right way forward.
> AFAIK the WAL buffer is quite concurrent (multiple clients writing
> data), which seems to contradict the PMEM vs. DRAM trade-offs.
>
> The design I've originally expected would look more like this
>
> clients -> wal buffers (DRAM) -> wal segments (PMEM DAX)
>
> i.e. mostly what we have now, but instead of writing the WAL segments
> "the usual way" we'd write them using mmap/memcpy, without fsync.
>
> I suppose that's what Heikki meant too, but I'm not sure.

SQL Server probably does so. Please see the following page and the links in its "Next steps" section. I say "probably" because the document doesn't clearly state whether SQL Server memcpys data from the DRAM log cache to the non-volatile log cache only at transaction commit or on every log cache write. I presume the former.

Add persisted log buffer to a database
https://docs.microsoft.com/en-us/sql/relational-databases/databases/add-persisted-log-buffer?view=sql-server-ver15
--------------------------------------------------
With non-volatile, tail of the log storage the pattern is

memcpy to LC
memcpy to NV LC
Set status
Return control to caller (commit is now valid)
...

With this new functionality, we use a region of memory which is mapped to a file on a DAX volume to hold that buffer. Since the memory hosted by the DAX volume is already persistent, we have no need to perform a separate flush, and can immediately continue with processing the next operation. Data is flushed from this buffer to more traditional storage in the background.
--------------------------------------------------
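
To make the quoted pattern concrete, here is a rough sketch in C of "memcpy to LC, memcpy to NV LC, set status". It is only an illustration under my own assumptions, not SQL Server's or PostgreSQL's code: the names NvLogCache, LogState, log_open, log_commit and log_close, as well as the fixed LOG_CAPACITY, are all hypothetical. It maps an ordinary file with mmap() so that it runs anywhere, and uses msync() as a stand-in for the persistence step. On a real DAX mapping that step would presumably be a CPU cache flush rather than msync(), which is my assumption about the platform and not something the page above spells out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define LOG_CAPACITY 4096            /* hypothetical size of both caches */

/* Hypothetical layout of the persistent tail-of-the-log region. */
typedef struct {
    uint64_t committed_len;          /* the "status": bytes known to be valid */
    char     data[LOG_CAPACITY];
} NvLogCache;

typedef struct {
    char        dram[LOG_CAPACITY];  /* volatile log cache (LC) */
    size_t      used;                /* bytes used in the DRAM log cache */
    NvLogCache *nv;                  /* mapped file; stand-in for a DAX mapping */
    int         fd;
} LogState;

/* Map `path` as the NV log cache and reload any committed bytes.
 * Returns 0 on success, -1 on error. */
static int log_open(LogState *st, const char *path)
{
    memset(st, 0, sizeof(*st));
    st->fd = open(path, O_RDWR | O_CREAT, 0600);
    if (st->fd < 0)
        return -1;
    if (ftruncate(st->fd, sizeof(NvLogCache)) != 0) {
        close(st->fd);
        return -1;
    }
    void *p = mmap(NULL, sizeof(NvLogCache), PROT_READ | PROT_WRITE,
                   MAP_SHARED, st->fd, 0);
    if (p == MAP_FAILED) {
        close(st->fd);
        return -1;
    }
    st->nv = p;

    /* Recovery: only bytes covered by the status field are trusted. */
    uint64_t n = st->nv->committed_len;
    if (n > LOG_CAPACITY)
        n = 0;
    memcpy(st->dram, st->nv->data, (size_t) n);
    st->used = (size_t) n;
    return 0;
}

/* Append one record and make it durable before returning. */
static int log_commit(LogState *st, const void *rec, size_t len)
{
    assert(st->nv != NULL);
    if (len > LOG_CAPACITY - st->used)
        return -1;

    /* 1. memcpy to LC */
    memcpy(st->dram + st->used, rec, len);
    /* 2. memcpy to NV LC */
    memcpy(st->nv->data + st->used, st->dram + st->used, len);
    /* Stand-in persistence step for a regular file. */
    if (msync(st->nv, sizeof(NvLogCache), MS_SYNC) != 0)
        return -1;

    /* 3. Set status, only after the record bytes are durable. */
    st->used += len;
    st->nv->committed_len = st->used;
    if (msync(st->nv, sizeof(NvLogCache), MS_SYNC) != 0)
        return -1;

    /* 4. Return control to caller (commit is now valid). */
    return 0;
}

static void log_close(LogState *st)
{
    munmap(st->nv, sizeof(NvLogCache));
    close(st->fd);
}
```

The point of writing the status after the data is that a crash between the two steps leaves committed_len at its old value, so recovery never sees a half-written record. Copying to the background "traditional storage" is left out of the sketch.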

Regards
Takayuki Tsunakawa
