Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, ichiyanagi(dot)yoshimi(at)lab(dot)ntt(dot)co(dot)jp
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, menjo(dot)takashi(at)lab(dot)ntt(dot)co(dot)jp, ishizaki(dot)teruaki(at)lab(dot)ntt(dot)co(dot)jp
Subject: Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
Date: 2019-01-23 16:45:42
Lists: pgsql-hackers

On 10/12/2018 23:37, Dmitry Dolgov wrote:
>> On Thu, Nov 29, 2018 at 6:48 PM Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
>>> On Tue, Oct 2, 2018 at 4:53 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>>> On Mon, Aug 06, 2018 at 06:00:54PM +0900, Yoshimi Ichiyanagi wrote:
>>>> The libpmem's pmem_map_file() supported 2M/1G(the size of huge page)
>>>> alignment, since it could reduce the number of page faults.
>>>> In addition, libpmem's pmem_memcpy_nodrain() is the function
>>>> to copy data using single instruction, multiple data(SIMD) instructions
>>>> and NT store instructions(MOVNT).
>>>> As a result, using these APIs is faster than using old mmap()/memcpy().
>>>> Please see the PGCon2018 presentation[1] for the details.
>>>> [1]
>>> So you say that this represents a 3% gain based on the presentation?
>>> That may be interesting to dig into it. Could you provide fresher
>>> performance numbers? I am moving this patch to the next CF 2018-10 for
>>> now, waiting for input from the author.
>> Unfortunately, the patch has some conflicts now, so probably not only fresher
>> performance numbers are necessary, but also a rebased version.
> I believe the idea behind this patch is quite important (thanks to CMU DG for
> inspiring lectures), so I decided to put in some effort and rebase it to keep
> it from rotting. At the same time I have a vague impression that the patch
> itself suggests quite a narrow way of using PMDK.


To re-iterate what I said earlier in this thread, I think the next step
here is to write a patch that modifies xlog.c to use plain old
mmap()/msync() to memory-map the WAL files, to replace the WAL buffers.
Let's see what the performance of that is, with or without NVM hardware.
I think that might actually make the code simpler. There's a bunch of
really hairy code around locking the WAL buffers, which could be made
simpler if each backend memory-mapped the WAL segment files independently.
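As a rough illustration of the plain mmap()/msync() approach described above, here is a minimal sketch. It is not PostgreSQL code: mmap_wal_write() and its parameters are hypothetical names invented for this example, and a real implementation in xlog.c would keep the segment mapped across records rather than mapping and unmapping per call.

```c
#define _POSIX_C_SOURCE 200809L   /* for ftruncate(), msync(), sysconf() */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of PostgreSQL): copy one record into a
 * memory-mapped segment file at the given offset and flush it with msync().
 * Returns 0 on success, -1 on failure.
 */
static int
mmap_wal_write(const char *path, size_t seg_size, size_t offset,
               const void *rec, size_t len)
{
    long    pagesz = sysconf(_SC_PAGESIZE);
    size_t  start;
    char   *seg;
    int     fd;
    int     rc;

    assert(pagesz > 0);

    /* Refuse records that would cross the end of the segment. */
    if (offset > seg_size || len > seg_size - offset)
        return -1;

    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* Size the file first, so every mapped page is backed by the file. */
    if (ftruncate(fd, (off_t) seg_size) != 0)
    {
        close(fd);
        return -1;
    }

    seg = mmap(NULL, seg_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (seg == MAP_FAILED)
        return -1;

    /* The "write" is an ordinary memory copy into the mapping. */
    memcpy(seg + offset, rec, len);

    /* msync() wants a page-aligned address, so round the start down. */
    start = offset - (offset % (size_t) pagesz);
    rc = msync(seg + start, (offset - start) + len, MS_SYNC);

    if (munmap(seg, seg_size) != 0)
        rc = -1;
    return rc;
}
```

A call such as mmap_wal_write("/tmp/demo.seg", 16384, 5000, "hello", 5) would place the five bytes at offset 5000 of the file and return once msync() reports them flushed. Note that no shared buffer or lock appears anywhere in the sketch; coordinating which backend writes which byte range is left out entirely.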

One thing to watch out for is that if you read() a file and there's an
I/O error, you have a chance to ereport() it. If you try to read from a
memory-mapped file, and there's an I/O error, the process is killed with
SIGBUS. So I think we have to be careful with using memory-mapped I/O
for reading files. But for writing WAL files, it seems like a good fit.
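The SIGBUS behaviour can be reproduced with a small sketch. Real I/O errors are hard to provoke on demand, so this example assumes a stand-in: it maps a file, truncates it to zero length, and then touches the mapping, which also raises SIGBUS on Linux and other POSIX systems. The function name touch_truncated_mapping() is hypothetical, and recovering with siglongjmp() is used here only to make the signal observable, not as a suggestion for how a backend should handle it.

```c
#define _POSIX_C_SOURCE 200809L   /* for sigaction(), sigsetjmp(), mkstemp() */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf sigbus_jmp;

/* Signal handler: jump back to the point where the access was attempted. */
static void
on_sigbus(int signo)
{
    (void) signo;
    siglongjmp(sigbus_jmp, 1);
}

/*
 * Hypothetical demo: returns 1 if touching the mapping raised SIGBUS,
 * 0 if the load succeeded, and -1 if the setup failed.
 */
static int
touch_truncated_mapping(void)
{
    char            path[] = "/tmp/mmap_sigbus_XXXXXX";
    struct sigaction sa, old_sa;
    long            pagesz = sysconf(_SC_PAGESIZE);
    volatile char  *map;
    volatile int    result = -1;
    int             fd;

    assert(pagesz > 0);

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);               /* file disappears once fd is closed */

    if (ftruncate(fd, (off_t) pagesz) != 0)
    {
        close(fd);
        return -1;
    }
    map = mmap(NULL, (size_t) pagesz, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
    {
        close(fd);
        return -1;
    }

    /* Shrink the file: the mapped page no longer has anything behind it. */
    if (ftruncate(fd, 0) != 0)
    {
        munmap((void *) map, (size_t) pagesz);
        close(fd);
        return -1;
    }

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old_sa);

    if (sigsetjmp(sigbus_jmp, 1) == 0)
    {
        char c = map[0];        /* plain load; there is no return code to check */

        (void) c;
        result = 0;
    }
    else
        result = 1;             /* we got here via the SIGBUS handler */

    sigaction(SIGBUS, &old_sa, NULL);
    munmap((void *) map, (size_t) pagesz);
    close(fd);
    return result;
}
```

The point of the sketch is the shape of the failure: the faulting operation is an ordinary memory load, so there is no errno and no return value to pass to ereport(). Without a handler installed, the default action for SIGBUS terminates the process.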

Once we have a reliable mmap()/msync() implementation running, it should
be straightforward to change it to use MAP_SYNC and the special CPU
instructions for the flushing.

- Heikki
