[HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory

From: Yoshimi Ichiyanagi <ichiyanagi(dot)yoshimi(at)lab(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Cc: menjo(dot)takashi(at)lab(dot)ntt(dot)co(dot)jp, ishizaki(dot)teruaki(at)lab(dot)ntt(dot)co(dot)jp
Subject: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
Date: 2018-01-16 07:00:48
Message-ID: C20D38E97BCB33DAD59E3A1@lab.ntt.co.jp
Lists: pgsql-hackers

Hi.

These patches enable the use of the Persistent Memory Development Kit
(PMDK) [1] for reading and writing WAL files on persistent memory (PMEM).
PMEM is a next-generation storage technology with several attractive
properties: it is fast, byte-addressable, and non-volatile.

With pgbench, the general-purpose PostgreSQL benchmark, a postgres server
with the patches applied is about 5% faster than the original server.
With my own insert benchmark, it is up to 90% faster than the original.
The details are described later in this e-mail.

This e-mail describes the following:
A) About PMDK
B) About the patches
C) The way of running benchmarks using the patches, and the results

A) About PMDK
PMDK provides functions that let an application access PMEM directly
as memory, without going through the kernel, so that the application
gets high-speed access to PMEM.
The following APIs are available through PMDK:
A-1. APIs to open a file on PMEM, to create a file on PMEM,
and to map a file on PMEM to virtual addresses
A-2. APIs to read/write data from and to a file on PMEM

A-1. APIs to open a file on PMEM, to create a file on PMEM,
and to map a file on PMEM to virtual addresses

PMDK provides these APIs on top of the DAX filesystem (DAX FS) [2] feature.

DAX FS is a PMEM-aware file system that allows direct access
to PMEM without going through the kernel page cache. A file in DAX FS
can be mapped into memory using standard calls such as mmap() on Linux.
Furthermore, once the file on PMEM is mapped to virtual addresses (and
after any initial minor page faults needed to create the mappings in
the MMU), applications can access PMEM using CPU load/store
instructions instead of read/write system calls.
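
To illustrate the difference between the two access styles, here is a
minimal sketch in Python. It uses an ordinary temporary file and the
standard mmap module, not a DAX FS or PMDK, so it only shows the
programming model: once the file is mapped, bytes are read and written
through memory instead of through read()/write() calls. The size and the
record contents are arbitrary values chosen for the example.

```python
import mmap
import os
import tempfile

SEGMENT_SIZE = 4096  # arbitrary size for this sketch, not a real WAL segment size

# Create a fixed-size file, standing in for a pre-allocated file on the DAX FS.
fd, path = tempfile.mkstemp()
os.ftruncate(fd, SEGMENT_SIZE)

# Map the file into the process address space.
mapped = mmap.mmap(fd, SEGMENT_SIZE)

# Store: a plain memory write into the mapping, no write() system call.
record = b"hello pmem"
mapped[0:len(record)] = record

# Load: a plain memory read from the mapping, no read() system call.
loaded = bytes(mapped[0:len(record)])

# After unmapping, the same bytes are visible through the regular file interface.
mapped.close()
with open(path, "rb") as f:
    on_disk = f.read(len(record))

os.close(fd)
os.remove(path)

print(loaded == record, on_disk == record)
```

On a real DAX mount the mapping would point at PMEM itself rather than
at page-cache pages, which is what removes the kernel from the data path.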

A-2. APIs to read/write data from and to a file on PMEM

PMDK provides memcpy()-like APIs that copy data to PMEM using
single instruction, multiple data (SIMD) instructions [3] and
non-temporal (NT) store instructions [4]. These instructions improve
the performance of copying data to PMEM. As a result, using these APIs
is faster than using read/write system calls.
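
The copy-then-make-durable pattern behind these APIs can be sketched as
follows. This is only an analogy in Python, assuming an ordinary file
mapped with the standard mmap module: the slice assignment stands in for
a memcpy()-style copy into the mapping, and mmap.flush() stands in for
the step that makes the copied range durable. It does not use SIMD or
NT stores and it is not the code in the patches; the helper name
append_record is made up for the example.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def append_record(mapped, offset, record):
    """Copy record into the mapping at offset, then flush the pages it touched.

    Returns the offset just past the record.
    """
    end = offset + len(record)
    mapped[offset:end] = record                 # memcpy()-like copy into the mapping
    first_page = (offset // PAGE) * PAGE        # flush offsets must be page-aligned
    mapped.flush(first_page, end - first_page)  # sync the touched range to the file
    return end


fd, path = tempfile.mkstemp()
os.ftruncate(fd, 4 * PAGE)
mapped = mmap.mmap(fd, 4 * PAGE)

pos = 0
for payload in (b"record-1", b"record-2", b"x" * PAGE):
    pos = append_record(mapped, pos, payload)

contents = bytes(mapped[0:16])

mapped.close()
os.close(fd)
os.remove(path)
```

The point of the sketch is that the data is copied once, directly into
the mapped file, and durability is requested only for the range that
was written.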

[1] http://pmem.io/pmdk/
[2] https://www.usenix.org/system/files/login/articles/login_summer17_07_rudoff.pdf
[3] SIMD: a SIMD instruction operates on all of the loaded data in a
single operation. If the SIMD unit loads eight data items into registers
at once, the store to PMEM is performed for all eight values
at the same time.
[4] NT store instructions: NT store instructions bypass the CPU cache,
so using these instructions does not require a cache flush.

B) About the patches
Changes by the patches:
0001-Add-configure-option-for-PMDK.patch:
- Added "--with-libpmem" configure option to execute I/O with PMDK library

0002-Read-write-WAL-files-using-PMDK.patch:
- Added PMDK implementation for WAL I/O operations
- Added "pmem_drain" to the wal_sync_method parameter list
to write logs synchronously on PMEM

0003-Walreceiver-WAL-IO-using-PMDK.patch:
- Added PMDK implementation for the walreceiver process on secondary servers

C) The way of running benchmarks using the patches, and the results
The setup and procedure are as follows:

Experimental setup
Server: HP ProLiant DL360 Gen9
CPU: Xeon E5-2667 v4 (3.20GHz); 2 processors(without HT)
DRAM: DDR4-2400; 32 GiB/processor
(8GiB/socket x 4 sockets/processor) x 2 processors
NVDIMM: DDR4-2133; 32 GiB/processor
(8GiB/socket x 4 sockets/processor) x 2 processors
HDD: Seagate Constellation2 2.5inch SATA 3.0. 6Gb/s 1TB 7200rpm x 1
OS: Ubuntu 16.04, linux-4.12
DAX FS: ext4
NVML: master as of Aug 30, 2017
PostgreSQL: master
Note: I bound the postgres processes to one NUMA node,
and the benchmarks to the other NUMA node.

C-1. Configuring PMEM for use as a block device
# ndctl list
# ndctl create-namespace -f -e namespace0.0 --mode=memory -M dev

C-2. Creating a file system on PMEM, and mounting it with DAX
# mkfs.ext4 /dev/pmem0
# mount -t ext4 -o dax /dev/pmem0 /mnt/pmem0

C-3. Setting PMEM_IS_PMEM_FORCE to determine whether the WAL files are
stored on PMEM
Note: If this environment variable is not set, the postgres processes
do not start.
# export PMEM_IS_PMEM_FORCE=1

C-4. Installing PostgreSQL
Note: There are 3 important points when installing PostgreSQL.
a. Running "./configure --with-libpmem" to link libpmem
b. Placing the WAL directory on PMEM
c. Changing the wal_sync_method parameter in postgresql.conf from
fdatasync to pmem_drain

# cd /path/to/[PG_source dir]
# ./configure --with-libpmem
# make && make install
# initdb /path/to/PG_DATA -X /mnt/pmem0/path/to/[PG_WAL dir]
# sed -e 's/#wal_sync_method = fsync/wal_sync_method = pmem_drain/' \
  /path/to/PG_DATA/postgresql.conf > /path/to/PG_DATA/postgresql.conf.tmp
# mv /path/to/PG_DATA/postgresql.conf.tmp /path/to/PG_DATA/postgresql.conf
# pg_ctl start -D /path/to/PG_DATA
# createdb [DB_NAME]
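
The sed expression above only matches the exact commented-out default
line, so it silently leaves the file unchanged if that line looks
different. As an alternative, a small Python helper can set the parameter
whatever its current value is. This is a hypothetical sketch, not part of
the patches: the function name set_guc and the sample file contents are
made up for the example, and pmem_drain is only accepted by a server
built with the patches.

```python
import re


def set_guc(conf_text, name, value):
    """Return conf_text with `name = value` set, uncommenting or appending as needed.

    Only the first matching line is rewritten; any trailing comment on it is dropped.
    """
    pattern = re.compile(
        r"^[ \t]*#?[ \t]*" + re.escape(name) + r"[ \t]*=.*$", re.MULTILINE
    )
    line = f"{name} = {value}"
    new_text, count = pattern.subn(lambda m: line, conf_text, count=1)
    if count == 0:
        # The parameter is not present at all: append it at the end.
        new_text = conf_text.rstrip("\n") + "\n" + line + "\n"
    return new_text


sample = "#fsync = on\n#wal_sync_method = fsync\t\t# the default is the first option\n"
updated = set_guc(sample, "wal_sync_method", "pmem_drain")
print(updated)
```

To apply it to a real file, one would read postgresql.conf, pass its
contents through set_guc, and write the result to a temporary file
before renaming it, as in the mv step above.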

C-5. Running the 2 benchmarks(1. pgbench, 2. my insert benchmark)
C-5-1. pgbench
# numactl -N 1 pgbench -c 32 -j 8 -T 120 -M prepared [DB_NAME]

The averages of running pgbench three times are:
wal_sync_method=fdatasync: tps = 43,179
wal_sync_method=pmem_drain: tps = 45,254

C-5-2. pclient_thread: my insert benchmark
Preparation
CREATE TABLE [TABLE_NAME] (id int8, value text);
ALTER TABLE [TABLE_NAME] ALTER value SET STORAGE external;
PREPARE insert_sql (int8) AS INSERT INTO %s (id, value) values ($1, '[1K_data]');

Execution
BEGIN; EXECUTE insert_sql(%lld); COMMIT;
Note: I ran this query 5M times with 32 threads.

# ./pclient_thread
Invalid Arguments:
Usage: ./pclient_thread [The number of threads] [The number to insert tuples] [data size(KB)]
# numactl -N 1 ./pclient_thread 32 5242880 1

The averages of running this benchmark three times are:
wal_sync_method=fdatasync: tps = 67,780
wal_sync_method=pmem_drain: tps = 131,962
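
For reference, the relative gains implied by the averages above can be
computed as in the short sketch below. It is plain arithmetic on the
reported tps numbers and gives roughly 4.8% for pgbench and roughly 94.7%
for the insert benchmark. The function name speedup_percent is made up
for the example.

```python
def speedup_percent(baseline_tps, patched_tps):
    """Relative throughput gain of patched over baseline, in percent."""
    return (patched_tps - baseline_tps) / baseline_tps * 100.0


# Averages reported above: wal_sync_method=fdatasync vs. pmem_drain.
pgbench_gain = speedup_percent(43_179, 45_254)
insert_gain = speedup_percent(67_780, 131_962)

print(f"pgbench: {pgbench_gain:.1f}%  insert benchmark: {insert_gain:.1f}%")
```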

--
Yoshimi Ichiyanagi

Attachment                                  Content-Type              Size
0001-Add-configure-option-for-PMDK.patch    application/octet-stream  5.1 KB
0002-Read-write-WAL-files-using-PMDK.patch  application/octet-stream  46.9 KB
0003-Walreceiver-WAL-IO-using-PMDK.patch    application/octet-stream  4.8 KB
