From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Write lifetime hints for NVMe |
Date: | 2018-01-27 15:03:55 |
Message-ID: | 30965a3e-5bde-4f70-dc06-1ff297abca4c@2ndquadrant.com |
Lists: | pgsql-hackers |
On 01/27/2018 02:20 PM, Dmitry Dolgov wrote:
> Hi,
>
> From what I can see, write lifetime hint support for NVMe multi-streaming was
> merged into the Linux kernel some time ago [1]. In theory it allows data that is
> written together to be placed together on the media, so that it can also be
> erased together. This minimizes garbage collection, resulting in reduced write
> amplification and more efficient flash utilization [2]. I couldn't find any
> discussion about it on hackers, so I decided to experiment with this feature a
> bit. My idea was to test a quite naive approach, where all file descriptors
> related to temporary files are assigned `RWH_WRITE_LIFE_SHORT` and all the rest
> `RWH_WRITE_LIFE_EXTREME`. The attached patch is a dead simple POC without any
> infrastructure around it to enable/disable the hints.
>
> It turns out that it's possible to run such benchmarks on some EC2 instance
> types (e.g. c5) with a kernel version that includes this support, since they
> expose volumes as NVMe devices:
>
> ```
> # nvme list
> Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
> ---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
> /dev/nvme0n1     vol01cdbc7ec86f17346 Amazon Elastic Block Store               1         0.00 B / 8.59 GB           512 B + 0 B      1.0
> ```
>
> To get some baseline results, I ran several rounds of pgbench on these fairly
> modest instances (dedicated, EBS-optimized), using the default configuration
> apart from a slightly adjusted `max_wal_size`:
>
> $ pgbench -s 200 -i
> $ pgbench -T 600 -c 2 -j 2
>
> Analyzing the `strace` output, I can see that during this test there was a
> significant number of operations on pg_stat_tmp and xlogtemp files, so I assume
> write lifetime hints should have some effect.
>
> As a result, I got a latency reduction of about 5-8% (but so far these numbers
> are unstable, probably because of virtualization).
>
> ```
> # without patch
> number of transactions actually processed: 491945
> latency average = 2.439 ms
> tps = 819.906323 (including connections establishing)
> tps = 819.908755 (excluding connections establishing)
> ```
>
> ```
> # with patch
> number of transactions actually processed: 521805
> latency average = 2.300 ms
> tps = 869.665330 (including connections establishing)
> tps = 869.668026 (excluding connections establishing)
> ```
>
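A minimal sketch of the naive scheme described in the quoted POC (temporary files short-lived, everything else extreme), assuming Linux 4.13 or later, where `fcntl(fd, F_SET_RW_HINT, &hint)` takes a pointer to a `uint64_t` hint and records it on the file's inode. The helpers `choose_write_hint()` and `set_write_hint()`, and the path fragments used to recognize temporary files, are hypothetical illustrations, not code from the attached patch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* lets glibc's <fcntl.h> expose F_SET_RW_HINT and RWH_* */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fallbacks for older C library headers; the values follow the kernel's
 * <linux/fcntl.h> (F_LINUX_SPECIFIC_BASE + 12 and the RWH_* enumeration). */
#ifndef F_SET_RW_HINT
#define F_SET_RW_HINT 1036
#endif
#ifndef RWH_WRITE_LIFE_SHORT
#define RWH_WRITE_LIFE_SHORT 2
#endif
#ifndef RWH_WRITE_LIFE_EXTREME
#define RWH_WRITE_LIFE_EXTREME 5
#endif

/* Hypothetical classifier for the naive POC idea: paths that look like
 * temporary files get a short lifetime, everything else an extreme one.
 * The markers are illustrative assumptions, not taken from the patch. */
uint64_t choose_write_hint(const char *path)
{
    static const char *const temp_markers[] = {"pgsql_tmp", "pg_stat_tmp", "xlogtemp"};
    size_t n = sizeof(temp_markers) / sizeof(temp_markers[0]);

    assert(path != NULL);
    for (size_t i = 0; i < n; i++)
        if (strstr(path, temp_markers[i]) != NULL)
            return RWH_WRITE_LIFE_SHORT;
    return RWH_WRITE_LIFE_EXTREME;
}

/* Hypothetical helper: attach the hint to an already open descriptor.
 * Returns 0 on success and -1 if the kernel rejects the request (for
 * example EINVAL on kernels older than 4.13). */
int set_write_hint(int fd, const char *path)
{
    uint64_t hint = choose_write_hint(path);

    if (fcntl(fd, F_SET_RW_HINT, &hint) == -1) {
        perror("fcntl(F_SET_RW_HINT)");
        return -1;
    }
    return 0;
}
```

For example, calling `set_write_hint(fd, path)` right after `open()` would tag `pg_wal/xlogtemp.4242` as short-lived and a relation file such as `base/16384/16397` as extreme. Since the hint is only advisory, a caller could reasonably ignore a `-1` return instead of failing the write path.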
Aren't those numbers far lower than you'd expect from NVMe storage? I do
have an NVMe drive (Intel 750) in my machine, and I can do thousands of
transactions per second on it with two clients. Seems a bit suspicious.
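The figures above can be cross-checked with a little arithmetic. A minimal sketch, assuming pgbench runs closed-loop with no think time (no `--rate` limit), so each of the two clients always has one transaction in flight and Little's law gives throughput ≈ clients / average latency. The function names are hypothetical.

```c
#include <assert.h>

/* Relative change of `after` with respect to `before`, in percent. */
double percent_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}

/* Little's law for a closed loop without think time: every client always
 * has one transaction in flight, so throughput = clients / latency. */
double closed_loop_tps(int clients, double latency_ms)
{
    assert(clients > 0 && latency_ms > 0.0);
    return clients * 1000.0 / latency_ms;
}

/* Inverse: the average latency needed to sustain a target throughput. */
double required_latency_ms(int clients, double target_tps)
{
    assert(clients > 0 && target_tps > 0.0);
    return clients * 1000.0 / target_tps;
}
```

With the reported numbers, `percent_change(2.439, 2.300)` is about -5.7 and `percent_change(819.908755, 869.668026)` about +6.1, inside the quoted 5-8% range. `closed_loop_tps(2, 2.439)` comes out at about 820, matching the measured tps, while `required_latency_ms(2, 2000.0)` is 1.0: thousands of transactions per second from two clients need sub-millisecond average latency, far below the roughly 2.4 ms measured here.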
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services