Re: [PoC] Non-volatile WAL buffer

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Takashi Menjo <takashi(dot)menjo(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Takashi Menjo <takashi(dot)menjou(dot)vg(at)hco(dot)ntt(dot)co(dot)jp>, "Deng, Gang" <gang(dot)deng(at)intel(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Non-volatile WAL buffer
Date: 2020-11-25 01:44:55
Message-ID: de05bcb4-0441-84c1-8eaf-45beefad1d67@enterprisedb.com

On 11/25/20 1:27 AM, tsunakawa(dot)takay(at)fujitsu(dot)com wrote:
> From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
>> It's interesting that they only place the tail of the log on PMEM,
>> i.e. the PMEM buffer has limited size, and the rest of the log is
>> not on PMEM. It's a bit as if we inserted a PMEM buffer between our
>> wal buffers and the WAL segments, and kept the WAL segments on
>> regular storage. That could work, but I'd bet they did that because
>> at that time the NV devices were much smaller, and placing the
>> whole log on PMEM was not quite possible. So it might be
>> unnecessarily complicated, considering the PMEM device capacity is
>> much higher now.
>>
>> So I'd suggest we simply try this:
>>
>> clients -> buffers (DRAM) -> wal segments (PMEM)
>>
>> I plan to do some hacking and maybe hack together some simple tools
>> to benchmark various approaches.
>
> I'm in favor of your approach. Yes, Intel PMEM modules were available
> in 128/256/512 GB sizes when I checked last year. That's more than enough to
> place all WAL segments, so a small PMEM wal buffer is not necessary.
> I'm excited to see Postgres gain more power.
>

Cool. FWIW I'm not 100% sure it's the right approach, but I think it's
worth testing. In the worst case we'll discover that this architecture
does not allow fully leveraging PMEM benefits, or maybe it won't work
for some other reason and the approach proposed here will work better.
Let's play a bit and we'll see.

I have hacked a very simple patch doing this (essentially replacing
open/write/close calls in xlog.c with pmem calls). It's a bit rough but
seems good enough for testing/experimenting. I'll polish it a bit, do
some benchmarks, and share some numbers in a day or two.
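
To illustrate the general idea of swapping open/write/close for a mapped
segment, here is a minimal sketch. It is not the patch mentioned above. It
uses plain POSIX mmap/msync as a stand-in so that it runs without PMEM
hardware. With libpmem the corresponding calls would presumably be
pmem_map_file(), pmem_memcpy_persist() and pmem_unmap(), but which calls the
patch actually uses is an assumption on my part. The names segment_map,
segment_write, segment_unmap and WAL_SEG_SIZE are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical segment size, kept small for the demo (real WAL segments default to 16 MB). */
#define WAL_SEG_SIZE ((size_t) 64 * 1024)

/* Replaces open(): create/size the segment file and map it into memory.
 * Returns the base address, or NULL on failure. */
static char *segment_map(const char *path, size_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t) size) != 0) {
        close(fd);
        return NULL;
    }
    void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    return addr == MAP_FAILED ? NULL : (char *) addr;
}

/* Replaces write() + fsync(): copy the record into the mapping and flush it.
 * With libpmem this pair would collapse into one persist-style copy.
 * Returns 0 on success, -1 on a bounds error or flush failure. */
static int segment_write(char *base, size_t size, size_t off,
                         const void *buf, size_t len)
{
    assert(base != NULL);
    if (off > size || len > size - off)
        return -1;
    memcpy(base + off, buf, len);

    /* msync() requires a page-aligned start address, so round down. */
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t start = off - (off % page);
    return msync(base + start, (off - start) + len, MS_SYNC);
}

/* Replaces close(): drop the mapping. */
static int segment_unmap(char *base, size_t size)
{
    return munmap(base, size);
}
```

A caller would map the segment once, call segment_write() for each chunk
copied out of the DRAM WAL buffers, and unmap when switching segments. On
real PMEM the point of the exercise is that the flush becomes a CPU cache
flush rather than a system call, which msync() here does not reproduce.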

regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
