Re: [PoC] Non-volatile WAL buffer

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Takashi Menjo <takashi(dot)menjo(at)gmail(dot)com>
Cc: Takashi Menjo <takashi(dot)menjou(dot)vg(at)hco(dot)ntt(dot)co(dot)jp>, "Deng, Gang" <gang(dot)deng(at)intel(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Non-volatile WAL buffer
Date: 2020-11-28 01:37:17
Message-ID: 9f1ef44a-1afa-d092-7a72-4b99f00e1197@enterprisedb.com
Lists: pgsql-hackers

On 11/27/20 1:02 AM, Tomas Vondra wrote:
>
> Unfortunately, that patch seems to fail for me :-(
>
> The patches seem to be for PG12, so I applied them on REL_12_STABLE (all
> the parts 0001-0005) and then I did this:
>
> LIBS="-lpmem" ./configure --prefix=/home/tomas/pg-12-pmem --enable-debug
> make -s install
>
> initdb -X /opt/pmemdax/benchmarks/wal -D /opt/nvme/benchmarks/data
>
> pg_ctl -D /opt/nvme/benchmarks/data/ -l pg.log start
>
> createdb test
> pgbench -i -s 500 test
>
>
> which however fails after just about 70k rows generated (PQputline
> failed), and the pg.log says this:
>
> PANIC: could not open or mmap file
> "pg_wal/000000010000000000000006": No such file or directory
> CONTEXT: COPY pgbench_accounts, line 721000
> STATEMENT: copy pgbench_accounts from stdin
>
> Takashi-san, can you check and provide a fixed version? Ideally, I'll
> take a look too, but I'm not familiar with this patch so it may take
> more time.
>

I did try to get this working today, unsuccessfully. I did manage to
apply the 0002 part separately on REL_12_0 (there's one trivial rejected
chunk), but I still get the same failure. In fact, when built with
assertions, I can't even get initdb to pass :-(

I do get this:

TRAP: FailedAssertion("!(page->xlp_pageaddr == ptr - (ptr % 8192))",
File: "xlog.c", Line: 1813)

The values involved here are

xlp_pageaddr = 16777216
ptr = 20971520

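so the page header claims the very beginning of the second WAL segment,
but the pointer is 4 MB further on. The pointer is itself page-aligned
(20971520 % 8192 = 0), so the assertion expects the two values to be
equal. A full backtrace is attached.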
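
To make the mismatch concrete, here is a small sketch (not part of the
patch) that redoes the arithmetic from the assertion. It assumes the
default 8 kB WAL page size and the default 16 MB WAL segment size, which
seems reasonable since initdb was run without --wal-segsize. The helper
names are made up for illustration.

```python
# Assumed defaults: 8 kB WAL pages and 16 MB WAL segments.
XLOG_BLCKSZ = 8192
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def expected_pageaddr(ptr):
    """Start of the WAL page containing ptr, i.e. ptr - (ptr % 8192)."""
    return ptr - (ptr % XLOG_BLCKSZ)


def locate(ptr):
    """Return (segment number, byte offset in that segment) for a WAL position."""
    return divmod(ptr, WAL_SEGMENT_SIZE)


xlp_pageaddr = 16777216  # value reported from the page header
ptr = 20971520           # value reported for the pointer

print(expected_pageaddr(ptr))  # what the assertion wants xlp_pageaddr to be
print(locate(xlp_pageaddr))    # where the page header says the page lives
print(locate(ptr))             # where the pointer actually is
print(ptr - xlp_pageaddr)      # distance between the two, in bytes
```

Under those assumptions both values fall in segment number 1 (the second
segment), at offsets 0 and 4194304 respectively, so the buffer appears to
hold the segment's first page where the page 4 MB in was expected.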

I'll continue investigating this, but the xlog code is not particularly
easy to understand in general, so it may take time.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment: initdb-crash.txt (text/plain, 5.5 KB)
