Re: Corruption during WAL replay

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: robertmhaas(at)gmail(dot)com
Cc: deniel1495(at)mail(dot)ru, ibrar(dot)ahmad(at)gmail(dot)com, tejeswarm(at)hotmail(dot)com, andres(at)anarazel(dot)de, hlinnaka(at)iki(dot)fi, masahiko(dot)sawada(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org, hexexpert(at)comcast(dot)net
Subject: Re: Corruption during WAL replay
Date: 2022-03-16 05:14:32
Message-ID: 20220316.141432.2298656526174566963.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Tue, 15 Mar 2022 12:44:49 -0400, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in
> On Wed, Jan 26, 2022 at 3:25 AM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > The attached is the fixed version and it surely works with the repro.
>
> Hi,
>
> I spent the morning working on this patch and came up with the
> attached version. I wrote substantial comments in RelationTruncate(),
> where I tried to make it more clear exactly what the bug is here, and
> also in storage/proc.h, where I tried to clarify both the use of the
> DELAY_CHKPT_* flags in general terms. If nobody is too sad about this
> version, I plan to commit it.

Thanks for taking this on, and for your time. The additional comments
seem to describe the flags more clearly.

storage.c:
+ * Make sure that a concurrent checkpoint can't complete while truncation
+ * is in progress.
+ *
+ * The truncation operation might drop buffers that the checkpoint
+ * otherwise would have flushed. If it does, then it's essential that
+ * the files actually get truncated on disk before the checkpoint record
+ * is written. Otherwise, if replay begins from that checkpoint, the
+ * to-be-truncated buffers might still exist on disk but have older
+ * contents than expected, which can cause replay to fail. It's OK for
+ * the buffers to not exist on disk at all, but not for them to have the
+ * wrong contents.

FWIW, this seems to slightly confuse a buffer with its contents. I can
read it correctly, so I don't mind if it reads naturally enough.
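
For reference, the ordering that comment describes can be sketched as a
toy model. This is only an illustration under stated assumptions, not
PostgreSQL code: ToyRel, the toy_* functions and the LSN values 10 and 20
are all made up here, and delay_chkpt merely stands in for a
DELAY_CHKPT_*-style flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name below is hypothetical, not a PostgreSQL API. */
enum { NBLOCKS = 4 };

typedef struct
{
    int  disk_lsn[NBLOCKS];   /* LSN of the contents last written to disk */
    int  buf_lsn[NBLOCKS];    /* LSN of the cached (dirty) version */
    bool buf_valid[NBLOCKS];  /* is the block present in the buffer cache? */
    int  disk_nblocks;        /* current on-disk file length, in blocks */
    bool delay_chkpt;         /* stand-in for a DELAY_CHKPT_*-style flag */
} ToyRel;

/* Every block is on disk at LSN 10 and dirty in memory at LSN 20. */
void toy_init(ToyRel *r)
{
    for (int i = 0; i < NBLOCKS; i++)
    {
        r->disk_lsn[i] = 10;
        r->buf_lsn[i] = 20;
        r->buf_valid[i] = true;
    }
    r->disk_nblocks = NBLOCKS;
    r->delay_chkpt = false;
}

/* Drop cached blocks at or beyond new_nblocks without writing them out. */
void toy_drop_buffers(ToyRel *r, int new_nblocks)
{
    assert(new_nblocks >= 0 && new_nblocks <= NBLOCKS);
    for (int i = new_nblocks; i < NBLOCKS; i++)
        r->buf_valid[i] = false;
}

/* Shorten the on-disk file. */
void toy_truncate_file(ToyRel *r, int new_nblocks)
{
    assert(new_nblocks >= 0 && new_nblocks <= r->disk_nblocks);
    r->disk_nblocks = new_nblocks;
}

/* Flush cached blocks; returns false if the checkpoint has to wait. */
bool toy_checkpoint(ToyRel *r)
{
    if (r->delay_chkpt)
        return false;
    for (int i = 0; i < NBLOCKS; i++)
        if (r->buf_valid[i])
            r->disk_lsn[i] = r->buf_lsn[i];
    return true;
}

/* Does any block still on disk have contents older than expected_lsn? */
bool toy_has_stale_block(const ToyRel *r, int expected_lsn)
{
    for (int i = 0; i < r->disk_nblocks; i++)
        if (r->disk_lsn[i] < expected_lsn)
            return true;
    return false;
}

/*
 * A checkpoint is attempted between the buffer drop and the file
 * truncation.  Returns whether stale blocks exist on disk at the moment
 * the checkpoint completes, i.e. what replay from it would find.
 */
bool toy_scenario(bool use_delay_flag)
{
    ToyRel r;
    bool   stale;

    toy_init(&r);
    r.delay_chkpt = use_delay_flag;
    toy_drop_buffers(&r, 2);

    if (toy_checkpoint(&r))
    {
        /* Checkpoint completed before the truncation reached disk. */
        stale = toy_has_stale_block(&r, 20);
        toy_truncate_file(&r, 2);
    }
    else
    {
        /* Checkpoint was held off until the truncation was done. */
        toy_truncate_file(&r, 2);
        r.delay_chkpt = false;
        toy_checkpoint(&r);
        stale = toy_has_stale_block(&r, 20);
    }
    return stale;
}
```

In this model, toy_scenario(false) leaves blocks 2 and 3 on disk with
their old contents when the checkpoint completes, while toy_scenario(true)
does not, because the blocks no longer exist on disk by then.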

Otherwise, all the added and revised comments look fine. Thanks for the
work.

> I think it should be back-patched, too, but that looks like a bit of a
> pain. I think every back-branch will require different adjustments.

I'll try that. If you are already working on it, please let me know.
(It is more than likely already too late..)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
