| From: | SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com> |
|---|---|
| To: | Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com> |
| Cc: | Daniel Gustafsson <daniel(at)yesql(dot)se>, Tomas Vondra <tomas(at)vondra(dot)me>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Andres Freund <andres(at)anarazel(dot)de>, Bernd Helmle <mailings(at)oopsware(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Michael Banck <mbanck(at)gmx(dot)net> |
| Subject: | Re: Changing the state of data checksums in a running cluster |
| Date: | 2026-05-05 20:05:51 |
| Message-ID: | CAHg+QDfEpF0S7S6rzgjKrvq6HrE_CdKg2KM9mS7dKZQhqKuVtA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
On Tue, May 5, 2026 at 12:45 PM SATYANARAYANA NARLAPURAM <
satyanarlapuram(at)gmail(dot)com> wrote:
> Hi,
>
> On Tue, May 5, 2026 at 12:19 PM Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com>
> wrote:
>
>> Hi,
>>
>> On Wed, 6 May 2026 at 00:38, Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
>>
>>> > On 5 May 2026, at 17:21, Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com>
>>> wrote:
>>>
>>> > I've a small concern in 0001. The new guard uses only
>>> RelationNeedsWAL(reln),
>>> > but ProcessSingleRelationByOid() iterates all forks. For unlogged
>>> relations,
>>> > the init fork is special, there are several existing call sites that
>>> preserve
>>> > WAL for INIT_FORKNUM, for example using
>>> >
>>> > RelationNeedsWAL(rel) || forknum == INIT_FORKNUM
>>> >
>>> > and catalog/storage.c notes that unlogged init forks need WAL and sync.
>>> >
>>> > So I think the condition in ProcessSingleRelationFork() should
>>> preserve the
>>> > init-fork case, e.g.
>>> >
>>> > if (RelationNeedsWAL(reln) || forkNum == INIT_FORKNUM)
>>> > log_newpage_buffer(buf, false);
>>>
>>> Which failure scenario are you thinking about here? When dealing with
>>> the
>>> catalog relation I can see the need but here we are reading, and
>>> writing, data
>>> pages. In which case would we need to issue an FPI for an unlogged
>>> relation
>>> init fork? I might be missing something obvious here.
>>>
>>
>> The case I was thinking about is not the unlogged relation contents
>> themselves, but the init fork used as the reset template. Some unlogged
>> indexes can have initialized pages in the init fork, and recovery later
>> copies
>> that fork to the main fork when resetting unlogged relations.
>>
>> So my concern was that, during online checksum enable, we might update the
>> checksum state of an init-fork page on the primary but not WAL-log an FPI
>> for
>> it because RelationNeedsWAL(reln) is false. Then a standby, or recovery
>> after
>> a crash, could still have the old version of that init fork. If that
>> fork is
>> later copied to the main fork after checksums are enabled, it might lead
>> to
>> checksum verification failures?
>>
>> Maybe there is another guarantee that makes this impossible, but I did
>> not see
>> it from the patch/test. That is why I wondered whether the condition
>> should
>> preserve the existing special treatment for INIT_FORKNUM.
>>
>
> It is a bug in the code, I added a test in the v2 patch to test this
> scenario and the test
> failed earlier.
>
Please find the latest v3. Earlier I took dependency on amcheck and removed
it in v3
and instead added queries that forces necessary checks.
Thanks,
Satya
| Attachment | Content-Type | Size |
|---|---|---|
| v3-0001-Skip-WAL-for-unlogged-relations-during-online-checks.patch | application/octet-stream | 8.3 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Jeff Davis | 2026-05-05 20:23:12 | Re: Refactor: allow pg_strncoll(), etc., to accept -1 length for NUL-terminated cstrings. |
| Previous Message | Alexander Lakhin | 2026-05-05 20:00:00 | Re: [BUG] CRASH: ECPGprepared_statement() and ECPGdeallocate_all() when connection is NULL |