Re: VM corruption on standby

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Kirill Reshke <reshkekirill(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Subject: Re: VM corruption on standby
Date: 2025-08-19 18:34:21
Message-ID: C94A8AF6-92AE-4F1A-B029-81E34DA831F2@yandex-team.ru
Lists: pgsql-hackers

> On 19 Aug 2025, at 23:23, Kirill Reshke <reshkekirill(at)gmail(dot)com> wrote:
>
>> We'd probably be best off to get back to the actual bug the
>> thread started with, namely whether we aren't doing the wrong
>> thing with VM-update order of operations.
>>
>> regards, tom lane
>
> My understanding is that there is no bug in the VM. At least not in
> [0] test, because it uses an injection point in the CRIT section,
> making the server exit too early.
> So, the behaviour with and without the injection point is very different.
> The corruption we are looking for has no reproducer (see [1]).

I believe there is a bug with PageIsAllVisible(page) && visibilitymap_clear(), but I cannot prove it with an injection point test, because injection points rely on CondVar, which by itself creates corruption in a critical section. So I'm reading this discussion and wondering whether CondVar will be fixed in some clever way, or whether I should invent a new injection point wait mechanism.

Best regards, Andrey Borodin.
