Re: VM corruption on standby

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kirill Reshke <reshkekirill(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Subject: Re: VM corruption on standby
Date: 2025-08-21 02:07:16
Message-ID: E9637363-7B73-43CD-AFBF-3DD651E5BD13@yandex-team.ru
Lists: pgsql-hackers

> On 20 Aug 2025, at 00:55, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Andrey Borodin <x4mmm(at)yandex-team(dot)ru> writes:
>> I believe there is a bug with PageIsAllVisible(page) && visibilitymap_clear(), but I cannot prove it with an injection point test, because injection points rely on CondVar, which by itself creates corruption in a critical section. So I'm reading this discussion and wondering whether CondVar will be fixed in some clever way or whether I'd better invent a new injection point wait mechanism.
>
> Yeah, I was coming to similar conclusions in the reply I just sent:
> we don't really want a policy that we can't put injection-point-based
> delays inside critical sections. So that infrastructure is leaving
> something to be desired.
>
> Having said that, the test script is also doing something we tell
> people not to do, namely SIGKILL the postmaster. Could we use
> SIGQUIT (immediate shutdown) instead?

I'm working backwards from corruptions I see in our production.
And almost always I see stormbringers like OOM, a power outage, or Debian scripts that (I think) do kill -9 when `service postgresql stop` takes too long.

Best regards, Andrey Borodin.
