From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Kouber Saparev <kouber(at)gmail(dot)com> |
Cc: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: BF mamba failure |
Date: | 2025-09-10 23:28:35 |
Message-ID: | aMIJozPEP5OLI3Yj@paquier.xyz |
Lists: | pgsql-hackers |
On Tue, Sep 09, 2025 at 04:07:45PM +0300, Kouber Saparev wrote:
> Yet again one of our replicas died. Should I file a bug report or
> something, what should we do in order to prevent it? Restart the database
> every month/week or so?...
I don't think we need another bug report for the same problem. We are
aware that this may be an issue and that it is hard to track down; the
problem, at this stage, is finding room to investigate it.
I may be able to come back to it soon-ish, looking at how to trigger
a race condition. The difficulty is working out how the current code
is able to reach this state, because we have a race condition at hand
in standbys.
As a start, are these failures only in the startup process? Has the
startup process reached a consistent state when the problem happens,
because the replay code is too eager to remove the stats entries?
Or has it not reached a consistent state yet? These could be useful
hints for extracting a reproducible test case, looking for common patterns.
I'll ask around to see whether cases like that have shown up in the
user pool I have access to.
--
Michael