Re: BUG: Former primary node might stuck when started as a standby

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Aleksander Alekseev <aleksander(at)timescale(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG: Former primary node might stuck when started as a standby
Date: 2024-01-23 11:00:01
Message-ID: 1eb70b1b-7575-9827-2534-439bdc1900d3@gmail.com
Lists: pgsql-hackers

Hi Aleksander,

[ I'm writing this off-list to minimize noise, but we can continue the discussion in -hackers, if you wish ]

22.01.2024 14:00, Aleksander Alekseev wrote:

> Hi,
>
>> But node1 knows that it's a standby now and it's expected to get all the
>> WAL records from the primary, doesn't it?
> Yes, but node1 doesn't know if it always was a standby or not. What if
> node1 was always a standby, node2 was a primary, then node2 died and
> node3 is a new primary.

Excuse me, but I still can't understand what could go wrong in this case.
Let's suppose node1 has WAL with the following contents before start:
CPLOC | TL1R1 | TL1R2 | TL1R3 |

while node2's WAL contains:
TL1R1 | TL2R1 | TL2R2 | ...

where CPLOC -- a checkpoint location, TLxRy -- a record y on a timeline x.

I assume that requesting all WAL records from node2 without redoing local
records should be the right thing.

And even in the situation you propose:
CPLOC | TL2R5 | TL2R6 | TL2R7 |

while node3's WAL contains:
TL2R5 | TL3R1 | TL3R2 | ...

I see no issue with applying records from node3...
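
To make the two layouts above concrete, here is a toy Python sketch. It is
not PostgreSQL code: the list-of-strings WAL model and the names fork_point
and follow_primary are made up for illustration, and only the records shown
above are modelled (whatever follows the "..." is left out). It just shows
what "take everything from the primary and don't redo local records" would
mean in both cases.

```python
def fork_point(local, primary):
    """Number of leading records that both WAL streams share."""
    shared = 0
    for mine, theirs in zip(local, primary):
        if mine != theirs:
            break
        shared += 1
    return shared


def follow_primary(local, primary):
    """Apply only the primary's records; report local records it never saw."""
    shared = fork_point(local, primary)
    return list(primary), local[shared:]


# First layout: node1 (former primary) vs. node2.
applied, local_only = follow_primary(
    ["TL1R1", "TL1R2", "TL1R3"], ["TL1R1", "TL2R1", "TL2R2"])
print(applied)     # ['TL1R1', 'TL2R1', 'TL2R2']
print(local_only)  # ['TL1R2', 'TL1R3']

# Second layout: node1 (always a standby) vs. node3.
applied, local_only = follow_primary(
    ["TL2R5", "TL2R6", "TL2R7"], ["TL2R5", "TL3R1", "TL3R2"])
print(applied)     # ['TL2R5', 'TL3R1', 'TL3R2']
print(local_only)  # ['TL2R6', 'TL2R7']
```

In this model both cases look the same from node1's side: the local-only
tail is simply never applied.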

> If node1 sees inconsistency in the WAL
> records, it should report it and stop doing anything, since it doesn't
> have all the information needed to resolve the inconsistencies in all
> the possible cases. Only DBA has this information.

I still wonder what can be considered an inconsistency in this situation.
Isn't it exactly the redo of all the local WAL records that creates the
inconsistency here?
For me, it's a question of an authoritative source, and if we have such a
source, we should trust only its records.

Or, in other words, what if the record TL1R3, which node1 wrote to its
WAL but didn't send to node2, happened to have an incorrect checksum (due
to a partial write, for example)?
If I understand correctly, node1 will just stop redoing WAL at that
position, receive all the following records from node2, and move forward
without reporting the inconsistency (an extra WAL record).
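
The checksum case can be sketched the same way. Again, this is a toy model
under my own assumptions, not the actual recovery code: the Record class and
redo_local are hypothetical names, zlib.crc32 merely stands in for the real
WAL record CRC, and the "partial write" is simulated by zeroing the tail of
the payload.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Record:
    name: str
    payload: bytes
    crc: int  # checksum computed when the record was created


def make(name, payload):
    return Record(name, payload, zlib.crc32(payload))


def redo_local(wal):
    """Replay local records until the first one whose checksum does not match."""
    applied = []
    for rec in wal:
        if zlib.crc32(rec.payload) != rec.crc:
            break  # treated as end of local WAL; no error is raised
        applied.append(rec.name)
    return applied


intact = [make("TL1R1", b"abc"), make("TL1R2", b"def"), make("TL1R3", b"ghi")]

# Same WAL, but the last record hit the disk only partially.
torn = intact[:2] + [Record("TL1R3", b"g\x00\x00", zlib.crc32(b"ghi"))]

print(redo_local(intact))  # ['TL1R1', 'TL1R2', 'TL1R3']
print(redo_local(torn))    # ['TL1R1', 'TL1R2']
```

So in this model, whether the extra record TL1R3 is treated as a problem or
silently skipped depends only on whether it happened to be written out
completely.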

Best regards,
Alexander
