Re: Standby recovers records from wrong timeline

From: Ants Aasma <ants(at)cybertec(dot)at>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Standby recovers records from wrong timeline
Date: 2022-10-20 11:44:40
Message-ID: CANwKhkPozUvyfuy1sz0fKN4=CC3TPQOF0Tr+uEVO_XX6yqDHpA@mail.gmail.com
Lists: pgsql-hackers

On Thu, 20 Oct 2022 at 11:30, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> primary_restored time-traveled a bit into the past because of
> recovery_target=immediate. In other words, primary_restored and
> the replica diverge. I don't think it is legit to connect a diverged
> standby to a primary.

primary_restored did time-travel to the past; since we're doing PITR on
the primary, that's the expected behavior. However, the replica has not
diverged: it's a copy of the exact same base backup. The use case is
restoring a cluster from backup using PITR and using the same backup to
create a standby. Currently this breaks when the primary has not yet
archived any segments.

> So, about the behavior in doubt, it is the correct behavior to
> seemingly ignore the history file in the archive. Recovery assumes
> that the first half of the first segment of the new timeline is the
> same as the same segment of the old timeline (.partial), so it is
> legit to read the <tli=1,seg=2> file until the end, and that causes the
> replica to go beyond the divergence point.

What is happening is that primary_restored has a timeline switch at
tli 2, lsn 0/2000100, and the next insert record starts in the same
segment. The replica starts from the same backup on timeline 1 and tries
to find tli 2 seg 2, which is not archived yet, so it falls back to
tli 1 seg 2, replays it, continues to tli 1 seg 3, then connects to the
primary and starts applying WAL from tli 2 seg 4. To me that seems
completely broken.
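
To make the sequence concrete, here is a minimal sketch of the lookup
as I described it. It is a simplified model, not the actual recovery
code: pick_segment() and the archive set are hypothetical names made up
for illustration, and the default 16MB wal_segment_size is assumed.
Only the 24-hex-digit WAL file naming follows what PostgreSQL really
uses.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default wal_segment_size


def parse_lsn(text):
    """Convert an LSN written as 'hi/lo' (hex) into a byte position."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def segment_number(lsn, seg_size=WAL_SEGMENT_SIZE):
    """Segment that contains the given byte position."""
    return lsn // seg_size


def wal_file_name(tli, segno, seg_size=WAL_SEGMENT_SIZE):
    """24-hex-digit WAL segment file name: timeline, log id, segment."""
    segs_per_xlogid = 0x100000000 // seg_size
    return "%08X%08X%08X" % (tli, segno // segs_per_xlogid,
                             segno % segs_per_xlogid)


def pick_segment(archive, timelines_newest_first, segno):
    """Hypothetical model of the fallback: try each timeline in order,
    newest first, and return the first segment file that exists."""
    for tli in timelines_newest_first:
        name = wal_file_name(tli, segno)
        if name in archive:
            return tli, name
    return None


# The timeline switch happens at 0/2000100, i.e. inside segment 2.
switch_seg = segment_number(parse_lsn("0/2000100"))

# The archive holds timeline 1 segments 2 and 3; tli 2 seg 2 is not
# archived yet.
archive = {wal_file_name(1, 2), wal_file_name(1, 3)}

# The replica asks for segment 2, preferring timeline 2, and ends up
# with the timeline 1 file, which it would then replay to the end.
chosen = pick_segment(archive, [2, 1], switch_seg)
print(chosen)
```

In this model the replica is handed the tli 1 file for segment 2 and
has nothing stopping it at offset 0x100 in that segment, which is where
timeline 2 branches off.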

> As you know, when new primary starts a diverged history, the
> recommended way is to blow (or stash) away the archive, then take a
> new backup from the running primary.

My understanding is that backup archives are supposed to remain valid
even after PITR or, equivalently, after a lagging standby is promoted.

--
Ants Aasma
Senior Database Engineer
www.cybertec-postgresql.com
