Re: [PoC] pg_upgrade: allow to upgrade publisher node

From: vignesh C <vignesh21(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-04-19 11:05:18
Message-ID: CALDaNm1GOAtTQqxracVKx+C5QjBiJFuZ2+W+pbGy_bre-sf+Vw@mail.gmail.com
Lists: pgsql-hackers

On Fri, 14 Apr 2023 at 16:00, Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Julien,
>
> > Sorry for the delay, I didn't have time to come back to it until this afternoon.
>
> No issues, everyone is busy :-).
>
> > I don't think that your analysis is correct. Slots are guaranteed to be
> > stopped after all the normal backends have been stopped, exactly to avoid such
> > extraneous records.
> >
> > What is happening here is that the slot's confirmed_flush_lsn is properly
> > updated in memory and ends up being the same as the current LSN before the
> > shutdown. But as it's a logical slot and those records aren't decoded, the
> > slot isn't marked as dirty and therefore isn't saved to disk. You don't see
> > that behavior when doing a manual checkpoint before (per your script comment),
> > as in that case the checkpoint also tries to save the slot to disk but then
> > finds a slot that was marked as dirty and therefore saves it.
> >
> > In your script's scenario, when you restart the server the previous slot data
> > is restored and the confirmed_flush_lsn goes backward, which explains those
> > extraneous records.
>
> So you mean the key point is that records which are not sent to the
> subscriber do not mark the slot as dirty, and hence the updated
> confirmed_flush is not written to the slot file. Is that right?
> LogicalConfirmReceivedLocation() is called by the walsender when it gets a
> reply from the apply worker, so your analysis seems correct.
>
> > It's probably totally harmless to throw away that value for now (and probably
> > also doesn't lead to a crazy amount of work after restart, I really don't know
> > much about the logical slot code), but it clearly becomes problematic with your
> > use case. One easy way to fix this is to teach the checkpoint code to force
> > saving the logical slots to disk even if they're not marked as dirty during a
> > shutdown checkpoint, as done in the attached v1 patch (renamed as .txt to not
> > interfere with the cfbot). With this patch applied I reliably only see a final
> > shutdown checkpoint record with your scenario.
> >
> > Now such a change will make shutdown a bit more expensive when using logical
> > replication, even if in 99% of cases you will not need to save the
> > confirmed_flush_lsn value, so I don't know if that's acceptable or not.
>
> In any case, we must advance past these records. IIUC, such records are
> currently read again after a restart but ignored, and this patch just skips
> them. I have not measured it, but it may not be additional overhead overall,
> just a trade-off.
>
> I did not come up with another solution, so I have included your patch.
> Please see 0002.
>
> Additionally, I added a checking function in 0003.
> According to pg_resetwal and other code, the length of a CHECKPOINT_SHUTDOWN
> record seems to be (SizeOfXLogRecord + SizeOfXLogRecordDataHeaderShort +
> sizeof(CheckPoint)). Therefore, the function ensures that the difference
> between the current insert position and confirmed_flush_lsn is less than
> that plus the page header size.
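
For context, a minimal sketch of the approach described above (forcing
logical slots to disk during the shutdown checkpoint even when they are not
dirty) could look like the following. The loop mirrors the existing
CheckPointReplicationSlots() in src/backend/replication/slot.c, but the
is_shutdown parameter and the dirtying step are assumptions about the
attached patch, not the actual diff:

/*
 * Sketch, not the actual patch: at shutdown, mark logical slots dirty so
 * that SaveSlotToPath() writes the in-memory confirmed_flush_lsn.
 */
void
CheckPointReplicationSlots(bool is_shutdown)
{
	LWLockAcquire(ReplicationSlotAllocationLock, LW_SHARED);
	for (int i = 0; i < max_replication_slots; i++)
	{
		ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
		char		path[MAXPGPATH];

		if (!s->in_use)
			continue;

		/* assumed addition: force a write of logical slots at shutdown */
		if (is_shutdown && SlotIsLogical(s))
		{
			SpinLockAcquire(&s->mutex);
			s->just_dirtied = true;
			s->dirty = true;
			SpinLockRelease(&s->mutex);
		}

		/* existing save path; it skips slots that are not dirty */
		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
		SaveSlotToPath(s, path, LOG);
	}
	LWLockRelease(ReplicationSlotAllocationLock);
}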
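
And a rough sketch of the kind of distance check 0003 describes, written
here as a hypothetical server-side helper (the actual patch may implement
it differently, e.g. via SQL from pg_upgrade):

/*
 * Sketch only: is the only WAL after confirmed_flush the shutdown
 * checkpoint record?  Allows for the record itself plus one short page
 * header, per the sizes mentioned above.
 */
static bool
slot_is_caught_up(XLogRecPtr confirmed_flush)
{
	XLogRecPtr	insert_pos = GetXLogInsertRecPtr();
	uint64		max_gap = SizeOfXLogRecord +
		SizeOfXLogRecordDataHeaderShort +
		sizeof(CheckPoint) +
		SizeOfXLogShortPHD;

	return (insert_pos - confirmed_flush) <= max_gap;
}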

Logical replication slots can be created only if wal_level >= logical,
but currently there is no check that wal_level >= logical when the
"--include-logical-replication-slots" option is specified. If
include-logical-replication-slots is specified, pg_upgrade creates the
replication slots only after a long series of steps: performing
prechecks, analyzing, freezing, deleting, restoring and copying the
related objects. So we would error out only after a lot of time (in
many cases pg_upgrade takes hours to perform these operations). I feel
it would be better to add a check at the beginning, somewhere in
check_new_cluster, to verify that wal_level is set appropriately when
include-logical-replication-slots is specified, so that the error is
detected and thrown early.
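
Something along these lines, as a sketch (the function name is
hypothetical; connectToServer(), executeQueryOrDie() and pg_fatal() are the
existing pg_upgrade helpers):

/*
 * Hypothetical early check: fail fast if the new cluster cannot hold
 * logical replication slots.  Assumed to be called from
 * check_new_cluster() when --include-logical-replication-slots is set.
 */
static void
check_new_cluster_wal_level(void)
{
	PGconn	   *conn = connectToServer(&new_cluster, "template1");
	PGresult   *res = executeQueryOrDie(conn, "SHOW wal_level");

	if (strcmp(PQgetvalue(res, 0, 0), "logical") != 0)
		pg_fatal("wal_level must be \"logical\" to restore logical replication slots, but is \"%s\"",
				 PQgetvalue(res, 0, 0));

	PQclear(res);
	PQfinish(conn);
}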

Regards,
Vignesh
