Re: [PoC] pg_upgrade: allow to upgrade publisher node

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-04-07 15:29:44
Message-ID: 20230407152944.j3rek4zyrzggcij7@jrouhaud
Lists: pgsql-hackers

On Fri, Apr 07, 2023 at 09:40:14AM +0000, Hayato Kuroda (Fujitsu) wrote:
>
> > As I mentioned in my original thread, I'm not very familiar with that code, but
> > I'm a bit worried about "all the changes generated on publisher must be sent
> > and applied". Is that a hard requirement for the feature to work reliably?
>
> I think the requirement is needed because the existing WAL on the old node cannot be
> transported to the new instance. The WAL hole from confirmed_flush to the current position
> could not be filled by the newer instance.

I see, that was also the first blocker I could think of when Amit mentioned
that feature weeks ago, and I don't see how that hole could be filled
either.
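To make the "WAL hole" concrete: a logical slot's confirmed_flush LSN marks how far the subscriber has confirmed, and everything between it and the old cluster's current WAL position exists only in the old cluster's WAL, which the new instance does not have. A minimal sketch, assuming LSNs in PostgreSQL's textual `X/Y` form (high and low 32 bits in hex); `parse_lsn` and `wal_hole_bytes` are hypothetical helpers for illustration, not pg_upgrade code.

```python
def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '0/16B3748' into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_hole_bytes(confirmed_flush: str, insert_lsn: str) -> int:
    """Bytes of WAL generated after the slot's confirmed_flush position.

    Hypothetical helper: these bytes live only in the old cluster's WAL,
    so the new instance has no way to decode and send them.
    """
    gap = parse_lsn(insert_lsn) - parse_lsn(confirmed_flush)
    if gap < 0:
        raise ValueError("confirmed_flush is ahead of the insert position")
    return gap


if __name__ == "__main__":
    # Hypothetical positions, not taken from a real cluster.
    print(wal_hole_bytes("0/16B3748", "0/16B37C0"))
```

Any non-zero result means there is WAL the subscriber has not confirmed, which is exactly what cannot be carried across the upgrade.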

> > If
> > yes, how does this work if some subscriber node isn't connected when the
> > publisher node is stopped? I guess you could add a check in pg_upgrade to make
> > sure that all logical slots are indeed caught up and fail if that's not the case,
> > rather than assuming that a clean shutdown implies it. It would be good to
> > cover that in the TAP test, and also cover some corner cases, such as checking that
> > any new row added on the publisher node after the pg_upgrade but before the
> > subscriber is reconnected is also replicated as expected.
>
> Hmm, good point. The current patch cannot handle that case because walsenders
> for such slots do not exist. I have tested your approach; however, I found that
> a CHECKPOINT_SHUTDOWN record was generated twice when the publisher was
> shut down and started. As a result, the confirmed_lsn of the slots was always behind
> the WAL insert location, and the upgrade failed every time.
> For now I don't have a good idea how to solve it... Does anyone have one?
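The strict "all logical slots are caught up" check discussed above, and why it trips over shutdown checkpoints, can be sketched like this. Assumptions: slot positions would come from something like `SELECT slot_name, confirmed_flush_lsn FROM pg_replication_slots WHERE slot_type = 'logical'`, and the end of WAL from the old server; `find_lagging_slots` and `check_slots_caught_up` are hypothetical helpers, not part of pg_upgrade.

```python
from typing import Dict, List


def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '0/16B3748' into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def find_lagging_slots(slots: Dict[str, str], insert_lsn: str) -> List[str]:
    """Return names of slots whose confirmed_flush is behind insert_lsn.

    This strict version treats *any* WAL after confirmed_flush as pending
    changes, including records such as a shutdown checkpoint that carry
    nothing for subscribers.
    """
    end = parse_lsn(insert_lsn)
    return sorted(name for name, lsn in slots.items() if parse_lsn(lsn) < end)


def check_slots_caught_up(slots: Dict[str, str], insert_lsn: str) -> None:
    """Hypothetical pg_upgrade-style check: fail if any slot is behind."""
    lagging = find_lagging_slots(slots, insert_lsn)
    if lagging:
        raise RuntimeError("logical slots not caught up: " + ", ".join(lagging))
```

With hypothetical values such as a slot at `0/16B3748` and the insert position at `0/16B37C0`, the check raises even if the only WAL in between is a shutdown checkpoint record, which matches the failure described above.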

I'm wondering if we could just check that each slot's LSN is exactly
sizeof(CHECKPOINT_SHUTDOWN) behind the end of WAL, or something like that? That's hackish, but if
pg_upgrade can run, it means there was a clean shutdown, so it should be safe to
assume we know what the last record in the WAL is. As for the double
shutdown checkpoint, I'm not sure that I understand the problem. The check should
only be done at the very beginning of pg_upgrade, so there should have been
only one shutdown checkpoint done, right?
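The "exactly one shutdown checkpoint behind" idea can be sketched as a tolerance check. This is a hedged illustration: `checkpoint_record_len` is an assumed, caller-supplied size of the CHECKPOINT_SHUTDOWN record (not a constant taken from PostgreSQL), the sketch ignores alignment and the page header a record may pick up when it crosses a WAL page boundary, and `slot_ok_after_clean_shutdown` is a hypothetical helper.

```python
def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '0/16B3748' into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def slot_ok_after_clean_shutdown(confirmed_flush: str, end_of_wal: str,
                                 checkpoint_record_len: int) -> bool:
    """Accept the slot only if the sole WAL after it is one shutdown checkpoint.

    checkpoint_record_len is an assumption supplied by the caller; the real
    record size (and any page header in between) is not modelled here.
    """
    gap = parse_lsn(end_of_wal) - parse_lsn(confirmed_flush)
    return gap == checkpoint_record_len


if __name__ == "__main__":
    rec = 120  # hypothetical record length, for illustration only
    print(slot_ok_after_clean_shutdown("0/16B3748", "0/16B37C0", rec))
    print(slot_ok_after_clean_shutdown("0/16B3748", "0/16B3838", rec))
```

The second call models the situation where two shutdown checkpoints were written after the slot's position: an exact one-record tolerance rejects it, so the check only works if it runs before any additional shutdown checkpoint is generated.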
