Re: [PoC] pg_upgrade: allow to upgrade publisher node

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>
Subject: Re: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-11-28 12:32:19
Message-ID: CAD21AoDkyyC=wa2=1Ruo_L8g16xf_W5Xyhp-=3j9urT916b9gA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 28, 2023 at 6:50 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Nov 28, 2023 at 1:32 PM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > On Tue, Nov 28, 2023 at 11:06 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > One month has already been passed since this main patch got committed
> > > but reading this change, I have some questions on new
> > > binary_upgrade_logical_slot_has_caught_up() function:
> > >
> > > Is there any reason why this function can be executed only in binary
> > > upgrade mode? It seems to me that other functions in
> > > pg_upgrade_support.c must be called only in binary upgrade mode
> > > because it does some hacky changes internally. On the other hand,
> > > binary_upgrade_logical_slot_has_caught_up() just calls
> > > LogicalReplicationSlotHasPendingWal(), which doesn't change anything
> > > internally. If we make this function usable in normal mode, the user
> > > would be able to check each slot's upgradability without pg_upgrade
> > > --check command (or without stopping the server if the user can ensure
> > > no more meaningful WAL records are generated).
> >
> > It may happen that such a user-facing function tells there's no
> > unconsumed WAL, but later on the WAL gets generated during pg_upgrade.
> > Therefore, the information the function gives turns out to be
> > incorrect. I don't see a real-world use-case for such a function right
> > now. If there's one, it's not a big change to turn it into a
> > user-facing function.
> >
>
> Yeah, as of now, I don't see a use case for it and in fact, it could
> lead to unpredictable results. Immediately after calling the function,
> there could be more activity on the server which could make the
> results incorrect. I think to check the slot's upgradeability, one can
> rely on the results of the pg_upgrade --check functionality.

Fair point.

This function is already a user-executable function as it's in
pg_catalog but is restricted to be executed only in binary upgrade
even though it doesn't change anything internally. So it wasn't clear
to me why we put such a restriction.

>
> > > ---
> > > Also, the function checks if the user has the REPLICATION privilege
> > > but I think that only superuser can connect to the server in binary
> > > upgrade mode in the first place.
> >
> > If that were true, I don't see a problem in having
> > CheckSlotPermissions() there, in fact it can act as an assertion.
> >
>
> I think we can change it to assertion or may elog(ERROR, ...) with a
> comment as to why we don't expect this can happen.

+1 for an assertion, to match other checks in the function.

>
> > > ---
> > > The following error message doesn't match the function name:
> > >
> > > /* We must check before dereferencing the argument */
> > > if (PG_ARGISNULL(0))
> > > elog(ERROR, "null argument to
> > > binary_upgrade_validate_wal_records is not allowed");
> > >
>
> This should be fixed.
>
> > > ---
> > > { oid => '8046', descr => 'for use by pg_upgrade',
> > > proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 'f',
> > > provolatile => 'v', proparallel => 'u', prorettype => 'bool',
> > > proargtypes => 'name',
> > > prosrc => 'binary_upgrade_logical_slot_has_caught_up' },
> > >
> > > The function is not a strict function but we check in the function if
> > > the passed argument is not null. I think it would be clearer to make
> > > it a strict function.
> >
> > I think it has been done that way similar to
> > binary_upgrade_create_empty_extension().

binary_upgrade_create_empty_extension() needs to be a non-strict
function since it needs to accept NULL in some arguments such as
extConfig. On the other hand,
binary_upgrade_logical_slot_has_caught_up() doesn't handle NULL and
it's conventional to make such a function a strict function.

> >
> > > ---
> > > LogicalReplicationSlotHasPendingWal() is defined in logical.c but I
> > > guess it's more suitable to be in slotfunc.s where similar functions
> > > such as pg_logical_replication_slot_advance() is also defined.
> >
> > Why not in logicalfuncs.c?
> >
>
> I am not sure if either of those is better than logical.c. IIRC, I
> thought it was okay to keep in logical.c as others primarily deal with
> exposed SQL functions and I felt it somewhat matches with the intent
> of logical.c ("The goal is to encapsulate most of the internal
> complexity for consumers of logical decoding, so they can create and
> consume a changestream with a low amount of code..").

I see your point. To me it looks that the functions in logical.c are
APIs and internal functions to manage logical decoding context and
replication slot (e.g., restart_lsn). On the other hand,
LogicalReplicationSlotHasPendingWal() seems to be a user of the
logical decoding. But anyway, it seems that three hackers have
different opinions. So we can keep it unless someone has a good reason
to change it.

On Tue, Nov 28, 2023 at 7:04 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
>
> Yeah, we followed binary_upgrade_create_empty_extension(). Also, we set as
> un-strict to keep a caller function simpler.
>
> Currently get_old_cluster_logical_slot_infos() executes a query and it contains
> binary_upgrade_logical_slot_has_caught_up(). In pg_upgrade layer, we assumed
> either true or false is returned.
>
> But if proisstrict is changed true, we must handle the case when NULL is returned.
> It is small but backseat operation.

Which cases are you concerned pg_upgrade could pass NULL to
binary_upgrade_logical_slot_has_caught_up()?

I've not tested it yet but even if it returns NULL, perhaps
get_old_cluster_logical_slot_infos() would still set curr->caught_up
to false, no?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2023-11-28 12:44:25 Re: Streaming I/O, vectored I/O (WIP)
Previous Message Tomas Vondra 2023-11-28 12:29:27 Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan