Re: [PoC] pg_upgrade: allow to upgrade publisher node

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-09-21 12:15:00
Message-ID: CAA4eK1L9oJmdxprFR3oob5KLpHUnkJAt5Le4woxO3wHz-SZ+TA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 21, 2023 at 4:57 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Wed, Sep 20, 2023 at 7:20 PM Hayato Kuroda (Fujitsu)
> <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> >
> > Good catch, I could not notice because it worked well in my RHEL. Here is the
> > updated version.
>
> Thanks for the patch. I have some comments on v42:
>
> 1.
> +{ oid => '8046', descr => 'for use by pg_upgrade',
> + proname => 'binary_upgrade_validate_wal_records', proisstrict => 'f',
> + provolatile => 'v', proparallel => 'u', prorettype => 'bool',
>
> + if (PG_ARGISNULL(0))
> + elog(ERROR, "null argument to
> binary_upgrade_validate_wal_records is not allowed");
>
> Can proisstrict => 'f' be removed so that there's no need for explicit
> PG_ARGISNULL check? Any specific reason to keep it?
>

Probably trying to keep it similar with
binary_upgrade_create_empty_extension(). I think it depends what
behaviour we expect for NULL input.

> And, the before the ISNULL check the arg is read, which isn't good.
>

Right.

> 2.
> +Datum
> +binary_upgrade_validate_wal_records(PG_FUNCTION_ARGS)
>
> The function name looks too generic in the sense that it validates WAL
> records for correctness/corruption, but it is not. Can it be something
> like binary_upgrade_{check_for_wal_logical_end,
> check_for_logical_end_of_wal} or such?
>

How about slightly modified version like
binary_upgrade_validate_wal_logical_end?

> 3.
> + /* Quick exit if the given lsn is larger than current one */
> + if (start_lsn >= GetFlushRecPtr(NULL))
> + PG_RETURN_BOOL(false);
> +
>
> An LSN that doesn't exists yet is an error IMO, may be an error better here?
>

It will anyway lead to error at later point but we will provide more
information about all the slots that have invalid value of
confirmed_flush LSN.

> 4.
> + * This function is used to verify that there are no WAL records (except some
> + * types) after confirmed_flush_lsn of logical slots, which means all the
> + * changes were replicated to the subscriber. There is a possibility that some
> + * WALs are inserted during upgrade, so such types would be ignored.
> + *
>
> This comment before the function better be at the callsite of the
> function, because as far as this function is concerned, it checks if
> there are any WAL records that are not "certain" types after the given
> LSN, it doesn't know logical slots or confirmed_flush_lsn or such.
>

Yeah, we should give information at the callsite but I guess we need
to give some context atop this function as well so that it is easier
to explain the functionality.

> 5. Trying to understand the interaction of this feature with custom
> WAL records that a custom WAL resource manager puts in. Is it okay to
> have custom WAL records after the "logical WAL end"?
> + /*
> + * There is a possibility that following records may be generated
> + * during the upgrade.
> + */
>

I don't think so. The only valid records for the checks in this
function are probably the ones that can get generated by the upgrade
process because we ensure that walsender sends all the records before
it exits at shutdown time.

>
> 10.
> Why just wal_level and max_replication_slots, why not
> max_worker_processes and max_wal_senders too?

Isn't it sufficient to check the parameters that are required to
create a slot aka what we check in the function
CheckLogicalDecodingRequirements()? We are only creating logical slots
here so I think that should be sufficient.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2023-09-21 12:41:32 Re: remaining sql/json patches
Previous Message Pavel Borisov 2023-09-21 12:10:58 Re: Index range search optimization