Re: [PoC] pg_upgrade: allow to upgrade publisher node

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-09-29 07:30:04
Message-ID: CALj2ACWAdYxgzOpXrP=JMiOaWtAT2VjPiKw7ryGbipkSkocJ=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 28, 2023 at 6:08 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Thursday, September 28, 2023 5:32 PM Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> > Perhaps, a function in logical/decode.c returning the WAL record as valid if the
> > record type is any of the above. A note in replication/decode.h and/or
> > access/rmgrlist.h asking rmgr adders to categorize the WAL record type in the
> > new function based on its decoding operation might help with future new WAL
> > record type additions.
> >
> > Thoughts?
>
> I think this approach can work, but I am not sure if it's better than other
> approaches. Mainly because it has almost the same maintaince burden as the
> current approach, i.e. we need to verify and update the check function each
> time we add a new WAL record type.

I think that's not a big problem if we have comments in
replication/decode.h, access/rmgrlist.h, docs to categorize the new
WAL records as decodable. Currently, the WAL record types adders will
have to do certain things based on notes in comments or docs anyways.

Another idea to enforce categorizing decodability of WAL records is to
have a new RMGR API rm_is_record_decodable or such, the RMGR
implementers will then add respective functions returning true/false
if a given WAL record is decodable or not:
void (*rm_decode) (struct LogicalDecodingContext *ctx,
struct XLogRecordBuffer *buf);
bool (*rm_is_record_decodable) (uint8 type);
} RmgrData;

PG_RMGR(RM_XLOG_ID, "XLOG", xlog_redo, xlog_desc, xlog_identify, NULL,
NULL, NULL, xlog_is_record_decodable), then the
xlog_is_record_decodable can look something like [1].

This approach can also enforce/help custom RMGR implementers to define
the decodability of the WAL records.

> Apart from the WAL scan approach, we also considered alternative approach that
> do not impose an additional maintenance burden and could potentially be less
> complex. For example, we can add a new field in pg_controldata to record the
> last checkpoint that happens in non-upgrade mode, so that we can compare the
> slot's confirmed_flush_lsn with this value, If they are the same, the WAL
> should have been consumed otherwise we disallow upgrading this slot. I would
> appreciate if you can share your thought about this approach.

I read this https://www.postgresql.org/message-id/CAA4eK1JVKZGRHLOEotWi%2Be%2B09jucNedqpkkc-Do4dh5FTAU%2B5w%40mail.gmail.com
and I agree with the concern on adding a new filed in pg_controldata
just for this purpose and spreading the IsBinaryUpgrade code in
checkpointer. Another concern for me with a new filed in
pg_controldata approach is that it makes it hard to make this patch
support back branches. Therefore, -1 for this approach from me.

> And if we decided to use WAL scan approach, instead of checking each record, we
> could directly check if the WAL record can be decoded into meaningful results
> by use test_decoding to decode them. This approach also doesn't add new
> maintenance burden as we anyway need to update the test_decoding if any decode
> logic for new record changes. This was also mentioned [1].
>
> What do you think ?
>
> [1] https://www.postgresql.org/message-id/OS0PR01MB5716FC0F814D78E82E4CC3B894C3A%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-1 for decoding the WAL with test_decoding, I don't think it's a great
idea to create temp slots and launch walsenders during upgrade.

IMO, WAL scanning approach looks better. However, if were to optimize
it by not scanning WAL records for every replication slot
confirmed_flush_lsn (CFL), start with lowest CFL (min of all slots
CFL), and scan till the end of WAL. The
binary_upgrade_validate_wal_logical_end function can return an array
of LSNs at which decodable WAL records are found. Then, use CFLs of
all other slots and this array to determine if the slots have
unconsumed WAL. Following is an illustration of this idea:

1. Slots s1, s2, s3, s4, s5 with CFLs 100, 90, 110, 70, 80 respectively.
2. Min of all CFLs is 70 for slot s4.
3. Start scanning WAL from min CFL 70 for slot s4, say there are
unconsumed WAL at LSN {85, 89}.
4. Now, without scanning WAL for rest of the slots, determine if they
have unconsumed WAL.
5.1. CFL of slot s1 is 100 and no unconsumed WAL at or after LSN 100 -
look at the array of unconsumed WAL LSNs {85, 89}.
5.2. CFL of slot s2 is 90 and no unconsumed WAL at or after LSN 90 -
look at the array of unconsumed WAL LSNs {85, 89}.
5.3. CFL of slot s3 is 110 and no unconsumed WAL at or after LSN 110 -
look at the array of unconsumed WAL LSNs {85, 89}.
5.4. CFL of slot s4 is 70 and there's unconsumed WAL at or after LSN
70 - look at the array of unconsumed WAL LSNs {85, 89}.
5.5. CFL of slot s5 is 80 and there's unconsumed WAL at or after LSN
80 - look at the array of unconsumed WAL LSNs {85, 89}.

With this approach, the WAL is scanned only once as opposed to the
current approach the patch implements.

Thoughts?

[1]
bool
xlog_is_record_decodable(uint8 type)
{
switch (info)
{
case XLOG_CHECKPOINT_SHUTDOWN:
case XLOG_END_OF_RECOVERY:
return true;
case XLOG_CHECKPOINT_ONLINE:
case XLOG_PARAMETER_CHANGE:
case XLOG_NOOP:
case XLOG_NEXTOID:
case XLOG_SWITCH:
case XLOG_BACKUP_END:
case XLOG_RESTORE_POINT:
case XLOG_FPW_CHANGE:
case XLOG_FPI_FOR_HINT:
case XLOG_FPI:
case XLOG_OVERWRITE_CONTRECORD:
return false;
default:
elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
}
}

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2023-09-29 07:39:09 pg_resetwal regression: could not upgrade after 1d863c2504
Previous Message Michael Paquier 2023-09-29 07:27:52 Re: Add support for AT LOCAL