Re: Synchronizing slots from primary to standby

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-10-09 11:50:21
Message-ID: CAJpy0uAD=QKVXYBSDtbeomRiewpXb_74kc37B3mfZKk87=CN=g@mail.gmail.com
Lists: pgsql-hackers

On Mon, Oct 2, 2023 at 4:29 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Shveta,
>
> Thank you for updating the patch!

Thanks for the feedback, Kuroda-san. I have addressed most of these in
v22. Please find my comments inline.

>
> I found another ERROR due to the slot removal. Is this a real issue?
>
> 1. applied add_sleep.txt, which emulated the case where the tablesync worker got
> stuck and the primary crashed during the initial sync.
> 2. executed test_0925_v2.sh (You attached in [1])
> 3. the secondary could not start logical replication because the slot was not
> created (log files were also attached).
>
>
> Here is my analysis. The cause is that the slotsync worker aborts the slot
> creation on the secondary server because the secondary's restart_lsn is ahead
> of the primary's. IIUC this can occur when a tablesync worker finishes the
> initial copy before the walsenders stream changes. In this case, the relstate
> of the worker is set to SUBREL_STATE_CATCHUP and the apply worker waits till
> the relation becomes SUBREL_STATE_SYNCDONE. From that point the slot on the
> primary will not be updated until the relation has caught up. If some changes
> come in and the primary crashes at that time, the slotsync worker will abort
> the slot creation.
>
>
> Anyway, the following are my comments.
> I have not checked detailed conventions yet; that can be done at a later stage.
>
>
> ~~~~~~~~~~~~~~~~
>
> For 0001:
> ===
>
> WalSndWaitForStandbyConfirmation()
>
> ```
> + /* If postmaster asked us to stop, don't wait anymore */
> + if (got_STOPPING)
> + break;
> ```
>
> I have considered this again, and it may still have an issue: logical
> walsenders may break out of the loop before the physical walsenders have sent
> the WAL. This can happen because both physical and logical walsenders get
> PROCSIG_WALSND_INIT_STOPPING.
>
> I think a function like WalSndWaitStopping() is needed, which waits until the
> physical walsenders reach WALSNDSTATE_STOPPING or exit. Thoughts?
>
> WalSndWaitForStandbyConfirmation()
>
> ```
> + standby_slot_cpy = list_copy(standby_slot_names_list);
> ```
>
> I found that standby_slot_names_list and standby_slot_cpy are not updated even
> if the GUC is updated. Is this acceptable? Will this still occur after you
> refactor the patch?
> What happens when synchronize_slot_names is updated on the secondary while the
> primary is executing this?
>
> WalSndWaitForStandbyConfirmation()
>
> ```
> +
> + goto retry;
> ```

Yes, there could be a problem here. I will review it; please allow some more
time for this.

>
> I checked other "goto retry;" uses, but I could not find a pattern where no
> return clause exists after the goto (exception: void functions). I also think
> the current style seems a bit strange. How about using an outer loop like
> while (list_length(standby_slot_cpy))?
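
To be concrete, the loop-based shape being suggested would look roughly like
this (just a sketch; standby_confirmed() is a placeholder, not a real
function):

```c
/* Sketch: replace "goto retry" with an explicit loop over the pending slots. */
standby_slot_cpy = list_copy(standby_slot_names_list);

while (list_length(standby_slot_cpy) > 0)
{
	ListCell   *lc;

	/* If postmaster asked us to stop, don't wait anymore. */
	if (got_STOPPING)
		break;

	foreach(lc, standby_slot_cpy)
	{
		char	   *slot_name = (char *) lfirst(lc);

		/* Drop slots that have already confirmed wait_for_lsn. */
		if (standby_confirmed(slot_name, wait_for_lsn))	/* placeholder */
			standby_slot_cpy = foreach_delete_current(standby_slot_cpy, lc);
	}

	if (standby_slot_cpy == NIL)
		break;

	/* Sleep briefly before re-checking the remaining slots. */
	(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
					 1000L, WAIT_EVENT_WAL_SENDER_MAIN);
	ResetLatch(MyLatch);
}
```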
>
> =====
>
> slot.h
>
> ```
> +extern void WaitForStandbyLSN(XLogRecPtr wait_for_lsn);
> ```
>
> WaitForStandbyLSN() does not exist.
>

Done.

> ~~~~~~~~~~~~~~~~
>
> For 0002:
> =====
>
> General
>
> The patch requires that primary_conninfo contain the dbname, but this
> conflicts with the documentation, which says:
>
> ```
> ...Do not specify a database name in the primary_conninfo string.
> ```
>
> I confirmed [^a] that it is harmless for primary_conninfo to contain a dbname,
> but at least the description must be fixed.
>

Done.

> General
>
> I found that the primary server outputs a huge amount of logs when
> log_min_duration_statement = 0. This is because the slotsync worker sends an
> SQL query every 10ms in wait_for_primary_slot_catchup().
> Is there any good way to suppress this? Or should we just be patient?
>

I will review it to see if we can do anything here.
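
One idea (just a sketch, not something the current patch does) would be to
back off the nap time while the remote slot has not advanced, so an idle
primary is not queried every 10ms; remote_slot_caught_up() below is a
placeholder for the existing walrcv_exec() check:

```c
/* Sketch: grow the nap time between retries in wait_for_primary_slot_catchup(). */
long		naptime = 10;		/* ms; starting value picked arbitrarily */

for (;;)
{
	/* remote_slot_caught_up() is a placeholder for the walrcv_exec() query. */
	if (remote_slot_caught_up(wrconn, remote_slot))
		break;

	(void) WaitLatch(MyLatch,
					 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
					 naptime,
					 WAIT_EVENT_REPL_SLOTSYNC_MAIN);
	ResetLatch(MyLatch);

	/* Double the nap, capped at 1s, so fewer statements hit the primary's log. */
	naptime = Min(naptime * 2, 1000);
}
```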

> =====
>
> ```
> +{ oid => '6312', descr => 'what caused the replication slot to become invalid',
> ```
>
> How did you determine the OID? IIRC, features under development should use
> OIDs in the range 8000-9999. See src/include/catalog/unused_oids.
>

Corrected it.

> =====
>
> LogicalRepCtxStruct
>
> ```
> /* Background workers. */
> + SlotSyncWorker *ss_workers; /* slot-sync workers */
> LogicalRepWorker workers[FLEXIBLE_ARRAY_MEMBER];
> ```
>
> It's OK for now, but can we combine them into a single array? IIUC both kinds
> of worker cannot exist at the same time and they have the same components, so
> the same array could be used. It would save an attribute but may make the code
> harder to read.

I feel it will add more confusion; they should be kept separate.

>
> WaitForReplicationWorkerAttach() and logicalrep_worker_stop_internal()
>
> I could not find other cases that take an "LWLock *" as an argument (exception:
> functions in lwlock.c).
> Would it be sufficient to check RecoveryInProgress() instead of passing it as
> an argument?
>

I feel it should be argument-based; if not lock-based, then perhaps a different
argument. Say it is in the process of starting a worker, the start fails, and
now it wants to do cleanup. It should clean up the worker it attempted to start
rather than deciding based on 'RecoveryInProgress': the latter's value may
change if the standby is promoted in between, resulting in an attempt to clean
up the wrong type of worker.
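
A hypothetical shape of what I mean (names made up, only to illustrate the
argument-based approach):

```c
/*
 * Hypothetical sketch: remember which worker type was launched and clean up
 * that one, rather than re-deriving the type from RecoveryInProgress().
 */
typedef enum LaunchedWorkerType
{
	LAUNCHED_LOGICALREP_WORKER,
	LAUNCHED_SLOTSYNC_WORKER
} LaunchedWorkerType;

static void
cleanup_failed_worker_start(LaunchedWorkerType wtype)
{
	if (wtype == LAUNCHED_SLOTSYNC_WORKER)
	{
		/* release the SlotSyncWorker entry we tried to start */
	}
	else
	{
		/* release the LogicalRepWorker entry we tried to start */
	}

	/*
	 * Deciding via RecoveryInProgress() here would be unsafe: a promotion
	 * between the launch attempt and this cleanup flips its answer, and we
	 * would clean up the wrong worker array.
	 */
}
```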

> =====
>
> wait_for_primary_slot_catchup()
>
> ```
> + /* Check if this standby is promoted while we are waiting */
> + if (!RecoveryInProgress())
> + {
> + /*
> + * The remote slot didn't pass the locally reserved position at
> + * the time of local promotion, so it's not safe to use.
> + */
> + ereport(
> + WARNING,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg(
> + "slot-sync wait for slot %s interrupted by promotion, "
> + "slot creation aborted", remote_slot->name)));
> + pfree(cmd.data);
> + return false;
> + }
> ```
>
> The part would not be executed if the promote signal is sent after the primary
> server crashes; I think walrcv_exec() will detect the failure first.
> The function should be wrapped in PG_TRY() and the message emitted in
> PG_CATCH(). There may be other approaches.

walrcv_exec() may fail for other reasons too, so generalising every failure
into a promotion message might not be the correct thing to do. We check whether
the standby has been promoted just before walrcv_exec(), so I feel that should
suffice.

>
> wait_for_primary_slot_catchup()
>
> ```
> + rc = WaitLatch(MyLatch,
> + WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
> + WORKER_DEFAULT_NAPTIME_MS,
> + WAIT_EVENT_REPL_SLOTSYNC_MAIN);
> ```
>
> A new wait event can be added.

Done.

>
>
> [1]: https://www.postgresql.org/message-id/CAJpy0uDD%2B9aJnDx9fBfvLvxJtxA7qqoAys4fo6h1tq1b_0_A7Q%40mail.gmail.com
> [^a]
>
> Regarding the secondary side, libpqrcv_connect() does nothing special even if
> primary_conninfo has dbname="XXX". It adds parameters like "replication=true"
> and sends a startup packet.
>
> As for the primary side, the startup packet is consumed in ProcessStartupPacket().
> It checks whether the process should be a walsender or not (line 2204).
>
> Then (line 2290) port->database_name[0] is set to '\0' in the walsender case.
> The value is used for setting the process title in BackendInitialize().
>
> Also, InitPostgres() normally sets some global variables like MyDatabaseId,
> but that does not happen when the process is a walsender.
>

This is expected behaviour. The presence of dbname in primary_conninfo should
not affect the physical streaming connection; it should only be used for the
slotsync worker's connection, and that is the case currently. When
logical=false (i.e. for physical streaming), we ignore dbname during the
connection, and when logical=true (slotsync connection), we use it.
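
Conceptually the connection parameter handling amounts to something like this
(simplified sketch, not the actual libpqwalreceiver code; the helper name is
made up):

```c
/* Simplified sketch of the idea, not the actual libpqrcv_connect() code. */
static int
add_replication_conn_params(bool logical, const char *dbname,
							const char **keys, const char **vals, int i)
{
	/* Ask for a walsender; for physical streaming the server ignores dbname. */
	keys[i] = "replication";
	vals[i] = logical ? "database" : "true";
	i++;

	if (logical && dbname != NULL)
	{
		/* Only a logical (slotsync) connection actually connects to a DB. */
		keys[i] = "dbname";
		vals[i] = dbname;
		i++;
	}

	return i;
}
```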

> Best Regards,
> Hayato Kuroda
> FUJITSU LIMITED
>
