From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> |
Subject: | Re: Conflict detection for update_deleted in logical replication |
Date: | 2025-08-18 05:05:57 |
Message-ID: | CAA4eK1+P2vHmF+KLVQcfwuTbHrrwG9uonMKfh+c36jGZH=LXZw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, Aug 16, 2025 at 5:15 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> Regarding the subscription-level option vs. GUC, I don't disagree with
> the current approach.
>
> For the record, while I agree that the subscription-level option is
> more consistent with the existing retain_dead_tuples option and can
> work for different requirements, my biggest concern is that if users
> set different values to different subscriptions, they might think it
> doesn't work as expected. This is because even if a subscription
> disables retaining dead tuples due to max_conflict_retention_duration,
> the database cluster doesn't stop retaining dead tuples unless all
> other subscriptions disable it, meaning that the performance on that
> database might not get recovered. I proposed the GUC parameter as I
> thought it's less confusing from a user perspective. I'm not sure it's
> sufficient to mention that in the documentation but I don't have a
> better idea.
>
I think we might want to gave a GUC as well in the future as both
(subscription option and GUC) can be used in different scenarios but
better to wait for some user feedback before going that far. We can
document this in the option to make users aware how to use it in such
situations.
>
> ---
> +static void
> +notify_ineffective_max_conflict_retention(bool update_maxconflretention)
> +{
> + ereport(NOTICE,
> + errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + update_maxconflretention
> + ? errmsg("max_conflict_retention_duration has no effect
> when retain_dead_tuples is disabled")
> + : errmsg("disabling retain_dead_tuples will render
> max_conflict_retention_duration ineffective"));
> +}
>
> Given that max_conflict_retention_duration works only when
> retain_dead_tuples is enabled, why not merge these two parameters? For
> example, setting max_conflict_retention_duration=-1 means to disable
> retaining dead tuples behavior and =0 means that dead tuples are
> retained until they are no longer needed for detection purposes.
>
I think it can be a source of confusion for users, if not now, then in
the future. Consider that in the future, we also have a GUC to set the
retention duration which will be used for all subscriptions. Now, say,
we define the behaviour such that if this value is set for
subscription, then use that, otherwise, use the GUC value. Then, with
this proposal, if the user sets max_conflict_retention_duration=0, it
will lead to retaining tuples until they are no longer needed but with
the behaviour proposed in patch, one could have simply set
retain_dead_tuples=true and used the GUC value. I understand that it
is debatable how we will design the GUC behaviour in future but I used
it as an example how trying to achieve different things with one
option can be a source of confusion. Even if we decide not to
introduce GUC or define its behaviour differently, I find having
different options in this case is easy to understand and use.
> ---
> + /*
> + * Only the leader apply worker manages conflict retention (see
> + * maybe_advance_nonremovable_xid() for details).
> + */
> + if (!isParallelApplyWorker(&worker) && !isTablesyncWorker(&worker))
> + values[10] = TransactionIdIsValid(worker.oldest_nonremovable_xid);
> + else
> + nulls[10] = true;
>
> I think that adding a new column to the pg_stat_subscription view
> should be implemented in a separate patch since it needs to bump the
> catalog version while introducing max_conflict_retention_duration
> subscription option doesn't require that.
>
Won't following change in pg_subsciption.h anyway requires us to bump
catversion?
@@ -81,6 +81,10 @@
CATALOG(pg_subscription,6100,SubscriptionRelationId)
BKI_SHARED_RELATION BKI_ROW
bool subretaindeadtuples; /* True if dead tuples useful for
* conflict detection are retained */
+ int32 maxconflretention; /* The maximum duration (in milliseconds)
+ * for which information useful for
+ * conflict detection can be retained */
+
> ---
> Even if an apply worker disables retaining dead tuples due to
> max_conflict_retention_duration, it enables again after the server
> restarts.
>
I also find this behaviour questionable because this also means that
it is possible that before restart one would deduce that the
update_deleted conflict won't be reliably detected for a particular
subscription but after restart it could lead to the opposite
conclusion. But note that to make it behave similarly we need to store
this value persistently in pg_subscription unless you have better
ideas for this. Theoretically, there are two places where we can
persist this information, one is with pg_subscription, and other in
origin. I find it is closer to pg_subscription.
--
With Regards,
Amit Kapila.
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2025-08-18 05:06:19 | Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options |
Previous Message | Michael Paquier | 2025-08-18 04:49:32 | Re: Sequence Access Methods, round two |