Re: Avoid retaining conflict-related data when no tables are subscribed

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Avoid retaining conflict-related data when no tables are subscribed
Date: 2025-08-29 04:05:23
Message-ID: CAA4eK1JDXKvKuii8BTjSwgJOHemSD1KGOfcEcJoFwEOfAYtvkw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 28, 2025 at 7:54 AM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> Hi,
>
> My colleague Nisha reported an issue to me off-list: dead tuples can't
> be removed when retain_dead_tuples is enabled for a subscription with no tables.
>
> This appears to stem from the inability to advance the non-removable transaction
> ID when AllTablesyncsReady() returns false. This function returns false
> when no tables are present, which leads to unnecessary data retention until a
> table is added to the subscription.
>
> Since dead tuples don't need to be retained when no tables are subscribed, here
> is a patch to fix it, modifying AllTablesyncsReady() to allow no tables to be
> treated as a ready state when explicitly requested.
>

A few comments:
===============
1.
Don't the following two paragraphs in the comment contradict each other?

   * It is safe to add new tables with initial states to the subscription
   * after this check because any changes applied to these tables should
   * have a WAL position greater than the rdt_data->remote_lsn.
+   *
+   * Advancing the transaction ID is also necessary when no tables are
+   * subscribed, as it prevents unnecessary retention of dead tuples. Although
+   * it seem feasible to skip all phases and directly assign candidate_xid to
+   * oldest_nonremovable_xid in the RDT_GET_CANDIDATE_XID phase when no tables
+   * are currently subscribed, this approach is unsafe. This is because new
+   * tables may be added to the subscription after the initial table check,
+   * requiring tuples deleted before candidate_xid for conflict detection in
+   * upcoming transactions. Therefore, it remains necessary to wait for all
+   * concurrent transactions to be fully applied.
   */

The first paragraph says that it is okay to add tables after this
check, while the second paragraph says that it is not.

2.
+ * If the subscription has no tables, return the value determined by
+ * 'ready_if_no_tables'.
+ *
+ * Otherwise, return whether all the tables for the subscription are in the
+ * READY state.
*
* Note: This function is not suitable to be called from outside of apply or
* tablesync workers because MySubscription needs to be already initialized.
*/
bool
-AllTablesyncsReady(void)
+AllTablesyncsReady(bool ready_if_no_tables)

This change serves the purpose, but I find that it makes the API harder
to understand because the function now has to make decisions based on
different states depending on the boolean parameter passed. Can we
introduce a new API for the empty-subscription case?
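
To illustrate the shape I have in mind, here is a rough standalone
sketch, not a proposed patch. Everything in it is simplified and
hypothetical: the RelState and SubscriptionRels types, the
HasSubscriptionTables() and CanAdvanceNonRemovableXid() names, and the
explicit argument (the real AllTablesyncsReady() takes no argument and
relies on MySubscription and the cached table states). The point is only
that AllTablesyncsReady() keeps its single meaning and the "no tables"
question is answered by a separate function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the per-table sync states. */
typedef enum
{
	REL_STATE_INIT,
	REL_STATE_DATASYNC,
	REL_STATE_READY
} RelState;

/* Hypothetical container for the tables of one subscription. */
typedef struct
{
	const RelState *states;
	size_t		nrels;
} SubscriptionRels;

/* New API (assumed name): does the subscription have any tables at all? */
bool
HasSubscriptionTables(const SubscriptionRels *sub)
{
	assert(sub != NULL);
	return sub->nrels > 0;
}

/*
 * Simplified stand-in for the existing check: true only if there is at
 * least one table and every table is READY. No boolean parameter needed.
 */
bool
AllTablesyncsReady(const SubscriptionRels *sub)
{
	assert(sub != NULL);

	if (sub->nrels == 0)
		return false;

	for (size_t i = 0; i < sub->nrels; i++)
	{
		if (sub->states[i] != REL_STATE_READY)
			return false;
	}
	return true;
}

/* Hypothetical caller-side condition combining the two checks. */
bool
CanAdvanceNonRemovableXid(const SubscriptionRels *sub)
{
	return !HasSubscriptionTables(sub) || AllTablesyncsReady(sub);
}
```

With this split, the caller that decides whether to advance the
non-removable transaction ID spells out the empty-subscription case
itself, and existing callers of AllTablesyncsReady() are unaffected.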

--
With Regards,
Amit Kapila.
