Re: Skipping logical replication transactions on subscriber side

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Alexey Lesovsky <lesovsky(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-11-10 03:49:30
Message-ID: CALDaNm2UzveqbhdrpE_O+cg__iDfzGXJfT61sKYuGnQei6k-fA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Nov 7, 2021 at 7:50 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, Nov 3, 2021 at 12:41 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Tue, Nov 2, 2021 at 2:17 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Nov 2, 2021 at 2:35 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > On Mon, Nov 1, 2021 at 7:18 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Fri, Oct 29, 2021 at 8:20 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > > >
> > > > > >
> > > > > > Fair enough. So statistics can be removed either by vacuum or drop
> > > > > > subscription. Also, if we go by this logic then there is no harm in
> > > > > > retaining the stat entries for tablesync errors. Why have different
> > > > > > behavior for apply and tablesync workers?
> > > > >
> > > > > My understanding is that the subscription worker statistics entry
> > > > > corresponds to workers (but not physical workers since the physical
> > > > > process is changed after restarting). So if the worker finishes its
> > > > > jobs, it is no longer necessary to show errors since further problems
> > > > > will not occur after that. Table sync worker’s job finishes when
> > > > > completing table copy (unless table sync is performed again by REFRESH
> > > > > PUBLICATION) whereas apply worker’s job finishes when the subscription
> > > > > is dropped.
> > > > >
> > > >
> > > > Actually, I am not very sure how users can use the old error
> > > > information after we allowed skipping the conflicting xid. Say, if
> > > > they want to add/remove some constraints on the table based on
> > > > previous errors then they might want to refer to errors of both the
> > > > apply worker and table sync worker.
> > >
> > > I think that in general, statistics should be retained as long as a
> > > corresponding object exists on the database, like other cumulative
> > > statistic views. So I’m concerned that an entry of a cumulative stats
> > > view is automatically removed by a non-stats-related function (i.g.,
> > > ALTER SUBSCRIPTION SKIP). Which seems a new behavior for cumulative
> > > stats views.
> > >
> > > We can retain the stats entries for table sync worker but what I want
> > > to avoid is that the view shows many old entries that will never be
> > > updated. I've sometimes seen cases where the user mistakenly restored
> > > table data on the subscriber before creating a subscription, failed
> > > table sync on many tables due to unique violation, and truncated
> > > tables on the subscriber. I think that unlike the stats entries for
> > > apply worker, retaining the stats entries for table sync could be
> > > harmful since it’s likely to be a large amount (even hundreds of
> > > entries). Especially, it could lead to bloat the stats file since it
> > > has an error message. So if we do that, I'd like to provide a function
> > > for users to remove (not reset) stats entries manually.
> > >
> >
> > If we follow the idea of keeping stats at db level (in
> > PgStat_StatDBEntry) as discussed above then I think we already have a
> > way to remove stat entries via pg_stat_reset which removes the stats
> > corresponding to tables, functions and after this patch corresponding
> > to subscriptions as well for the current database. Won't that be
> > sufficient? I see your point but I think it may be better if we keep
> > the same behavior for stats of apply and table sync workers.
>
> Make sense.
>
> >
> > Following the tables, functions, I thought of keeping the name of the
> > reset function similar to "pg_stat_reset_single_table_counters" but I
> > feel the currently used name "pg_stat_reset_subscription_worker" in
> > the patch is better. Do let me know what you think?
>
> Yeah, I also tend to prefer pg_stat_reset_subscription_worker name
> since "single" isn't clear in the context of subscription worker. And
> the behavior of the reset function for subscription workers is also
> different from pg_stat_reset_single_xxx_counters.
>
> I've attached an updated patch. In this version patch, subscription
> worker statistics are collected per-database and handled in a similar
> way to tables and functions. I think perhaps we still need to discuss
> details of how the statistics should be handled but I'd like to share
> the patch for discussion.

Thanks for the updated patch, Few comments:
1) should we change "Tables and functions hashes are initialized to
empty" to "Tables, functions and subworker hashes are initialized to
empty"
+ hash_ctl.keysize = sizeof(PgStat_StatSubWorkerKey);
+ hash_ctl.entrysize = sizeof(PgStat_StatSubWorkerEntry);
+ dbentry->subworkers = hash_create("Per-database subscription worker",
+
PGSTAT_SUBWORKER_HASH_SIZE,
+
&hash_ctl,
+
HASH_ELEM | HASH_BLOBS);

2) Since databaseid, tabhash, funchash and subworkerhash are members
of dbentry, can we remove the function arguments databaseid, tabhash,
funchash and subworkerhash and pass dbentry similar to
pgstat_write_db_statsfile function?
@@ -4370,12 +4582,14 @@ done:
*/
static void
pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash,
- bool permanent)
+ HTAB *subworkerhash,
bool permanent)
{
PgStat_StatTabEntry *tabentry;
PgStat_StatTabEntry tabbuf;
PgStat_StatFuncEntry funcbuf;
PgStat_StatFuncEntry *funcentry;
+ PgStat_StatSubWorkerEntry subwbuf;
+ PgStat_StatSubWorkerEntry *subwentry;

3) Can we move pgstat_get_subworker_entry below pgstat_get_db_entry
and pgstat_get_tab_entry, so that the hash lookup can be together
consistently. Similarly pgstat_send_subscription_purge can be moved
after pgstat_send_slru.
+/* ----------
+ * pgstat_get_subworker_entry
+ *
+ * Return subscription worker entry with the given subscription OID and
+ * relation OID. If subrelid is InvalidOid, it returns an entry of the
+ * apply worker otherwise of the table sync worker associated with subrelid.
+ * If no subscription entry exists, initialize it, if the create parameter
+ * is true. Else, return NULL.
+ * ----------
+ */
+static PgStat_StatSubWorkerEntry *
+pgstat_get_subworker_entry(PgStat_StatDBEntry *dbentry, Oid subid,
Oid subrelid,
+ bool create)
+{
+ PgStat_StatSubWorkerEntry *subwentry;
+ PgStat_StatSubWorkerKey key;
+ bool found;

4) This change can be removed from pgstat.c:
@@ -332,9 +339,11 @@ static bool pgstat_db_requested(Oid databaseid);
static PgStat_StatReplSlotEntry *pgstat_get_replslot_entry(NameData
name, bool create_it);
static void pgstat_reset_replslot(PgStat_StatReplSlotEntry
*slotstats, TimestampTz ts);

+
static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg, TimestampTz now);
static void pgstat_send_funcstats(void);

5) I was able to compile without including
catalog/pg_subscription_rel.h, we can remove including
catalog/pg_subscription_rel.h if not required.
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -41,6 +41,8 @@
#include "catalog/catalog.h"
#include "catalog/pg_database.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_subscription.h"
+#include "catalog/pg_subscription_rel.h"

6) Similarly replication/logicalproto.h also need not be included
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -24,6 +24,7 @@
#include "pgstat.h"
#include "postmaster/bgworker_internals.h"
#include "postmaster/postmaster.h"
+#include "replication/logicalproto.h"
#include "replication/slot.h"
#include "storage/proc.h"

7) There is an extra ";", We can remove one ";" from below:
+ PgStat_StatSubWorkerKey key;
+ bool found;
+ HASHACTION action = (create ? HASH_ENTER : HASH_FIND);;
+
+ key.subid = subid;
+ key.subrelid = subrelid;

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Nancarrow 2021-11-10 04:22:50 Re: Optionally automatically disable logical replication subscriptions on error
Previous Message Noah Misch 2021-11-10 03:39:11 Re: 2021-11-11 release announcement draft