Re: Skipping logical replication transactions on subscriber side

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Alexey Lesovsky <lesovsky(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Greg Nancarrow <gregn4422(at)gmail(dot)com>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-09-27 00:50:53
Message-ID: CAD21AoAZ76=YB_QyQuDNc-NBdGfQ_zbiee3aw7MUVFFmTZPB6A@mail.gmail.com
Lists: pgsql-hackers

On Sat, Sep 25, 2021 at 4:23 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Sep 24, 2021 at 6:44 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Fri, Sep 24, 2021 at 8:01 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > >
> > > 6.
> > > +typedef struct PgStat_StatSubEntry
> > > +{
> > > + Oid subid; /* hash table key */
> > > +
> > > + /*
> > > + * Statistics of errors that occurred during logical replication. While
> > > + * having the hash table for table sync errors we have a separate
> > > + * statistics value for apply error (apply_error), because we can avoid
> > > + * building a nested hash table for table sync errors in the case where
> > > + * there is no table sync error, which is the common case in practice.
> > > + *
> > >
> > > The above comment is not clear to me. Why do you need to have a
> > > separate hash table for table sync errors? And what makes it avoid
> > > building nested hash table?
> >
> > In the previous patch, a subscription stats entry
> > (PgStat_StatSubEntry) had one hash table that had error entries of
> > both apply and table sync. Since a subscription can have one apply
> > worker and multiple table sync workers it makes sense to me to have
> > the subscription entry have a hash table for them.
> >
>
> Sure, but each tablesync worker must have a separate relid. Why can't
> we have a single hash table for both apply and table sync workers
> which are hashed by sub_id + rel_id? For apply worker, the rel_id will
> always be zero (InvalidOid) and tablesync workers will have a unique
> OID for rel_id, so we should be able to uniquely identify each of
> apply and table sync workers.

What I had in mind is extending the subscription statistics, for
instance with transaction stats [1]. With a hash table of
subscriptions, those statistics can be stored in each subscription's
entry, and subscription errors can be treated as just another kind of
subscription statistics. So each entry of the subscription hash table
can hold another hash table for its errors. For example, the
subscription entry struct would be something like:

typedef struct PgStat_StatSubEntry
{
    Oid             subid;      /* hash key */

    HTAB           *errors;     /* apply and table sync errors */

    /* transaction stats of subscription */
    PgStat_Counter  xact_commit;
    PgStat_Counter  xact_commit_bytes;
    PgStat_Counter  xact_error;
    PgStat_Counter  xact_error_bytes;
    PgStat_Counter  xact_abort;
    PgStat_Counter  xact_abort_bytes;
    PgStat_Counter  failure_count;
} PgStat_StatSubEntry;

When a subscription is dropped, we can easily remove its entry from
the subscription hash table, and all of its statistics, including the
errors, go away with it.
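
To illustrate the drop path, here is a minimal standalone sketch. It
is not the patch code: plain linked lists stand in for the HTABs, and
the names SubEntry, SubErrorEntry, sub_lookup, sub_report_error,
sub_error_entries and sub_drop are hypothetical, introduced only for
this example. It assumes that an error entry with relid == InvalidOid
belongs to the apply worker and any other relid to a table sync
worker.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t Oid;               /* stand-in for PostgreSQL's Oid */
#define InvalidOid ((Oid) 0)

/* One error record; relid == InvalidOid is assumed to mean the apply worker. */
typedef struct SubErrorEntry
{
    Oid         relid;
    int64_t     failure_count;
    struct SubErrorEntry *next;
} SubErrorEntry;

/* Simplified subscription entry; the list replaces "HTAB *errors". */
typedef struct SubEntry
{
    Oid         subid;
    SubErrorEntry *errors;          /* apply and table sync errors */
    int64_t     xact_commit;        /* example of a per-subscription counter */
    struct SubEntry *next;
} SubEntry;

static SubEntry *sub_list = NULL;

/* Find a subscription entry, optionally creating it. */
static SubEntry *
sub_lookup(Oid subid, int create)
{
    SubEntry   *s;

    for (s = sub_list; s != NULL; s = s->next)
        if (s->subid == subid)
            return s;
    if (!create)
        return NULL;
    s = calloc(1, sizeof(SubEntry));
    assert(s != NULL);
    s->subid = subid;
    s->next = sub_list;
    sub_list = s;
    return s;
}

/* Record one failure for the given worker of a subscription. */
static void
sub_report_error(Oid subid, Oid relid)
{
    SubEntry   *s = sub_lookup(subid, 1);
    SubErrorEntry *e;

    for (e = s->errors; e != NULL; e = e->next)
        if (e->relid == relid)
            break;
    if (e == NULL)
    {
        e = calloc(1, sizeof(SubErrorEntry));
        assert(e != NULL);
        e->relid = relid;
        e->next = s->errors;
        s->errors = e;
    }
    e->failure_count++;
}

/* Number of distinct error entries held by a subscription. */
static int
sub_error_entries(Oid subid)
{
    SubEntry   *s = sub_lookup(subid, 0);
    SubErrorEntry *e;
    int         n = 0;

    for (e = s ? s->errors : NULL; e != NULL; e = e->next)
        n++;
    return n;
}

/* Drop a subscription: counters and nested errors are freed together. */
static int
sub_drop(Oid subid)
{
    SubEntry  **link = &sub_list;
    SubEntry   *victim;

    while (*link != NULL && (*link)->subid != subid)
        link = &(*link)->next;
    if (*link == NULL)
        return 0;                   /* nothing to drop */
    victim = *link;
    *link = victim->next;
    while (victim->errors != NULL)
    {
        SubErrorEntry *e = victim->errors;

        victim->errors = e->next;
        free(e);
    }
    free(victim);
    return 1;
}
```

With this layout, dropping subscription N is one lookup by subid plus
freeing whatever hangs off that entry. With a single flat table keyed
by (subid, relid), every entry whose key has a matching subid would
have to be found and removed instead.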

Regards,

[1] https://www.postgresql.org/message-id/OSBPR01MB48887CA8F40C8D984A6DC00CED199%40OSBPR01MB4888.jpnprd01.prod.outlook.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
