Re: Skipping logical replication transactions on subscriber side

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Alexey Lesovsky <lesovsky(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-11-16 06:31:18
Message-ID: CAD21AoA5jupM6O=pYsyfaxQ1aMX-en8=QNgpW6KfXsg7_CS0CQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 15, 2021 at 11:43 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Mon, Nov 15, 2021 at 2:48 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Mon, Nov 15, 2021 at 4:49 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
> > >
> > > On Mon, Nov 15, 2021 at 1:49 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > I've attached an updated patch that incorporates all comments I got so
> > > > far. Please review it.
> > > >
> > >
> > > Thanks for the updated patch.
> > > A few minor comments:
> > >
> > > doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
> > >
> > > (1) tab in doc updates
> > >
> > > There's a tab before "Otherwise,":
> > >
> > > + copy of the relation with <parameter>relid</parameter>.
> > > Otherwise,
> >
> > Fixed.
> >
> > >
> > > src/backend/utils/adt/pgstatfuncs.c
> > >
> > > (2) The function comment for "pg_stat_reset_subscription_worker_sub"
> > > seems a bit long and I expected it to be multi-line (did you run
> > > pg_indent?)
> >
> > I ran pg_indent on pgstatfuncs.c but it didn't become a multi-line comment.
> >
> > >
> > > src/include/pgstat.h
> > >
> > > (3) Remove PgStat_StatSubWorkerEntry.dbid?
> > >
> > > The "dbid" member of the new PgStat_StatSubWorkerEntry struct doesn't
> > > seem to be used, so I think it should be removed.
> > > (I could remove it and everything builds OK and tests pass).
> > >
> >
> > Fixed.
> >
> > Thank you for the comments! I've updated an updated version patch.
>
> Thanks for the updated patch.
> I found one issue:
> This Assert can fail in few cases:
> +void
> +pgstat_report_subworker_error(Oid subid, Oid subrelid, Oid relid,
> +
> LogicalRepMsgType command, TransactionId xid,
> + const char *errmsg)
> +{
> + PgStat_MsgSubWorkerError msg;
> + int len;
> +
> + Assert(strlen(errmsg) < PGSTAT_SUBWORKERERROR_MSGLEN);
> + len = offsetof(PgStat_MsgSubWorkerError, m_message[0]) +
> strlen(errmsg) + 1;
> +
>
> I could reproduce the problem with the following scenario:
> Publisher:
> create table t1 (c1 varchar);
> create publication pub1 for table t1;
> insert into t1 values(repeat('abcd', 5000));
>
> Subscriber:
> create table t1(c1 smallint);
> create subscription sub1 connection 'dbname=postgres port=5432'
> publication pub1 with ( two_phase = true);
> postgres=# select * from pg_stat_subscription_workers;
> WARNING: terminating connection because of crash of another server process
> DETAIL: The postmaster has commanded this server process to roll back
> the current transaction and exit, because another server process
> exited abnormally and possibly corrupted shared memory.
> HINT: In a moment you should be able to reconnect to the database and
> repeat your command.
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
> Subscriber logs:
> 2021-11-15 19:27:56.380 IST [15685] LOG: logical replication apply
> worker for subscription "sub1" has started
> 2021-11-15 19:27:56.384 IST [15687] LOG: logical replication table
> synchronization worker for subscription "sub1", table "t1" has started
> TRAP: FailedAssertion("strlen(errmsg) < PGSTAT_SUBWORKERERROR_MSGLEN",
> File: "pgstat.c", Line: 1946, PID: 15687)
> postgres: logical replication worker for subscription 16387 sync 16384
> (ExceptionalCondition+0xd0)[0x55a18f3c727f]
> postgres: logical replication worker for subscription 16387 sync 16384
> (pgstat_report_subworker_error+0x7a)[0x55a18f126417]
> postgres: logical replication worker for subscription 16387 sync 16384
> (ApplyWorkerMain+0x493)[0x55a18f176611]
> postgres: logical replication worker for subscription 16387 sync 16384
> (StartBackgroundWorker+0x23c)[0x55a18f11f7e2]
> postgres: logical replication worker for subscription 16387 sync 16384
> (+0x54efc0)[0x55a18f134fc0]
> postgres: logical replication worker for subscription 16387 sync 16384
> (+0x54f3af)[0x55a18f1353af]
> postgres: logical replication worker for subscription 16387 sync 16384
> (+0x54e338)[0x55a18f134338]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x141f0)[0x7feef84371f0]
> /lib/x86_64-linux-gnu/libc.so.6(__select+0x57)[0x7feef81e3ac7]
> postgres: logical replication worker for subscription 16387 sync 16384
> (+0x5498c2)[0x55a18f12f8c2]
> postgres: logical replication worker for subscription 16387 sync 16384
> (PostmasterMain+0x134c)[0x55a18f12f1dd]
> postgres: logical replication worker for subscription 16387 sync 16384
> (+0x43c3d4)[0x55a18f0223d4]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xd5)[0x7feef80fd565]
> postgres: logical replication worker for subscription 16387 sync 16384
> (_start+0x2e)[0x55a18ecaf4fe]
> 2021-11-15 19:27:56.483 IST [15645] LOG: background worker "logical
> replication worker" (PID 15687) was terminated by signal 6: Aborted
> 2021-11-15 19:27:56.483 IST [15645] LOG: terminating any other active
> server processes
> 2021-11-15 19:27:56.485 IST [15645] LOG: all server processes
> terminated; reinitializing
>
> Here it fails because of a long error message ""invalid input syntax
> for type smallint:

Good catch!

> \"abcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabc...."
> because we try to insert varchar type data into smallint type. Maybe
> we should trim the error message in this case.

Right. I've fixed this issue and attached an updated patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachment Content-Type Size
v23-0001-Add-a-subscription-worker-statistics-view-pg_sta.patch application/octet-stream 54.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-11-16 07:08:54 Re: Unnecessary global variable declared in xlog.c
Previous Message Michael Paquier 2021-11-16 06:12:38 Re: Unnecessary global variable declared in xlog.c