Re: Skipping logical replication transactions on subscriber side

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Alexey Lesovsky <lesovsky(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2022-01-26 02:01:24
Message-ID: CAKFQuwYJ7dsW+Stsw5+ZVoY3nwQ9j6pPt-7oYjGddH-h7uVb+g@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 24, 2022 at 12:59 AM David G. Johnston <david(dot)g(dot)johnston(at)gmail(dot)com> wrote:

>
> > 5(out). wait for the user to manually restart the replication stream
>>
>> Do you mean that there always is user intervention after error so the
>> replication stream can resume?
>>
>
> That is my working assumption. It doesn't seem like the system would
> auto-resume without a DBA doing something (I'll attribute a server crash to
> the DBA for convenience).
>
> Apparently I need to read more about how the system works today to
> understand how this varies from and integrates with today's user experience.
>
>
I've done some code reading. My understanding is that a background worker
for the main apply of a given subscription is created from the launcher
code (not reviewed), which is initialized at server startup (or as needed
sometime thereafter). This worker goes into a for(;;) loop in
LogicalRepApplyLoop under a PG_TRY in ApplyWorkerMain. When a message is
applied that provokes an error, the PG_CATCH() in ApplyWorkerMain takes over
and then this worker dies. While in that PG_CATCH() we have an aborted
transaction and so are limited in what we can change. We PG_RE_THROW() back
to the background worker infrastructure and let it perform logging and
cleanup, which includes destroying this instance of the background worker.
The background worker that is destroyed is replaced, and its replacement is
identical to the original so far as the statistics collector is concerned.

I haven't traced out when the replacement apply worker gets recreated.
Recreating it immediately, only for it to hit the same error again, seems
like an undesirable choice, so I've assumed it does not. But I also wasn't
expecting the apply worker to PG_RE_THROW(); I expected it instead to keep
running in a different for(;;) loop, waiting for some signal from the
system that something has changed that may avoid the error that put it in
timeout.

So my more detailed goal would be to get rid of PG_RE_THROW() (I assume
doing so would entail a transaction rollback) and stay in the worker:
update pg_subscription with the error information (having removed
PG_RE_THROW, we have new things to consider re: pg_stat_subscription_workers),
then go into a for(;;) loop, maybe polling pg_subscription for an indication
that it is OK to retry applying the last transaction. (Can an inter-process
signal be sent from a normal backend process to a background worker
process?) The SKIP command then matches XID values on pg_subscription; the
resumption sees the subskipxid, updates pg_subscription to remove the error
info and subskipxid, skips the next transaction assuming it has the matching
XID, and then continues applying as normal. Adapt to deal with crash
conditions as needed, though clearing before reapplying seems like a safe
default. Again, maybe the error info and subskipxid should also be cleared
upon worker startup (making pg_dump and other backup considerations moot, as
noted in my P.S. in the previous email).
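A single-threaded sketch of the proposed flow, under several assumptions: a hypothetical in-memory struct stands in for the pg_subscription row, a poll hook stands in for the user running SKIP (or for an inter-process signal), and a transaction "fails" simply when its xid equals fail_xid. None of these names exist in worker.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;            /* 32-bit xid, as in PostgreSQL */
#define InvalidXid ((TransactionId) 0)

/* Hypothetical in-memory stand-in for the relevant pg_subscription fields. */
typedef struct SubscriptionRow
{
    TransactionId subskipxid;   /* set by the proposed SKIP command */
    TransactionId error_xid;    /* xid of the transaction that failed */
} SubscriptionRow;

/* Invoked on each poll while the worker waits; stands in for a user action. */
typedef void (*PollHook) (SubscriptionRow *row, int poll_no);

/* Hypothetical apply step: the transaction with fail_xid always errors. */
static bool apply_xact(TransactionId xid, TransactionId fail_xid)
{
    return xid != fail_xid;
}

/* Example hook: a DBA runs SKIP with the reported xid on the third poll. */
static void simulated_dba_skip(SubscriptionRow *row, int poll_no)
{
    if (poll_no == 2)
        row->subskipxid = row->error_xid;
}

/*
 * Apply xids in order. On error, record it and stay in the worker, polling
 * until subskipxid is set; then clear both fields before going on, and skip
 * the failed transaction if its xid matches. Returns the number of applied
 * transactions, or -1 if max_polls is exhausted.
 */
static int apply_with_skip(const TransactionId *xids, int n,
                           TransactionId fail_xid, SubscriptionRow *row,
                           PollHook hook, int max_polls)
{
    int applied = 0;
    int polls = 0;             /* total polls, so a bad SKIP cannot loop forever */
    int i = 0;

    assert(hook != NULL);
    while (i < n)
    {
        if (apply_xact(xids[i], fail_xid))
        {
            applied++;
            i++;
            continue;
        }

        /* Instead of PG_RE_THROW(): record the error and wait. */
        row->error_xid = xids[i];
        while (row->subskipxid == InvalidXid)
        {
            if (polls >= max_polls)
                return -1;
            hook(row, polls++);
        }

        /* Resumption: clear before reapplying (a safe default after a crash). */
        TransactionId skip = row->subskipxid;
        row->subskipxid = InvalidXid;
        row->error_xid = InvalidXid;
        if (xids[i] == skip)
            i++;               /* skip the failed transaction */
        /* otherwise xids[i] is retried on the next iteration */
    }
    return applied;
}
```

Polling is used here only because it is the simplest option mentioned above; a signal-driven wait would replace the inner while loop without changing the clear-then-skip logic.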

I'm not sure we are paranoid enough regarding the locking of
pg_subscription for purposes of reading and writing subskipxid. I'd
probably rather serialize access to it, and maybe even disallow changing it
from one non-zero XID to another non-zero XID. That shouldn't be needed in
practice (more so if the XID has to be the one present in
current_error_xid), and the user can always reset it to zero first.
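The suggested rules can be sketched as a serialized setter. This assumes a process-local pthread mutex in place of whatever catalog locking PostgreSQL would actually use, and it enforces the stricter "must match current_error_xid" option mentioned parenthetically; the type and function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidXid ((TransactionId) 0)

/*
 * Hypothetical shared state. In PostgreSQL this would be the pg_subscription
 * row guarded by a lock, not a process-local mutex.
 */
typedef struct SkipState
{
    pthread_mutex_t lock;
    TransactionId subskipxid;
    TransactionId current_error_xid;
} SkipState;

typedef enum SetSkipResult
{
    SKIP_OK,
    SKIP_REJECT_OVERWRITE,      /* non-zero to a different non-zero xid */
    SKIP_REJECT_MISMATCH        /* xid is not the reported error xid */
} SetSkipResult;

/* Serialized write: enforce the reset-first and must-match rules. */
static SetSkipResult set_skip_xid(SkipState *st, TransactionId xid)
{
    SetSkipResult res = SKIP_OK;

    assert(st != NULL);
    pthread_mutex_lock(&st->lock);
    if (xid != InvalidXid && st->subskipxid != InvalidXid &&
        st->subskipxid != xid)
        res = SKIP_REJECT_OVERWRITE;    /* user must reset to 0 first */
    else if (xid != InvalidXid && xid != st->current_error_xid)
        res = SKIP_REJECT_MISMATCH;
    else
        st->subskipxid = xid;           /* 0 always clears */
    pthread_mutex_unlock(&st->lock);
    return res;
}

/* Serialized read of subskipxid. */
static TransactionId get_skip_xid(SkipState *st)
{
    TransactionId xid;

    pthread_mutex_lock(&st->lock);
    xid = st->subskipxid;
    pthread_mutex_unlock(&st->lock);
    return xid;
}
```

Under these rules, moving from one XID to another takes two calls (set 0, then set the new XID), which is the "reset first" path described above.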

In worker.c I was, and still am, confused as to the meaning of 'c' and 'w'
in LogicalRepApplyLoop. In apply_dispatch, in the same file, the message
byte is compared against enum values; it would help the inexperienced
reader if 'c' and 'w' were handled as enums as well.
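A small sketch of the suggestion. The enum and function names are hypothetical; the byte values come from the streaming replication protocol documentation ('w' for XLogData, 'k' for the primary keepalive message) rather than from this email, and the sketch does not try to assign a meaning to 'c'.

```c
#include <assert.h>

/*
 * Hypothetical names; the byte values follow the streaming replication
 * protocol documentation ('w' = XLogData, 'k' = primary keepalive).
 */
typedef enum StreamMsgType
{
    STREAM_MSG_XLOG_DATA = 'w',
    STREAM_MSG_KEEPALIVE = 'k'
} StreamMsgType;

/* Dispatch on the first byte of a CopyData payload via the enum. */
static const char *describe_stream_msg(char c)
{
    switch ((StreamMsgType) c)
    {
        case STREAM_MSG_XLOG_DATA:
            return "xlog data";
        case STREAM_MSG_KEEPALIVE:
            return "keepalive";
    }
    return "unknown";
}
```

Naming the bytes this way lets a reader see the message kind at the comparison site, the same way apply_dispatch reads today.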

David J.
