Re: Skipping logical replication transactions on subscriber side

From: Alexey Lesovsky <lesovsky(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-07-05 10:33:42
Message-ID: CAGnetYfO0AEgLmSPrxRuhq3ykDH5PGsTeTV2xsbZEXnMvzah2w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,
Have a few notes about pg_stat_logical_replication_error from the DBA point
of view (which will use this view in the future).
1. As I understand it, this view might contain many errors related to
different subscriptions. It is better to name
"pg_stat_logical_replication_errors" using the plural form (like this done
for stat views for tables, indexes, functions). Also, I'd like to suggest
thinking twice about the view name (and function used in view DDL) -
"pg_stat_logical_replication_error" contains very common "logical
replication" words, but the view contains errors related to subscriptions
only. In the future there could be other kinds of errors related to logical
replication, but not related to subscriptions - what will you do?
2. Add a field with database name or id - it helps to quickly understand to
which database the subscription belongs.
3. Add a counter field with total number of errors - it helps to calculate
errors rates and aggregations (sum), and don't lose information about
errors between view checks.
4. Add text of last error (if it will not be too expensive).
5. Rename the "action" field to "command", as I know this is right from
terminology point of view.

Finally, the view might seems like this:

postgres(1:25250)=# select * from pg_stat_logical_replication_errors;
subname | datid | relid | command | xid | total | last_failure |
last_failure_text
----------+--------+-------+---------+-----+-------+-------------------------------+---------------------------
sub_1 | 12345 | 16384 | INSERT | 736 | 145 | 2021-06-27 12:12:45.142675+09
| something goes wrong...
sub_2 | 12346 | 16458 | UPDATE | 845 | 12 | 2021-06-27 12:16:01.458752+09 |
hmm, something goes wrong

Regards, Alexey

On Mon, Jul 5, 2021 at 2:59 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
wrote:

> On Thu, Jun 17, 2021 at 6:20 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> wrote:
> >
> > On Thu, Jun 17, 2021 at 3:24 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> wrote:
> > >
> > > > Now, if this function is used by super
> > > > users then we can probably trust that they provide the XIDs that we
> > > > can trust to be skipped but OTOH making a restriction to allow these
> > > > functions to be used by superusers might restrict the usage of this
> > > > repair tool.
> > >
> > > If we specify the subscription id or name, maybe we can allow also the
> > > owner of subscription to do that operation?
> >
> > Ah, the owner of the subscription must be superuser.
>
> I've attached PoC patches.
>
> 0001 patch introduces the ability to skip transactions on the
> subscriber side. We can specify XID to the subscription by like ALTER
> SUBSCRIPTION test_sub SET SKIP TRANSACTION 100. The implementation
> seems straightforward except for setting origin state. After skipping
> the transaction we have to update the session origin state so that we
> can start streaming the transaction next to the one that we just
> skipped in case of the server crash or restarting the apply worker. We
> set origin state to the commit WAL record. However, since we skip all
> changes we don’t write any WAL even if we call CommitTransaction() at
> the end of the skipped transaction. So the patch sets the origin state
> to the transaction that updates the pg_subscription system catalog to
> reset the skip XID. I think we need a discussion of this part.
>
> With 0002 and 0003 patches, we report the error information in server
> logs and the stats view, respectively. 0002 patch adds errcontext for
> messages that happened during applying the changes:
>
> ERROR: duplicate key value violates unique constraint "hoge_pkey"
> DETAIL: Key (c)=(1) already exists.
> CONTEXT: during apply of "INSERT" for relation "public.hoge" in
> transaction with xid 736 committs 2021-06-27 12:12:30.053887+09
>
> 0003 patch adds pg_stat_logical_replication_error statistics view
> discussed on another thread[1]. The apply worker sends the error
> information to the stats collector if an error happens during applying
> changes. We can check those errors as follow:
>
> postgres(1:25250)=# select * from pg_stat_logical_replication_error;
> subname | relid | action | xid | last_failure
> ----------+-------+--------+-----+-------------------------------
> test_sub | 16384 | INSERT | 736 | 2021-06-27 12:12:45.142675+09
> (1 row)
>
> I added only columns required for the skipping transaction feature to
> the view for now.
>
> Please note that those patches are meant to evaluate the concept we've
> discussed so far. Those don't have the doc update yet.
>
> Regards,
>
> [1]
> https://www.postgresql.org/message-id/DB35438F-9356-4841-89A0-412709EBD3AB%40enterprisedb.com
>
>
> --
> Masahiko Sawada
> EDB: https://www.enterprisedb.com/
>

--
С уважением Алексей В. Лесовский

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gilles Darold 2021-07-05 10:48:01 Re: [PATCH] Hooks at XactCommand level
Previous Message Amit Kapila 2021-07-05 10:24:57 Re: Skipping logical replication transactions on subscriber side