Re: Skipping logical replication transactions on subscriber side

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Alexey Lesovsky <lesovsky(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-10-27 10:01:49
Message-ID: CAA4eK1KRMY8CSyjURw6b-tnN4MMd5emmYMUdKY1bQ3msjwELyg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 21, 2021 at 10:29 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
>
> I've attached updated patches.
>

Few comments:
==============
1. Is the patch cleaning tablesync error entries except via vacuum? If
not, can't we send a message to remove tablesync errors once tablesync
is successful (say when we reset skip_xid or when tablesync is
finished) or when we drop subscription? I think the same applies to
apply worker. I think we may want to track it in some way whether an
error has occurred before sending the message but relying completely
on a vacuum might be the recipe of bloat. I think in the case of a
drop subscription we can simply send the message as that is not a
frequent operation. I might be missing something here because in the
tests after drop subscription you are expecting the entries from the
view to get cleared

2.
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>count</structfield> <type>uint8</type>
+ </para>
+ <para>
+ Number of consecutive times the error occurred
+ </para></entry>

Shall we name this field as error_count as there will be other fields
in this view in the future that may not be directly related to the
error?

3.
+
+CREATE VIEW pg_stat_subscription_workers AS
+ SELECT
+ e.subid,
+ s.subname,
+ e.subrelid,
+ e.relid,
+ e.command,
+ e.xid,
+ e.count,
+ e.error_message,
+ e.last_error_time,
+ e.stats_reset
+ FROM (SELECT
+ oid as subid,
+ NULL as relid
+ FROM pg_subscription
+ UNION ALL
+ SELECT
+ srsubid as subid,
+ srrelid as relid
+ FROM pg_subscription_rel
+ WHERE srsubstate <> 'r') sr,
+ LATERAL pg_stat_get_subscription_worker(sr.subid, sr.relid) e

It might be better to use 'w' as an alias instead of 'e' as the
information is now not restricted to only errors.

4. +# Test if the error reported on pg_subscription_workers view is expected.

The view name is wrong in the above comment

5.
+# Check if the view doesn't show any entries after dropping the subscriptions.
+$node_subscriber->safe_psql(
+ 'postgres',
+ q[
+DROP SUBSCRIPTION tap_sub;
+DROP SUBSCRIPTION tap_sub_streaming;
+]);
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(1) FROM pg_stat_subscription_workers");
+is($result, q(0), 'no error after dropping subscription');

Don't we need to wait after dropping the subscription and before
checking the view as there might be a slight delay in messages to get
cleared?

7.
+# Create subscriptions. The table sync for test_tab2 on tap_sub will enter to
+# infinite error due to violating the unique constraint.
+my $appname = 'tap_sub';
+$node_subscriber->safe_psql(
+ 'postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr
application_name=$appname' PUBLICATION tap_pub WITH (streaming = off,
two_phase = on);");
+my $appname_streaming = 'tap_sub_streaming';
+$node_subscriber->safe_psql(
+ 'postgres',
+ "CREATE SUBSCRIPTION tap_sub_streaming CONNECTION
'$publisher_connstr application_name=$appname_streaming' PUBLICATION
tap_pub_streaming WITH (streaming = on, two_phase = on);");
+
+$node_publisher->wait_for_catchup($appname);
+$node_publisher->wait_for_catchup($appname_streaming);

How can we ensure that subscriber would have caught up when one of the
tablesync workers is constantly in the error loop? Isn't it possible
that the subscriber didn't send the latest lsn feedback till the table
sync worker is finished?

8.
+# Create subscriptions. The table sync for test_tab2 on tap_sub will enter to
+# infinite error due to violating the unique constraint.

The second sentence of the comment can be written as: "The table sync
for test_tab2 on tap_sub will enter into infinite error loop due to
violating the unique constraint."

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-10-27 10:37:55 Re: Skipping logical replication transactions on subscriber side
Previous Message Pavel Stehule 2021-10-27 09:15:31 Re: proposal: possibility to read dumped table's name from file