RE: logical replication restrictions

From: "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Euler Taveira' <euler(at)eulerto(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
Subject: RE: logical replication restrictions
Date: 2022-09-14 12:26:52
Message-ID: TYAPR01MB5866F9716A18DA0C68A2CDB3F5469@TYAPR01MB5866.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Hi,

Sorry for the noise, but I found another bug.
When 032_apply_delay.pl is modified as follows,
the test always fails, even with my patch applied.

```
# Disable subscription. worker should die immediately.
-$node_subscriber->safe_psql('postgres',
- "ALTER SUBSCRIPTION tap_sub DISABLE"
+$node_subscriber->safe_psql('postgres', q{
+BEGIN;
+ALTER SUBSCRIPTION tap_sub DISABLE;
+SELECT pg_sleep(1);
+COMMIT;
+}
);
```

The point of failure is the same as the one I reported previously.

```
...
2022-09-14 12:00:48.891 UTC [11330] 032_apply_delay.pl LOG: statement: ALTER SUBSCRIPTION tap_sub SET (min_apply_delay = 86460000)
2022-09-14 12:00:48.910 UTC [11226] DEBUG: sending feedback (force 0) to recv 0/1690220, write 0/1690220, flush 0/1690220
2022-09-14 12:00:48.937 UTC [11208] DEBUG: server process (PID 11328) exited with exit code 0
2022-09-14 12:00:48.950 UTC [11226] DEBUG: logical replication apply delay: 86459996 ms
2022-09-14 12:00:48.950 UTC [11226] CONTEXT: processing remote data for replication origin "pg_16393" during "BEGIN" in transaction 734 finished at 0/16902A8
2022-09-14 12:00:48.979 UTC [11208] DEBUG: forked new backend, pid=11334 socket=6
2022-09-14 12:00:49.007 UTC [11334] 032_apply_delay.pl LOG: statement: BEGIN;
2022-09-14 12:00:49.008 UTC [11334] 032_apply_delay.pl LOG: statement: ALTER SUBSCRIPTION tap_sub DISABLE;
2022-09-14 12:00:49.009 UTC [11334] 032_apply_delay.pl LOG: statement: SELECT pg_sleep(1);
2022-09-14 12:00:49.009 UTC [11226] DEBUG: check status of MySubscription
2022-09-14 12:00:49.009 UTC [11226] CONTEXT: processing remote data for replication origin "pg_16393" during "BEGIN" in transaction 734 finished at 0/16902A8
2022-09-14 12:00:49.009 UTC [11226] DEBUG: logical replication apply delay: 86459937 ms
2022-09-14 12:00:49.009 UTC [11226] CONTEXT: processing remote data for replication origin "pg_16393" during "BEGIN" in transaction 734 finished at 0/16902A8
...
```

I think this happens because the woken worker reads the catalog before the change
has been committed, so it still sees the subscription as enabled.
In AlterSubscription(), the backend wakes the apply worker immediately, but the wakeup
should happen at the end of the transaction, as ApplyLauncherWakeupAtCommit() and
AtEOXact_ApplyLauncher() do.

```
+	/*
+	 * If this subscription has been disabled and it has an apply
+	 * delay set, wake up the logical replication worker to finish
+	 * it as soon as possible.
+	 */
+	if (!opts.enabled && sub->applydelay > 0)
+		logicalrep_worker_wakeup(sub->oid, InvalidOid);
+
```
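
To illustrate the ordering problem, here is a toy model in plain C. It is not PostgreSQL code, and every name in it is hypothetical; it only assumes that the worker can see committed catalog state and nothing else. It compares waking the worker immediately inside the transaction with remembering the request and acting on it at commit.

```c
#include <stdbool.h>

/* Toy model only: none of these names exist in PostgreSQL. */

/* Committed catalog state, i.e. what the apply worker can see. */
static bool committed_enabled = true;

/* Uncommitted value held by the transaction running ALTER SUBSCRIPTION. */
static bool pending_enabled = true;

/* Set when a wakeup has been requested for end of transaction. */
static bool wakeup_at_commit = false;

/* What the worker observed the last time it was woken. */
static bool worker_saw_enabled = true;

/* The worker re-reads the committed catalog state when it is woken. */
void worker_wakeup(void)
{
    worker_saw_enabled = committed_enabled;
}

void begin_transaction(void)
{
    pending_enabled = committed_enabled;
    wakeup_at_commit = false;
}

/* Disable the subscription; wake the worker now or defer it to commit. */
void alter_subscription_disable(bool defer_wakeup)
{
    pending_enabled = false;
    if (defer_wakeup)
        wakeup_at_commit = true;   /* only remember the request */
    else
        worker_wakeup();           /* worker still sees the old state */
}

void end_transaction(bool is_commit)
{
    if (is_commit)
    {
        committed_enabled = pending_enabled;
        if (wakeup_at_commit)
            worker_wakeup();       /* worker now sees the new state */
    }
    wakeup_at_commit = false;
}

/* Run BEGIN; ALTER ... DISABLE; COMMIT; and report what the worker saw. */
bool run_disable_scenario(bool defer_wakeup)
{
    committed_enabled = true;
    worker_saw_enabled = true;
    begin_transaction();
    alter_subscription_disable(defer_wakeup);
    /* the pg_sleep(1) from the modified test would run here */
    end_transaction(true);
    return worker_saw_enabled;
}
```

In this model, run_disable_scenario(false) returns true: the worker was woken before the commit and still believes the subscription is enabled, which matches the symptom in the log above. run_disable_scenario(true) returns false, because the wakeup is delivered only after the change has been committed.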

What do you think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
