| From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
|---|---|
| To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
| Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Fix race during concurrent logical decoding activation |
| Date: | 2026-05-29 03:38:40 |
| Message-ID: | 695377C3-80FA-4967-B07C-D4E84F4C039C@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> On May 29, 2026, at 05:53, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Thu, May 28, 2026 at 2:09 AM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>>
>> Hi,
>>
>> While testing “Toggle logical decoding dynamically based on logical slot presence”, I hit an assertion failure with concurrent logical slot creation.
>>
>> This is a repo:
>>
>> 1. In session 1, attach the injection point locally and start creating a logical slot. The session blocks at logical-decoding-activation:
>> ```
>> evantest=# set application_name = 'slot_a';
>> SET
>> evantest=# select injection_points_set_local();
>> injection_points_set_local
>> ----------------------------
>>
>> (1 row)
>> evantest=# select injection_points_attach('logical-decoding-activation', 'wait');
>> injection_points_attach
>> -------------------------
>>
>> (1 row)
>> evantest=# select pg_create_logical_replication_slot('slot_a', 'pgoutput');
>> ```
>>
>> 2. In session 2, create another logical slot. This succeeds, and effective_wal_level becomes logical:
>> ```
>> evantest=# select pg_create_logical_replication_slot('slot_b', 'pgoutput');
>> pg_create_logical_replication_slot
>> ------------------------------------
>> (slot_b,0/0902E418)
>> (1 row)
>>
>> evantest=# show effective_wal_level;
>> effective_wal_level
>> ---------------------
>> logical
>> (1 row)
>> ```
>>
>> 3. In session 2, cancel session 1 instead of waking it up:
>> ```
>> evantest=# select pg_cancel_backend(pid) from pg_stat_activity where application_name = 'slot_a';
>> pg_cancel_backend
>> -------------------
>> t
>> (1 row)
>> ```
>>
>> Then the server hits this assertion:
>
> Thank you for the report! I've confirmed the problem.
>
>>
>> I might be over thinking, but I just feel the safest fix is to make EnableLogicalDecoding() serialize. I tried serializing with LogicalDecodingControlLock and with a separate lock, but both approaches got deadlock around the barrier wait. I ended up with adding an activation_in_progress flag in shared memory, protected by LogicalDecodingControlLock, with a condition variable to wait for the active activation to finish.
>>
>> With this fix, rerunning the repro makes session 2 wait while session 1 is blocked at the injection point. After canceling session 1 from session 3, session 2 continues, creates the slot successfully, and effective_wal_level becomes logical.
>
> This serialization idea is very similar to what we tried in older
> version patches. While I've confirmed that the patch fixes the
> reported issue, I'm somewhat concerned that it could introduce another
> race condition as the patch adds a new mechanism.
>
> I think that the complication stems from the fact that
> abort_logical_decoding_activation() and DisableLogicalDecoding() clear
> the xlog_logical_info flag in different ways. I guess it would be
> simpler if we could delegate all the deactivation process including
> clearing xlog_logical_info flag to DisableLogicalDecoding(). It would
> allow the system to be in the state of xlog_logical_info == true and
> logical_decoding_enabled = false, but if we change
> DisableLogicalDecoding() to handle this case, the deactivation process
> would become simpler.
>
>> I didn’t include a test in this patch, as I wasn’t sure such a test would be desirable. If others think it is worth adding, I can convert the repro into a TAP test.
>
> I think we should have the tests.
>
> I've attached the patch for the above idea including the regression
> tests. Please review it.
>
Yeah, your solution of deferring the actual abort to DisableLogicalDecoding() is better. I have a few comments on the new patch:
1
```
+ else
+ {
+ /*
+ * Logical decoding is disabled while xlog_logical_info is true.
+ * This could happen if an activation is interrupted. Reset only
+ * xlog_logical_info.
+ */
+ LogicalDecodingCtl->xlog_logical_info = false;
+ LogicalDecodingCtl->logical_decoding_enabled = false;
+ }
```
Here, we should also clear LogicalDecodingCtl->pending_disable = false;
2. AKAIK, critical section only works with elog, thus simple shared memory assignments don't have to be inside the critical section. So we can move out LogicalDecodingCtl assignments to outside the critical section, which avoids some duplicate code.
3
```
+ if (!in_recovery && was_enabled)
ereport(LOG,
errmsg("logical decoding is disabled because there are no valid logical replication slots"));
```
I think this log message can be emitted after releasing the lock, which makes the locking period a bit shorter. That might delay the log output slightly, but since slot creation is not a frequent operation, I think that should be fine.
4
```
+ test_wal_level($primary, "replica|logical",
+ "the concurrent activation interrupt is handled property");
```
Typo: property -> properly
5. In the test, we may add more checks. For example, after the second slot is created, we can check that the slot is created and “effective_wal_level" has been set to “logical", which proves the concurrent creation is not blocked. After cancelling, we can check that the cancellation happened and that “effective_wal_level" is still “logical".
I addressed all of these comments in the attached diff. Please take a look.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
| Attachment | Content-Type | Size |
|---|---|---|
| fix_race_condition_2.diff | application/octet-stream | 6.3 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Hayato Kuroda (Fujitsu) | 2026-05-29 03:44:22 | RE: ECPG: inconsistent behavior with the document in “GET/SET DESCRIPTOR.” |
| Previous Message | 彭冲 | 2026-05-29 03:30:28 | Re: First draft of PG 19 release notes |