| From: | shawn wang <shawn(dot)wang(dot)pg(at)gmail(dot)com> |
|---|---|
| To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
| Cc: | "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Add logical_decoding_spill_limit to cap spill file disk usage per slot |
| Date: | 2026-04-03 16:11:52 |
| Message-ID: | CA+T=_GUf17BqsLRUM36c_=h4hOcS9fYDMYWdRZL3ALL_M88GGA@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi Kuroda,
Thank you for the review and the great questions!
> We have provided the subscription option streaming=parallel since PG16. It
> replicates on-going transactions and applies immediately. Does it avoid the
> issue?
streaming=parallel does significantly reduce publisher-side spill files
in the common case — when enabled, the reorder buffer streams changes
directly instead of spilling to disk.
However, it cannot guarantee 100% avoidance of spilling. There are
several fallback scenarios in the code where streaming is not possible
and the reorder buffer falls back to spill-to-disk even when
streaming=parallel is configured:
1. Snapshot not yet consistent (snapbuild.c — SnapBuildCurrentState()
< SNAPBUILD_CONSISTENT), e.g. right after slot creation.
2. Transaction is being re-decoded after a restart
(SnapBuildXactNeedsSkip() returns true).
3. Transaction contains TOAST partial changes
(rbtxn_has_partial_change), which cannot be streamed.
4. Transaction contains speculative inserts (INSERT ... ON CONFLICT),
also flagged as partial changes.
5. Transaction has no streamable changes yet
(!rbtxn_has_streamable_change).
6. Output plugin does not support streaming callbacks
(e.g. test_decoding without the streaming option).
7. Parallel apply worker is busy for >10 seconds — the leader falls
back to serializing changes to disk
(applyparallelworker.c, SHM_SEND_TIMEOUT_MS).
8. No parallel worker available — the leader serializes the entire
streamed transaction to disk (worker.c,
get_transaction_apply_action → TRANS_LEADER_SERIALIZE).
Additionally, streaming is a *subscription-level* parameter that only
applies to built-in logical replication. Users of pg_recvlogical or
third-party CDC tools (Debezium, etc.) consume changes directly from
the publisher's walsender and have no subscription to configure.
So streaming=parallel and logical_decoding_spill_limit are
complementary: streaming reduces spilling in the common case, while
the spill limit provides a hard safety net for the cases where
spilling is unavoidable.
> Not sure, but doesn't it mean the error is repeating till the GUC is
> increased?
Good question. Yes, if the same large transaction is re-decoded
without any configuration change, the same ERROR will occur again.
This is intentional — the behavior is analogous to temp_file_limit:
once the limit is hit, the operation fails, and it will keep failing
until the DBA takes action.
The DBA has several options to resolve it:
- Increase logical_decoding_spill_limit.
- Increase logical_decoding_work_mem (so less data is spilled).
- Enable streaming on the subscriber (streaming=on or
streaming=parallel), which avoids spilling in most cases.
- Investigate and address the root cause (e.g. break up the
large transaction).
The ERROR message includes the current spill size and the configured
limit, making it straightforward to diagnose.
> Also, is there any difference for the slot's behavior, with the normal
> walsender's exit case?
No, the slot behavior is the same as a normal walsender exit.
Specifically:
- The slot remains valid (it is NOT invalidated).
- restart_lsn and confirmed_flush are preserved.
- The subscriber can reconnect and resume from where it left off.
- In v2, spill files are properly cleaned up in the error path
(via WalSndErrorCleanup), so no orphaned files are left behind.
The only difference is that the walsender's exit reason is logged as
an ERROR with ERRCODE_CONFIGURATION_LIMIT_EXCEEDED, rather than a
normal shutdown. The slot itself is in exactly the same state as if
the walsender had exited normally or the connection was dropped.
Best regards,
Shawn