Re: Add logical_decoding_spill_limit to cap spill file disk usage per slot

From: shawn wang <shawn(dot)wang(dot)pg(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add logical_decoding_spill_limit to cap spill file disk usage per slot
Date: 2026-04-03 16:14:34
Message-ID: CA+T=_GVx9+tZ0UnnkYoOnzg7_0F8SNYutEYULZNskpU_JmUHvA@mail.gmail.com
Lists: pgsql-hackers

Hi Bharath,

Thank you for the thorough and thoughtful review!

> Having a lot of spill files also increases crash/recovery times.
> However, files spilling to disk causing no-space-left-on-disk issues
> leading to downtime applies to WAL files, historical catalog snapshot
> files, subtransaction overflow files, CLOG (and all the subsystems
> backed by SLRU data structure), etc. - basically any Postgres
> subsystem writing files to disk. I'm a bit worried that we may end up
> solving disk space issues, which IMHO are outside of the database
> scope, in the database. Others may have different opinions though.

That's a fair philosophical point. However, I'd argue that spill files
are uniquely unprotected compared to the other subsystems you mentioned:

- WAL files → max_slot_wal_keep_size
- Temp files → temp_file_limit
- SLRU/CLOG → bounded by design (sized proportionally to active XIDs)

Spill files are the gap: they can grow proportionally to the *data
volume* of in-progress transactions, not to the number of transactions,
and there is no existing upper bound. A single bulk-load transaction
replicated via logical decoding can generate tens of GBs of spill files
with no way for the DBA to cap it.
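
Today this growth is observable but not boundable: the
pg_stat_replication_slots view (available since v14) reports per-slot
spill activity, yet nothing stops the files themselves from filling the
disk. For example:

psql -p 5432 -d postgres -c \
    "SELECT slot_name, spill_txns, spill_count,
            pg_size_pretty(spill_bytes) AS spilled
       FROM pg_stat_replication_slots
      ORDER BY spill_bytes DESC;"

(Note that spill_bytes is cumulative since the last stats reset, so it
indicates the trend rather than the current on-disk footprint.)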

So this is not about solving a general OS-level disk problem in the
database; it is about providing the same kind of safety net that already
exists for WAL (max_slot_wal_keep_size) and temp files (temp_file_limit)
— a per-subsystem guardrail that the DBA can configure.

> How common is this issue? Could you please add a test case to the
> proposed patch that without this feature would otherwise hit the issue
> described?

In our experience it is typically triggered by bulk data migrations or
long-running DDL on large tables, combined with a subscriber that has
fallen behind. Each incident caused the instance to go read-only,
affecting all workloads, not just replication.

We have also seen reports of this in pgsql-general and on community
forums, though the root cause is not always identified as spill files.

Absolutely. I will add a TAP test that:
1. Creates 10 logical replication slots with test_decoding.
2. Starts a large transaction that generates spill files exceeding
the configured logical_decoding_spill_limit.
3. Verifies that the walsender ERRORs out with the expected
ERRCODE_CONFIGURATION_LIMIT_EXCEEDED.
4. Verifies that spill files are cleaned up after the error.

I will include all four steps in v2.

Test case:

1. Build a table:

psql -p 5432 -d postgres -c "
CREATE TABLE spill_load (
    id bigint,
    payload text
);
"

2. Build some slots:
#bash
for i in $(seq 1 10); do
    psql -p 5432 -d postgres -c \
        "SELECT pg_create_logical_replication_slot('spill_slot_${i}', 'test_decoding');"
done

psql -p 5432 -d postgres -c \
    "SELECT slot_name, plugin, active FROM pg_replication_slots ORDER BY slot_name;"

3. Start pg_recvlogical on each slot:
#bash
mkdir -p /tmp/spill_test/pids

for i in $(seq 1 10); do
    pg_recvlogical \
        --dbname="port=5432 dbname=postgres" \
        --slot="spill_slot_${i}" \
        --start \
        --no-loop \
        --status-interval=0 \
        --file=/dev/null &
    echo $! > /tmp/spill_test/pids/recv_${i}.pid
    echo "Started pg_recvlogical for spill_slot_${i}, PID=$!"
done

sleep 2

psql -p 5432 -d postgres -c \
    "SELECT slot_name, active, active_pid
       FROM pg_replication_slots
      WHERE slot_name LIKE 'spill_slot_%'
      ORDER BY slot_name;"

4. Make a big transaction:

psql -p 5432 -d postgres -c "
BEGIN;
INSERT INTO spill_load
SELECT i, repeat('X', 100000)
FROM generate_series(1, 10000000) i;
COMMIT;
"

> Having said that, were alternatives like disabling subscriptions when
> seen occupying the disk space considered?

Yes, we considered monitoring-based approaches, but they have inherent
limitations:

1. Reaction latency: Even with 1-second polling, a high-throughput
bulk load can spill gigabytes between two polls. The spill rate
can easily exceed 100 MB/s, so a 5-second monitoring gap means
500 MB of uncontrolled growth.

2. External dependency: Relying on an external tool means the
protection is not available by default. Every deployment must
set up and maintain the monitoring — and if it fails or is
misconfigured, the disk-full scenario still occurs.

3. No subscriber-side control: pg_recvlogical users and third-party
CDC tools (Debezium, etc.) that consume directly from the
publisher have no "subscription" to disable.

4. Disabling a subscription does not immediately stop spilling:
the walsender continues processing WAL records already in flight.

The proposed GUC acts as a synchronous, in-process safety valve — it
checks the limit at the exact point where the next spill write would
occur, with zero latency. Monitoring and the GUC are complementary:
monitoring for alerting, the GUC as a hard backstop.

> When the logical_decoding_spill_limit is exceeded, ERRORing out in the
> walsender is even more problematic, right? The replication slot would
> be inactive, causing bloat and preventing tuple freezing, WAL files
> growth and eventually the system may hit disk-space issues - it is
> like "we avoided disk space issues for one subsystem, but introduced
> it for another". This looks a bit problematic IMHO. Others may have
> different opinions though.

This is a great point and deserves a careful answer.

First, the "inactive slot causes WAL growth" problem already has a
well-established solution: max_slot_wal_keep_size. When this is
configured (which is recommended practice), an inactive slot will be
invalidated once WAL retention exceeds the limit, releasing both WAL
files and catalog bloat. So the combination of:

logical_decoding_spill_limit + max_slot_wal_keep_size

provides defense in depth: the spill limit prevents the data disk from
filling up with spill files, while max_slot_wal_keep_size prevents WAL
disk from growing unboundedly due to an inactive slot.
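
Concretely, the pairing could be configured like this (note that
logical_decoding_spill_limit is the GUC proposed in this patch, so its
name and unit follow v2; the values are only illustrative):

#bash
psql -p 5432 -d postgres -c "ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';"
psql -p 5432 -d postgres -c "ALTER SYSTEM SET logical_decoding_spill_limit = '5GB';"
psql -p 5432 -d postgres -c "SELECT pg_reload_conf();"

(Each ALTER SYSTEM is issued in its own psql call because it cannot run
inside a transaction block, and psql -c wraps multiple statements in
one.)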

Second, compare the two failure modes:

(a) WITHOUT spill limit: disk fills up → the ENTIRE instance goes
read-only or shutdown. All workloads stop. Recovery requires manual
intervention (rm spill files, restart). Data loss is possible
if WAL writes also fail.

(b) WITH spill limit: walsender ERRORs → only replication is
    interrupted. The primary continues serving all read/write
    workloads normally. The DBA receives a clear error message
    and can act: increase the limit, increase
    logical_decoding_work_mem, enable streaming, or investigate
    the large transaction.

Scenario (b) is strictly better than (a): it converts an
instance-wide outage into a replication-only interruption with a
clear, actionable error message.

Additionally, in the updated patch (v2), I have ensured that spill
files are properly cleaned up in the error path (via
WalSndErrorCleanup), so exceeding the limit does not leave orphaned
files on disk.

Best regards,
Shawn

Attachment Content-Type Size
v2-0001-Add-logical_decoding_spill_limit-GUC-to-cap-spill-file-limit.patch application/x-patch 14.0 KB
