RE: Add logical_decoding_spill_limit to cap spill file disk usage per slot

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'shawn wang' <shawn(dot)wang(dot)pg(at)gmail(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Add logical_decoding_spill_limit to cap spill file disk usage per slot
Date: 2026-03-26 05:33:20
Message-ID: OS9PR01MB1214972CB037BE2C6F1DE7CB2F556A@OS9PR01MB12149.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear Shawn,

> We operate a fleet of PostgreSQL instances with logical replication. On several
> occasions, we have experienced production incidents where logical decoding spill
> files (pg_replslot/<slot>/xid-*.spill) grew uncontrollably — consuming tens of
> gigabytes and eventually filling up the data disk. This caused the entire instance
> to go read-only, impacting not just replication but all write workloads.

The subscription option streaming=parallel has been available since PG16. With
it, in-progress transactions are replicated to the subscriber and applied
immediately. Does it avoid the issue?
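
For reference, a minimal sketch of enabling it, assuming an existing
subscription named mysub (a hypothetical name, not taken from your report):

```sql
-- Run on the subscriber (PG16 or later). "mysub" is a hypothetical name.
ALTER SUBSCRIPTION mysub SET (streaming = parallel);

-- Check the current setting; substream = 'p' means parallel.
SELECT subname, substream FROM pg_subscription WHERE subname = 'mysub';
```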

> 5. Behavior on limit exceeded: An ERROR is raised with ERRCODE_CONFIGURATION_LIMIT_EXCEEDED.
> The walsender exits, but the slot's restart_lsn and confirmed_flush are preserved.
> The subscriber can reconnect after the DBA:

I'm not sure, but doesn't this mean the error repeats until the GUC is increased?
Also, does the slot behave any differently here than it does when the walsender
exits normally?

Best regards,
Hayato Kuroda
FUJITSU LIMITED
