Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet

From: Priya V <mailme0216(at)gmail(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet
Date: 2025-08-05 17:01:19
Message-ID: CAFsZ43xFxjSiONwRccXBQXZrPRd+Lh7XAkSVEG1ai165xPcoDA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello Postgres community,

We operate a large PostgreSQL fleet (~15,000 databases) on dedicated Linux
hosts.
Each host runs *multiple PostgreSQL instances* (multi-instance setup, not
just multiple DBs inside one instance).

*Environment:*

-

*PostgreSQL Versions:* Mix of 13.13 and 15.12 (upgrades in progress to
be at 15.12 currently both are actively in use)
-

*OS / Kernel:* RHEL 7 & RHEL 8 variants, kernels in the 4.14–4.18 range
-

*RAM:* 256 GiB (varies slightly)
-

*Swap:* Currently none
-

*Workload:* Highly mixed — OLTP-style internal apps with unpredictable
query patterns and connection counts
-

*Goal:* Uniform, safe memory settings across the fleet to avoid kernel
or database instability

We’re reviewing vm.overcommit_* settings because we’ve seen conflicting
guidance:

-

vm.overcommit_memory = 2 gives predictability but can reject allocations
early
-

vm.overcommit_memory = 1 is more flexible but risks OOM kills if many
backends hit peak memory usage at once

We’re considering:

-

*vm.overcommit_memory = 2* for strict accounting
-

Increasing vm.overcommit_ratio from 50 → 80 or 90 to better reflect
actual PostgreSQL usage (e.g., work_mem reservations that aren’t fully
used)

*Our questions for those running large PostgreSQL fleets:*

1.

What overcommit_ratio do you find safe for PostgreSQL without causing
kernel memory crunches?
2.

Do you prefer overcommit_memory = 1 or = 2 for production stability?
3.

How much swap (if any) do you keep in large-memory servers where
PostgreSQL is the primary workload? Is having swap configured a good idea
or not ?
4.

Any real-world cases where kernel accounting was too strict or too loose
for PostgreSQL?
5. What settings to go with if we are not planning on using swap ?

We’d like to avoid both extremes:

-

Too low a ratio → PostgreSQL backends failing allocations even with free
RAM
-

Too high a ratio → OOM killer terminating PostgreSQL under load spikes

Any operational experiences, tuning recommendations, or kernel/PG
interaction pitfalls would be very helpful.

TIA

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Joe Conway 2025-08-05 18:52:25 Re: Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet
Previous Message Pavel Stehule 2025-08-05 05:30:18 Re: proposal: schema variables