Re: Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet

From: Frits Hoogland <frits(dot)hoogland(at)gmail(dot)com>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Priya V <mailme0216(at)gmail(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet
Date: 2025-08-06 08:47:29
Message-ID: 5520EFB7-EDD1-42C1-8141-63EED777E026@gmail.com
Lists: pgsql-performance

Joe,

Can you name any technical reason why not having swap for a database server is actually a bad idea?

Memory is always limited. Swap was invented for the situation where the (incidental) amount of paged-in memory could (regularly) grow beyond what physical memory allows, giving the (clear) workaround of 'cushioning' the memory shortage with a second-level memory store on disk.
Still, this does not make memory unlimited. Swap extends the available memory by the size of the swap space; you can still run out of memory with swap added, simply by paging in more than physical memory plus swap can hold.

Today, most systems are not memory constrained anymore; or rather: it is possible to get a server with enough physical memory to hold your total expected memory need.
And given the latency-sensitive nature of databases in general, which includes Postgres, for any serious deployment you should get a server with enough memory to host your workload, and configure Postgres so that it does not overload that memory.

If you do oversubscribe (physical) memory, you will get pain somewhere; there is no way around that.
The article in defence of swap is essentially saying that if you happen to oversubscribe memory, sharing the pain between anonymous and file-backed pages is better.
I would say you are already in a bad place if that happens, which is especially bad for databases, and databases should allow you to make memory usage predictable.

However, what I found is that with 4.x+ kernels (4.18 to be precise; RHEL 8), the kernel can in certain situations favour file pages, causing anonymous memory to be paged out even if swappiness is set to 1 or 0 and there is a wealth of inactive file memory. It seems to be related to the workingset protection(?) mechanisms, but given the lack of clear statistics I can't be sure about that. What it does lead to in my situations is a constant rate of swapping in and out, whilst there is no technical reason for Linux to do so, because there is enough available memory.
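
For completeness, this is roughly how I watch for that behaviour: a minimal sketch (assuming a Linux kernel that exposes the cumulative pswpin/pswpout counters in /proc/vmstat, which 4.x does) that prints the swap-in/out rate, so you can see constant swapping even while 'free' still reports plenty of available memory:

#!/usr/bin/env python3
# Sketch: poll /proc/vmstat and report the swap in/out rate.
# pswpin/pswpout are cumulative counters, counted in pages.
import time

def swap_counters():
    counters = {}
    with open("/proc/vmstat") as f:
        for line in f:
            name, value = line.split()
            if name in ("pswpin", "pswpout"):
                counters[name] = int(value)
    return counters

prev = swap_counters()
while True:
    time.sleep(10)
    cur = swap_counters()
    print("pages swapped in/s: %.1f, out/s: %.1f"
          % ((cur["pswpin"] - prev["pswpin"]) / 10.0,
             (cur["pswpout"] - prev["pswpout"]) / 10.0))
    prev = cur

If that shows a steady nonzero rate while MemAvailable is high, you are looking at the behaviour I described rather than at genuine memory pressure.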

My point of view has been that vm.overcommit_memory set to 2 was the way to go, because it makes Linux enforce a fixed limit at allocation time, which is a way to guarantee the database never runs out of memory.
It does guarantee that Linux itself never runs out of memory, absolutely.
However, this limit is hard, and it is applied to the process in both user mode and system mode (kernel level), and thus can refuse memory at times when it is not safe to do so, and thereby corrupt execution. I have to be honest, I have not seen this myself, but trustworthy sources have reported it repeatedly, and I am inclined to believe them. This means Postgres execution can corrupt/terminate in unlucky situations, which impacts availability.
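
To make the accounting concrete, here is a minimal sketch (ignoring huge pages and vm.overcommit_kbytes) of the limit that overcommit_memory = 2 enforces, i.e. CommitLimit = MemTotal * overcommit_ratio / 100 + SwapTotal, compared against what the kernel itself reports and against the current Committed_AS:

#!/usr/bin/env python3
# Sketch: derive the strict-overcommit limit and compare it with the
# kernel's own CommitLimit and the currently committed address space.
def meminfo_kb(field):
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])   # /proc/meminfo values are in kB
    raise KeyError(field)

ratio = int(open("/proc/sys/vm/overcommit_ratio").read())
mem_total = meminfo_kb("MemTotal")
swap_total = meminfo_kb("SwapTotal")
commit_limit = meminfo_kb("CommitLimit")    # what the kernel computed
committed = meminfo_kb("Committed_AS")

print("derived limit : %d kB" % (mem_total * ratio // 100 + swap_total))
print("CommitLimit   : %d kB" % commit_limit)
print("Committed_AS  : %d kB (%.0f%% of limit)"
      % (committed, 100.0 * committed / commit_limit))

With overcommit_memory = 2, allocations start failing once Committed_AS would exceed CommitLimit, regardless of how much memory is actually free; that is exactly the hard limit I mean above.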


Frits Hoogland

> On 5 Aug 2025, at 20:52, Joe Conway <mail(at)joeconway(dot)com> wrote:
>
> On 8/5/25 13:01, Priya V wrote:
>> *Environment:*
>> *PostgreSQL Versions:* Mix of 13.13 and 15.12 (upgrades in progress
>> to be at 15.12 currently both are actively in use)
>
> PostgreSQL 13 reaches end of life on November 13, 2025
>
>> *OS / Kernel:* RHEL 7 & RHEL 8 variants, kernels in the 4.14–4.18 range
>
> RHEL 7 has been EOL for quite a while now. Note that you have to watch out for collation issues/corrupted indexes after OS upgrades due to collations changing with newer glibc versions.
>
>> *Swap:* Currently none
>
> bad idea
>
>> *Workload:* Highly mixed — OLTP-style internal apps with
>> unpredictable query patterns and connection counts
>> *Goal:* Uniform, safe memory settings across the fleet to avoid
>> kernel or database instability
>
>> We’re considering:
>> *|vm.overcommit_memory = 2|* for strict accounting
>
> yes
>
>> Increasing |vm.overcommit_ratio| from 50 → 80 or 90 to better
>> reflect actual PostgreSQL usage (e.g., |work_mem| reservations that
>> aren’t fully used)
>
> work_mem does not reserve memory -- it is a maximum that might be used in memory for a particular operation
>
>> *Our questions for those running large PostgreSQL fleets:*
>> 1.
>> What |overcommit_ratio| do you find safe for PostgreSQL without
>> causing kernel memory crunches?
>
> Read this:
> https://www.cybertec-postgresql.com/en/what-you-should-know-about-linux-memory-overcommit-in-postgresql/
>
>> 2.
>> Do you prefer |overcommit_memory = 1| or |= 2| for production stability?
>
> Use overcommit_memory = 2 for production stability
>
>> 3.
>> How much swap (if any) do you keep in large-memory servers where
>> PostgreSQL is the primary workload? Is having swap configured a good
>> idea or not ?
>
> You don't necessarily need a large amount of swap, but you definitely should not disable it.
>
> Some background on that:
> https://chrisdown.name/2018/01/02/in-defence-of-swap.html
>
>> 4.
>> Any real-world cases where kernel accounting was too strict or too
>> loose for PostgreSQL?
>
> In my experience the biggest issues are when postgres is running in a memory constrained cgroup. If you want to constrain memory with cgroups, use cgroup v2 (not 1) and use memory.high to constrain it, not memory.max.
>
>> 5. What settings to go with if we are not planning on using swap ?
>
> IMHO do not disable swap on Linux, at least not on production, ever.
>
>> We’d like to avoid both extremes:
>> Too low a ratio → PostgreSQL backends failing allocations even with
>> free RAM
>
> Have you actually seen this or are you theorizing?
>
>> Too high a ratio → OOM killer terminating PostgreSQL under load spikes
>
> If overcommit_memory = 2, overcommit_ratio is reasonable (less than 100, maybe 80 or so as you suggested), swap is not disabled, and you are not running in a memory-constrained cgroup, I would be very surprised if you ever get hit by the OOM killer. And if you do, things are so bad the database was probably dying anyway.
>
> HTH,
>
> --
> Joe Conway
> PostgreSQL Contributors Team
> Amazon Web Services: https://aws.amazon.com
>
>
