| From: | Raphaël Perissat <raphael(at)atmotrack(dot)fr> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
| Subject: | Re: BUG #19400: Memory leak in checkpointer and startup processes on PostgreSQL 18 |
| Date: | 2026-02-11 09:56:53 |
| Message-ID: | CAOLPA2ds60bcR7ZnL1uwdottW+2DS+Hx=3Gy28ex2E9-+UXQAg@mail.gmail.com |
| Lists: | pgsql-bugs |
Hi, and thanks for the explanation.
Indeed, pmap seems to confirm that the private memory usage is normal:
# pmap -d -p $(pgrep -f "postgres.*checkpointer") | tail -n 1
mapped: 8647256K writeable/private: 2632K shared: 8574584K
I used pg_log_backend_memory_contexts() with the PID of the checkpointer
process, and I can't see much on this side either:
2026-02-11 08:58:13.794 UTC [1988622] LOG: logging memory contexts of PID
1988622
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 1; TopMemoryContext:
61568 total in 3 blocks; 2768 free (0 chunks); 58800 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; smgr relation table:
32768 total in 3 blocks; 16904 free (9 chunks); 15864 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; Checkpointer: 24576
total in 2 blocks; 24296 free (13 chunks); 280 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; LOCALLOCK hash: 8192
total in 1 blocks; 616 free (0 chunks); 7576 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; WAL record
construction: 50200 total in 2 blocks; 6400 free (0 chunks); 43800 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; PrivateRefCount: 8192
total in 1 blocks; 2672 free (0 chunks); 5520 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; MdSmgr: 8192 total in
1 blocks; 7952 free (62 chunks); 240 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; Pending ops context:
8192 total in 1 blocks; 7952 free (5 chunks); 240 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 3; Pending Ops Table:
16384 total in 2 blocks; 6712 free (3 chunks); 9672 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; Rendezvous variable
hash: 8192 total in 1 blocks; 616 free (0 chunks); 7576 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; GUCMemoryContext:
32768 total in 3 blocks; 3264 free (19 chunks); 29504 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 3; GUC hash table: 32768
total in 3 blocks; 10664 free (6 chunks); 22104 used
2026-02-11 08:58:13.795 UTC [1988622] LOG: level: 2; Timezones: 104112
total in 2 blocks; 2672 free (0 chunks); 101440 used
2026-02-11 08:58:13.795 UTC [1988622] LOG: level: 2; ErrorContext: 8192
total in 1 blocks; 7952 free (5 chunks); 240 used
2026-02-11 08:58:13.795 UTC [1988622] LOG: Grand total: 404296 bytes in 26
blocks; 101440 free (122 chunks); 302856 used
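For reference, the dump above was produced by a call along these lines (a sketch; the pgrep pattern and psql flags are assumptions about my local setup):

```shell
# Find the checkpointer's PID and ask it to dump its memory contexts
# into the server log. By default only superusers (or roles granted
# EXECUTE on the function) may call it.
CKPT_PID=$(pgrep -f "postgres.*checkpointer")
psql -X -c "SELECT pg_log_backend_memory_contexts($CKPT_PID);"
```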
I monitor my servers' metrics in ELK (Grafana-like) and I can clearly see the
memory usage growing by approximately 100MB/hour until it reaches 8GB for the
postgres process, causing Patroni to crash on the primary.
The fact that this memory "leak" appears on both the primary and the two
replicas makes me think it is not caused by some ingest delay / index
creation, but maybe I'm wrong.
I restarted the 3 nodes yesterday, and here is the output of the watch
command on the primary:

               total        used        free      shared  buff/cache   available
Mem:           31988        5861         670        4571       30530       26126
Swap:           4095          55        4040

free is already showing only 670, and I can see it going down in real time
like a countdown.
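To correlate that countdown with per-process private memory, a sampling loop like this could help (a sketch; the log path, interval, and pgrep pattern are assumptions):

```shell
#!/bin/sh
# Every 10 minutes, record a timestamp, the "free" column from free(1),
# and the checkpointer's pmap summary line, so shrinking free memory can
# be compared against writeable/private growth over time.
PID=$(pgrep -f "postgres.*checkpointer")
while true; do
    date -u +"%Y-%m-%dT%H:%M:%SZ"
    free -m | awk '/^Mem:/ {print "free_mb=" $4}'
    pmap -d -p "$PID" | tail -n 1    # mapped / writeable-private / shared
    sleep 600
done >> /tmp/ckpt_mem.log
```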
Based on this output on the replicas:
:~# ps aux --sort=-%mem | head -20
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 62338 0.1 10.6 8652016 3492940 ? Ss Feb10 1:04
postgres: atmo_data: checkpointer
postgres 62340 0.2 9.8 8655736 3227356 ? Ss Feb10 2:36
postgres: atmo_data: startup recovering 0000001E00000246000000C9
Couldn't the startup (WAL replay) process be causing this? I run this
command on the same replica from time to time, and I can clearly see that
the %MEM used by those two processes grows over time.
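A quick way to check both suspect processes at once is to put the RSS and the pmap private figure side by side (a sketch; the pgrep patterns mirror the ps output above):

```shell
# If writeable/private stays small while VmRSS climbs toward
# shared_buffers, the growth is shared memory being touched, not a leak.
for pat in "checkpointer" "startup recovering"; do
    pid=$(pgrep -f "postgres.*$pat" | head -n 1)
    [ -n "$pid" ] || continue
    rss_kb=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status")
    echo "pid=$pid ($pat) rss_kb=$rss_kb"
    pmap -d -p "$pid" | tail -n 1
done
```

On a healthy system the writeable/private figure should stay small, as in the pmap output earlier in the thread.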
Thanks for your help.
On Tue, Feb 10, 2026 at 20:58, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
> On 2026-02-10 15:28:38 +0000, PG Bug reporting form wrote:
> > I recently migrated my cluster with 3 dedicated servers to a new
> cluster. I
> > was running on PG12 and I am now on PG18.1.
> > I noticed increasing memory usage on all of my 3 nodes, until at some
> > point there is no memory left and patroni crashes on the leader, leaving
> the
> > cluster with no available primary.
> > The cluster is a Data Warehouse type using TimescaleDB, ingesting
> > approx. 1M time series a day.
> > It appears that the memory leak is affecting both the checkpointer and
> > startup (WAL replay) processes in PostgreSQL 18.0 and 18.1.
> > I never had such an issue on the old cluster with PG12, and the server
> > configuration and cluster usage are the same (except for the PG upgrade).
> >
> > SYMPTOMS:
> > - Checkpointer process grows to 5.6GB RSS after 24 hours
> > - Startup process on replicas grows to 3.9GB RSS
> > - Memory growth rate: approximately 160-200MB per hour
> > - Eventually causes out-of-memory conditions
> >
> > CONFIGURATION:
> > - PostgreSQL version: Initially 18.0, upgraded to 18.1 - same issue
> persists
> > - Platform: Debian 13
> > - TimescaleDB: 2.23.0
> > - Deployment: 3-node Patroni cluster with streaming replication
> > - WAL level: logical
> > - Hot standby enabled
> >
> > SYSTEM RESOURCES:
> > RAM: 32GB
> > Proc: 12 core of Intel(R) Xeon(R) E-2386G 3.50GHz
> >
> > KEY SETTINGS:
> > - wal_level: logical
> > - hot_standby: on
> > - max_wal_senders: 20
> > - max_replication_slots: 20
> > - wal_keep_size: 1GB
> > - shared_buffers: 8GB
> >
> > WAL STATISTICS (over 7 days):
> > - Total WAL generated: 2.3TB (approximately 31GB/day)
> > - Replication lag: 0 bytes (replicas are caught up)
> > - No long-running transactions
> >
> > MEMORY STATE AFTER 24 HOURS:
> > On primary:
> > postgres checkpointer: 3.9GB RSS
> >
> > On replicas:
> > postgres checkpointer: 5.6GB RSS
> > postgres startup recovering: 3.9GB RSS <-- This is abnormal
>
> The RSS slowly increasing towards shared_buffers is normal if you're not
> using
> huge_pages. The OS only counts pages in shared memory as part of RSS once a
> page has been used in the process. Over time the checkpointer process
> touches
> more and more of shared_buffers, thus increasing the RSS.
>
> You can use "pmap -d -p $pid_of_process" to see how much of the RSS is
> actually shared memory.
>
> To show this, here's a PS for a new backend:
>
> ps:
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> andres 2544694 0.0 0.0 8719956 25744 ? Ss 14:55 0:00
> postgres: dev assert: andres postgres [local] idle
>
> and then after reading in a 1.3GB relation:
>
> andres 2544694 1.7 2.2 8720972 1403576 ? Ss 14:55 0:00
> postgres: dev assert: andres postgres [local] idle
>
> So you can see that RSS increased proportionally with the amount of touched
> data.
>
> Whereas with pmap:
>
> pmap -d -p 2544694|tail -n 1
> mapped: 8721924K writeable/private: 5196K shared: 8646284K
>
>
> I think you would need to monitor the real memory usage of various
> processes
> to know why you're OOMing.
>
> You can use pg_log_backend_memory_contexts() to get the memory usage
> information of backend processes.
>
> Greetings,
>
> Andres Freund
>