Re: PostgreSQL 14.4 ERROR: out of memory issues

From: Aleš Zelený <zeleny(dot)ales(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: PostgreSQL 14.4 ERROR: out of memory issues
Date: 2022-07-20 22:26:54
Message-ID: CAODqTUZYWnVWF41ELYxFY_8tzo0165LxFu7t52gRBwi-apx51A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello,

po 18. 7. 2022 v 21:04 odesílatel Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> napsal:

> =?UTF-8?B?QWxlxaEgWmVsZW7DvQ==?= <zeleny(dot)ales(at)gmail(dot)com> writes:
> > postgres=# show shared_buffers ;
> > shared_buffers
> > ----------------
> > 10GB
> > (1 row)
>
> Oh! In that case, there is absolutely nothing to see here.
> This chunk:
>
> > 00007fd9b0551000 10827040K rw-s- zero (deleted)
>
> must be your shared-memory region, and it's accounting for just
> about all of the process's memory space. There is no reason
> to suspect a leak.
>

I agree this 10 GB is expected to be accounted for shared buffers (just a
size correlation) and therefore no memory leak here.

> I think you've been fooled by the fact that "top" and some other
> tools frequently don't report a shared-memory page as part of a
> process's usage until that process touches that particular page.
> Thus, the reported memory usage of a Postgres backend will grow
> over time as it randomly happens to touch different buffers within
> the shared buffer arena. That does not constitute a leak, but
> it does make such tools next door to useless for detecting
> actual leaks :-(. You can only believe there's a leak if the
> reported usage doesn't level off after reaching the vicinity of
> your shared memory size.
>

On the other hand, if we do have 10GB of shared buffers plus ~14MB reported
by Top Memory Context (even if /proc/<pid>/status reports RssAnon memory
when touched, the process VM peak is ~12 GB?
-bash-4.2$ cat 20220718_200230.status.17048.log
Name: postmaster
Umask: 0077
State: S (sleeping)
Tgid: 17048
Ngid: 0
Pid: 17048
PPid: 4476
TracerPid: 0
Uid: 26 26 26 26
Gid: 26 26 26 26
FDSize: 1024
Groups: 26
VmPeak: 12477752 kB
VmSize: 12439740 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 9811560 kB
VmRSS: 9811560 kB
RssAnon: 1283964 kB
RssFile: 15784 kB
RssShmem: 8511812 kB
VmData: 1282860 kB
VmStk: 132 kB
VmExe: 7816 kB
VmLib: 85312 kB
VmPTE: 23716 kB
VmSwap: 0 kB
Threads: 1
SigQ: 0/160197
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000400000
SigIgn: 0000000001701800
SigCgt: 00000001a0006287
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 0
Speculation_Store_Bypass: thread vulnerable
Cpus_allowed: ffffff
Cpus_allowed_list: 0-23
Mems_allowed:
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003
Mems_allowed_list: 0-1
voluntary_ctxt_switches: 5273344
nonvoluntary_ctxt_switches: 13651

I'd expect VMPeak size close to 10GB (suppose that all shared memory pages
were touched) shared_buffers plus ~14MB of private memory, but
VMPeak: 12477752 [kB] - 10485760 shared buffers [kB] = 1991992 kB Which is
a bit more than reported RssAnon: 1283964 kB memory for the process, but
very close.

pmap taken in the same time:

-bash-4.2$ cat 20220718_200230.pmap.17048.log | grep -e anon -e deleted
0000000000db3000 200K rw--- [ anon ]
0000000002d15000 504K rw--- [ anon ]
0000000002d93000 934476K rw--- [ anon ] <<<= this is suspected to be
responsible for the OutOfMemory -> failing fork(), reinitialize the cluser
00007fd989776000 311304K rw--- [ anon ] <<<= togehter with this one
00007fd9a8d75000 32772K rw--- [ anon ]
00007fd9acb65000 20K rw--- [ anon ]
00007fd9affc1000 372K rw--- [ anon ]
00007fd9b023d000 1028K rw--- [ anon ]
00007fd9b054b000 24K rw--- [ anon ]
00007fd9b0551000 10827040K rw-s- zero (deleted) <<<= expected to be
shared_buffers
00007fdc45f18000 16K rw--- [ anon ]
00007fdc4d50c000 184K rw--- [ anon ]
00007fdc4de16000 4K rw--- [ anon ]
00007fdc4e03c000 8K rw--- [ anon ]
00007fdc4fd1e000 84K rw--- [ anon ]
00007fdc4ff6f000 8K rw--- [ anon ]
00007fdc508d7000 8K rw--- [ anon ]
00007fdc50d59000 4K rw--- [ anon ]
00007fdc5139e000 8K rw--- [ anon ]
00007fdc520f3000 40K rw--- [ anon ]
00007fdc52902000 20K rw--- [ anon ]
00007fdc52ead000 16K rw--- [ anon ]
00007fdc532af000 4K rw--- [ anon ]
00007fdc53e2b000 16K rw--- [ anon ]
00007fdc544db000 16K rw--- [ anon ]
00007fdc54cc8000 8K rw--- [ anon ]
00007fdc54f1a000 1588K rw--- [ anon ]
00007fdc550dc000 104K rw--- [ anon ]
00007fdc550fc000 8K r-x-- [ anon ]
00007fdc550ff000 4K rw--- [ anon ]
00007fdc55102000 4K rw--- [ anon ]
00007fffd38f0000 8K r-x-- [ anon ]
ffffffffff600000 4K r-x-- [ anon ]

The machine has 40GB of RAM. It is a dedicated DB server, only a single PG
instance is running there. Since we again run out of memory from my last
reply, now it looks healthy:

-bash-4.2$ psql -c "select count(*) from pg_stat_activity"
count
-------
336
(1 row)

$-bash-4.2$ sudo smem -u
User Count Swap USS PSS RSS
chrony 1 0 512 515 872
dbus 1 0 1064 1074 1432
postfix 2 0 2752 3122 5540
polkitd 1 0 9152 9276 9816
postgres-exporter 1 0 11912 11912 11916
zelenya 9 0 34320 35550 43648
sql_exporter 1 0 46332 46332 46340
root 28 0 267920 274273 298388
postgres 336 0 5904884 16399946 143527180

-bash-4.2$ free -h
total used free shared buff/cache
available
Mem: 39G 7,3G 1,2G 10G 30G
21G
Swap: 0B 0B 0B

The processes suspect to be responsible for memory exhaustion (using the
RssAnon memory from /proc/<pid>/status files always belong to a single user:

-bash-4.2$ psql -c "select count(*) from pg_stat_activity where
usename='app_feed'"
count
-------
67
(1 row)

So the available 21GB (under normal conditions used by the kernel as file
system page cache) is not enough if 20 of the processes allocate 1GB
private memory -> OS CommitLimit is reached and OutOfMemory error happens.

So back to why you're getting these out-of-memory failures:
> we still don't have much info about that. I wonder whether
> there is something watching the total usage reported by "top"
> and taking action based on that entirely-misleading number.
> The kernel itself should be aware that there's no extra memory
> pressure from N backends all using the same shared memory
> segment, but it can be hard for outside tools to know that.
>

There is no tool taking action when high memory usage is detected, OS
kernel provides all the memory requested by processes until it happens,
that a fork() fails (usually when autovacuum launcher attempts to spawn a
new autovacuum worker as these are most frequently started new processes.
But the failure comes from the failed fork() due to a lack of memory for
new process creation.

2022-07-01 06:51:02 CEST [4476]: [26-1] user=,db=,host=,app= LOG: could
not fork autovacuum worker process: Cannot allocate memory
2022-07-01 06:51:03 CEST [4476]: [27-1] user=,db=,host=,app= LOG: could
not fork worker process: Cannot allocate memory
2022-07-01 06:51:03 CEST [4476]: [28-1] user=,db=,host=,app= LOG: could
not fork worker process: Cannot allocate memory
2022-07-01 06:51:03 CEST [18882]: [1332-1]
user=app_feed,db=prematch,host=172.25.0.231,app=[unknown] ERROR: could not
resize shared memory segment "/PostgreSQL.1824635724" to 204064 bytes: No
space left on device

So OS kernel was short of available memory.

Memory Commit limit on the system:
-bash-4.2$ sysctl -q vm.overcommit_memory
vm.overcommit_memory = 2
-bash-4.2$ sysctl -q vm.overcommit_ratio
vm.overcommit_ratio = 95

swap is disabled:
-bash-4.2$ swapon -s
-bash-4.2$

CommitLimit = RAM * overcommit_ratio / 100 + swap ~ 37GB (ignoring
total_huge_TLB as huge pages are not used) as can be seen in meminfo:

-bash-4.2$ cat /proc/meminfo
MemTotal: 41033756 kB
MemFree: 1549608 kB
MemAvailable: 24782516 kB
Buffers: 8468 kB
Cached: 33667252 kB
SwapCached: 0 kB
Active: 24600744 kB
Inactive: 12524428 kB
Active(anon): 14131536 kB
Inactive(anon): 130116 kB
Active(file): 10469208 kB
Inactive(file): 12394312 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 1876 kB
Writeback: 16 kB
AnonPages: 3449376 kB
Mapped: 10796800 kB
Shmem: 10811896 kB
Slab: 987832 kB
SReclaimable: 813916 kB
SUnreclaim: 173916 kB
KernelStack: 13648 kB
PageTables: 823236 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 38982068 kB
Committed_AS: 19261792 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 256724 kB
VmallocChunk: 34338043900 kB
Percpu: 5632 kB
HardwareCorrupted: 0 kB
AnonHugePages: 65536 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 110464 kB
DirectMap2M: 6180864 kB
DirectMap1G: 37748736 kB

From the node-exporter metric, I can see that the component that grows over
time until the commit limit is reached and the cluster is reinitialized is
"Active(anon)". Diving into /proc/<pid>/status I can see multiple processes
of the app_feed DB user RssAnon memory growth over time and the sum of the
increments matches the "Active(anon)" growth seen in the global view in
/proc/meminfo.

This observed growth together with the anonymous memory segments (on top of
shared buffers mentioned above) was the reason why after the upgrade I
suspect an issue in PG14.4.

> At this point I suspect that PG 14.3 vs 14.4 is a red herring,
> and what you should be looking into is what else you updated
> at about the same time, particularly in the line of container
> management tools or the like (if you use any).
>

As mentioned, the machine is a dedicated DB server serving a single PG
instance. No containers are in place, no cgroup setup (neither memory, CPU
limits), clean CentOS7 + epel-release + pg packages (node_exporter and
postgres_exporter for monitoring).

The only change on that machine was an upgrade from PG14.3 to PG14.4 (using
PGDG RPM repo packages) and from that moment, the memory usage and
out-of-memory issues started, the first time less than 24 hours after the
upgrade.

I'm far from being sure it is a Postgres memory leak, my primary intention
is to provide as much data as I could to allow helping me identify this
issue's root cause. Wen, I'll be back from my vacation (2 weeks) I'd like
to downgrade to 14.3 (and we have to resist using index concurrently
operations) to see whether such memory allocations disappear or not.

Any hints or tips on what to do to analyze it better (what data from OS we
have to collect over time... (ideally avoiding the downgrade exercise) are
welcome.

Kind regards Ales Zeleny

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Menon, Deepak (Deepak) 2022-07-21 04:05:30 RE: [External]Re: Patroni & PostgreSQL issue
Previous Message Marc Millas 2022-07-20 21:33:33 Re: postgis