Re: DB running out of memory issues after upgrade

From: "Peter J(dot) Holzer" <hjp-pgsql(at)hjp(dot)at>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: DB running out of memory issues after upgrade
Date: 2020-02-23 10:19:28
Message-ID: 20200223101928.GE12464@hjp.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgadmin-hackers pgadmin-support pgsql-admin pgsql-bugs pgsql-general pgsql-hackers pgsql-performance pgsql-pkg-yum

On 2020-02-18 18:10:08 +0000, Nagaraj Raj wrote:
> Below are the same configurations ins .conf file before and after updagrade
>
> show max_connections; = 1743
[...]
> show work_mem = "4MB"

This is an interesting combination: So you expect a large number of
connections but each one should use very little RAM?

[...]

> here is some sys logs,
>
> 2020-02-16 21:01:17 UTC [-]The database process was killed by the OS
> due to excessive memory consumption.
> 2020-02-16 13:41:16 UTC [-]The database process was killed by the OS
> due to excessive memory consumption.

The oom-killer produces a huge block of messages which you can find with
dmesg or in your syslog. It looks something like this:

Feb 19 19:06:53 akran kernel: [3026711.344817] platzangst invoked oom-killer: gfp_mask=0x15080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), nodemask=(null), order=1, oom_score_adj=0
Feb 19 19:06:53 akran kernel: [3026711.344819] platzangst cpuset=/ mems_allowed=0-1
Feb 19 19:06:53 akran kernel: [3026711.344825] CPU: 7 PID: 2012 Comm: platzangst Tainted: G OE 4.15.0-74-generic #84-Ubuntu
Feb 19 19:06:53 akran kernel: [3026711.344826] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.1.7 06/16/2016
Feb 19 19:06:53 akran kernel: [3026711.344827] Call Trace:
Feb 19 19:06:53 akran kernel: [3026711.344835] dump_stack+0x6d/0x8e
Feb 19 19:06:53 akran kernel: [3026711.344839] dump_header+0x71/0x285
...
Feb 19 19:06:53 akran kernel: [3026711.344893] RIP: 0033:0x7f292d076b1c
Feb 19 19:06:53 akran kernel: [3026711.344894] RSP: 002b:00007fff187ef240 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
Feb 19 19:06:53 akran kernel: [3026711.344895] RAX: ffffffffffffffda RBX: 00007fff187ef240 RCX: 00007f292d076b1c
Feb 19 19:06:53 akran kernel: [3026711.344896] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
Feb 19 19:06:53 akran kernel: [3026711.344897] RBP: 00007fff187ef2b0 R08: 00007f292d596740 R09: 00000000009d43a0
Feb 19 19:06:53 akran kernel: [3026711.344897] R10: 00007f292d596a10 R11: 0000000000000246 R12: 0000000000000000
Feb 19 19:06:53 akran kernel: [3026711.344898] R13: 0000000000000020 R14: 0000000000000000 R15: 0000000000000000
Feb 19 19:06:53 akran kernel: [3026711.344899] Mem-Info:
Feb 19 19:06:53 akran kernel: [3026711.344905] active_anon:14862589 inactive_anon:1133875 isolated_anon:0
Feb 19 19:06:53 akran kernel: [3026711.344905] active_file:467 inactive_file:371 isolated_file:0
Feb 19 19:06:53 akran kernel: [3026711.344905] unevictable:0 dirty:3 writeback:0 unstable:0
...
Feb 19 19:06:53 akran kernel: [3026711.344985] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Feb 19 19:06:53 akran kernel: [3026711.344997] [ 823] 0 823 44909 0 106496 121 0 lvmetad
Feb 19 19:06:53 akran kernel: [3026711.344999] [ 1354] 0 1354 11901 3 135168 112 0 rpcbind
Feb 19 19:06:53 akran kernel: [3026711.345000] [ 1485] 0 1485 69911 99 180224 159 0 accounts-daemon
...
Feb 19 19:06:53 akran kernel: [3026711.345345] Out of memory: Kill process 25591 (postgres) score 697 or sacrifice child
Feb 19 19:06:53 akran kernel: [3026711.346563] Killed process 25591 (postgres) total-vm:71116948kB, anon-rss:52727552kB, file-rss:0kB, shmem-rss:3023196kB

The most interesting lines are usually the last two: In this case they
tell us that the process killed was a postgres process and it occupied
about 71 GB of virtual memory at that time. That was clearly the right
choice since the machine has only 64 GB of RAM. Sometimes it is less
clear and then you might want to scroll through the (usually long) list
of processes to see if there are other processes which need suspicious
amounts of RAM or maybe if there are just more of them than you would
expect.

> I identified one simple select which consuming more memory and here is the
> query plan,
>
>
>
> "Result (cost=0.00..94891854.11 rows=3160784900 width=288)"
> " -> Append (cost=0.00..47480080.61 rows=3160784900 width=288)"
> " -> Seq Scan on msghist (cost=0.00..15682777.12 rows=3129490000 width
> =288)"
> " Filter: (((data -> 'info'::text) ->> 'status'::text) =
> 'CLOSE'::text)"
> " -> Seq Scan on msghist msghist_1 (cost=0.00..189454.50 rows=31294900
> width=288)"
> " Filter: (((data -> 'info'::text) ->> 'status'::text) =
> 'CLOSE'::text)"

So: How much memory does that use? It produces a huge number of rows
(more than 3 billion) but it doesn't do much with them, so I wouldn't
expect the postgres process itself to use much memory. Are you sure its
the postgres process and not the application which uses a lot of memory?

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp(at)hjp(dot)at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

In response to

Browse pgadmin-hackers by date

  From Date Subject
Next Message Akshay Joshi 2020-02-24 06:41:53 pgAdmin 4 commit: Fix an issue where command and statements were parsed
Previous Message Aditya Toshniwal 2020-02-21 09:39:25 [pgAdmin][RM5115] RULE SQL incorrectly generated

Browse pgadmin-support by date

  From Date Subject
Next Message Dave Page 2020-02-24 09:23:43 Re: Privacy question regarding the "Geometry Viewer" feature in pgAdmin 4.
Previous Message tutiluren 2020-02-23 06:02:40 Privacy question regarding the "Geometry Viewer" feature in pgAdmin 4.

Browse pgsql-admin by date

  From Date Subject
Next Message Adrian Klaver 2020-02-23 18:19:34 Re: Reset DB stats suggestion pg_stat_reset()
Previous Message postggen2020 s 2020-02-23 02:40:33 Re: Reset DB stats suggestion pg_stat_reset()

Browse pgsql-bugs by date

  From Date Subject
Next Message Fan Liu 2020-02-24 01:32:26 RE: [Bus error] huge_pages default value (try) not fall back
Previous Message Tom Lane 2020-02-23 06:43:26 Re: BUG #16273: InitDB Memory leak

Browse pgsql-general by date

  From Date Subject
Next Message Peter J. Holzer 2020-02-23 11:21:48 Re: Can I trigger an action from a coalesce ?
Previous Message Andrus 2020-02-23 09:06:09 Re: How to get error message details from libpq based psqlODBC driver (regression)

Browse pgsql-hackers by date

  From Date Subject
Next Message Dave Cramer 2020-02-23 11:16:23 Re: Error on failed COMMIT
Previous Message Michael Paquier 2020-02-23 07:27:57 Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Browse pgsql-performance by date

  From Date Subject
Next Message Justin Pryzby 2020-02-23 10:46:09 Re: PostgreSQL 11 higher Planning time on Partitioned table
Previous Message Justin Pryzby 2020-02-23 10:12:09 Re: PostgreSQL 11 higher Planning time on Partitioned table

Browse pgsql-pkg-yum by date

  From Date Subject
Next Message Merlin Moncure 2020-02-24 16:34:03 Re: DB running out of memory issues after upgrade
Previous Message Nagaraj Raj 2020-02-18 19:10:20 Re: DB running out of memory issues after upgrade