systemd deletes shared memory segment in /dev/shm/Postgresql.NNNNNN

From: mgbii bax <gezeala(at)gmail(dot)com>
To: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: systemd deletes shared memory segment in /dev/shm/Postgresql.NNNNNN
Date: 2016-01-22 00:07:50
Message-ID: CAJKO3mV+k5d0Cg4RYwovmEfMphieT57X4KqZv-RhWxzEpu1fJQ@mail.gmail.com
Lists: pgsql-admin

We were hit by an interesting addition to systemd: it appears that
logging in to or out of the machine with the user account used to start
the postgres service has a catastrophic effect. A systemd process deleted
the PostgreSQL.NNNN file in /dev/shm (tmpfs).
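As a rough aid for checking this, the sketch below counts the PostgreSQL.* files in /dev/shm and lists the System V semaphore arrays owned by a user (the semctl failures in the log further down involve those). The function names are hypothetical, and the awk filter assumes the column layout printed by util-linux ipcs, with the owner in the third column.

```shell
#!/bin/sh
# Hypothetical helpers, not from the original post.

# Count PostgreSQL.* files in a tmpfs directory (default /dev/shm).
count_pg_shm() {
    dir=${1:-/dev/shm}
    n=0
    for f in "$dir"/PostgreSQL.*; do
        # An unmatched glob stays literal, so test that the path exists.
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}

# List System V semaphore arrays owned by USER.
# Assumes util-linux ipcs output: key, semid, owner, perms, nsems.
list_user_sems() {
    user=$1
    ipcs -s | awk -v u="$user" '$3 == u'
}

# Usage, left commented out; run before and after a login/logout cycle:
# count_pg_shm /dev/shm
# list_user_sems admin
```

If the count drops to 0 or the semaphore list becomes empty while the postmaster is still running, the IPC objects were removed underneath it.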

errors:

Jan 21 10:30:01 stg1 systemd: Started Session 3396 of user admin.
Jan 21 10:30:01 stg1 systemd: Starting Session 3396 of user admin.
Jan 21 10:30:01 stg1 postgres[31239]: [3-1] FATAL: semctl(13139971, 11, SETVAL, 0) failed: Invalid argument
Jan 21 10:30:01 stg1 postgres[28042]: [3-1] LOG: server process (PID 31239) exited with exit code 1
Jan 21 10:30:01 stg1 postgres[28042]: [4-1] LOG: terminating any other active server processes
Jan 21 10:30:01 stg1 postgres[28047]: [3-1] WARNING: terminating connection because of crash of another server process
Jan 21 10:30:01 stg1 postgres[28047]: [3-2] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
Jan 21 10:30:01 stg1 postgres[28047]: [3-3] HINT: In a moment you should be able to reconnect to the database and repeat your command.
Jan 21 10:30:01 stg1 postgres[28042]: [5-1] LOG: all server processes terminated; reinitializing
Jan 21 10:30:01 stg1 postgres[28042]: [6-1] LOG: could not remove shared memory segment "/PostgreSQL.1804289383": No such file or directory
Jan 21 10:30:01 stg1 postgres[28042]: [7-1] LOG: semctl(13041664, 0, IPC_RMID, ...) failed: Invalid argument
Jan 21 10:30:01 stg1 postgres[28042]: [8-1] LOG: semctl(13074433, 0, IPC_RMID, ...) failed: Invalid argument
Jan 21 10:30:01 stg1 postgres[28042]: [9-1] LOG: semctl(13107202, 0, IPC_RMID, ...) failed: Invalid argument
Jan 21 10:30:01 stg1 postgres[28042]: [10-1] LOG: semctl(13139971, 0, IPC_RMID, ...) failed: Invalid argument
Jan 21 10:30:01 stg1 postgres[28042]: [11-1] LOG: semctl(13172740, 0, IPC_RMID, ...) failed: Invalid argument
Jan 21 10:30:01 stg1 postgres[31260]: [12-1] LOG: database system was interrupted; last known up at 2016-01-21 10:23:17 PST
Jan 21 10:30:01 stg1 postgres[31260]: [13-1] LOG: database system was not properly shut down; automatic recovery in progress
Jan 21 10:30:01 stg1 postgres[31260]: [14-1] LOG: record with zero length at 130/66154E90
Jan 21 10:30:01 stg1 postgres[31260]: [15-1] LOG: redo is not required
Jan 21 10:30:01 stg1 postgres[31260]: [16-1] LOG: MultiXact member wraparound protections are now enabled
Jan 21 10:30:01 stg1 postgres[28042]: [12-1] LOG: database system is ready to accept connections
Jan 21 10:30:01 stg1 postgres[31267]: [12-1] LOG: autovacuum launcher started
Jan 21 10:30:26 stg1 systemd: Removed slice user-1001.slice.
Jan 21 10:30:26 stg1 systemd: Stopping user-1001.slice.
Jan 21 10:30:35 stg1 systemd: Created slice user-1001.slice.
Jan 21 10:30:35 stg1 systemd: Starting user-1001.slice.
Jan 21 10:30:35 stg1 systemd-logind: New session 3397 of user admin.

$ psql postgres
psql: FATAL: semctl(11337731, 11, SETVAL, 0) failed: Invalid argument

The log shows pg crashing and restarting; the next attempt connects:

$ psql postgres
psql (9.4.5)
Type "help" for help.

postgres=#

The PostgreSQL file in /dev/shm (tmpfs) appears to have been removed by
some systemd process:

$ ls -lt /dev/shm/
total 84
-rw------- 1 admin admin     3916 Jan 21 09:05 PostgreSQL.1804289383 ==> deleted, causing the errors above
-r-------- 1 gdm   gdm   67108904 Jan 20 18:38 pulse-shm-3708236591
-r-------- 1 gdm   gdm   67108904 Jan 20 18:38 pulse-shm-4055075926
-r-------- 1 gdm   gdm   67108904 Jan 20 18:38 pulse-shm-3910933030
-r-------- 1 gdm   gdm   67108904 Jan 20 18:38 pulse-shm-979612067

OS:
$ cat /etc/centos-release
CentOS Linux release 7.1.1503 (Core)

Postgres version:
postgres=# select version();
-[ RECORD 1 ]---------------------------------------------------------
version | PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit

$ cat /etc/systemd/logind.conf
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.
#
# Entries in this file show the compile time defaults.
# You can change settings by editing this file.
# Defaults can be restored by simply deleting this file.
#
# See logind.conf(5) for details.

[Login]
#NAutoVTs=6
#ReserveVT=6
#KillUserProcesses=no
#KillOnlyUsers=
#KillExcludeUsers=root
#InhibitDelayMaxSec=5
#HandlePowerKey=poweroff
#HandleSuspendKey=suspend
#HandleHibernateKey=hibernate
#HandleLidSwitch=suspend
#HandleLidSwitchDocked=ignore
#PowerKeyIgnoreInhibited=no
#SuspendKeyIgnoreInhibited=no
#HibernateKeyIgnoreInhibited=no
#LidSwitchIgnoreInhibited=yes
#IdleAction=ignore
#IdleActionSec=30min
#RuntimeDirectorySize=10% =>> new entry
#RemoveIPC=yes =>> new entry

The culprit could be a recent package update that brought systemd to version 219:
Jan 19 13:29:23 Updated: systemd-libs-219-19.el7.x86_64
Jan 19 13:29:28 Updated: systemd-219-19.el7.x86_64
Jan 19 13:29:39 Updated: systemd-sysv-219-19.el7.x86_64
Jan 19 13:29:40 Updated: systemd-python-219-19.el7.x86_64

Is anybody on the list having the same issue? As a workaround, we have
changed the two new entries in logind.conf from:

#RuntimeDirectorySize=10%
#RemoveIPC=yes

to

RuntimeDirectorySize=1%
RemoveIPC=no

Setting RuntimeDirectorySize to 1% is optional. When a user logs in to the
server (for example over ssh), a new tmpfs mount is created with a size of
10% of RAM (the machine has 512GB). This looks like another change that
came with the systemd update.
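To put numbers on that percentage, here is a small arithmetic sketch. The helper name is hypothetical, and the RAM total of 528028080 KiB is an assumption, back-computed from the size=52802808k that mount reports for the /run/user mounts at the 10% default.

```shell
#!/bin/sh
# Hypothetical helper: size in KiB of a tmpfs limited to PERCENT of RAM.
runtime_dir_kib() {
    total_kib=$1; percent=$2
    echo $(( total_kib * percent / 100 ))
}

# Assumed RAM total in KiB (estimated from the mount output, not measured).
runtime_dir_kib 528028080 10   # prints 52802808, about 50 GiB per login user
runtime_dir_kib 528028080 1    # prints 5280280, about 5 GiB per login user
```

The 1% result is within a few KiB of the size=5280284k that mount shows after the change; the small difference presumably comes from rounding and from the estimated total.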

before mods:
$ mount | grep tmpfs
devtmpfs on /dev type devtmpfs (rw,nosuid,size=264004800k,nr_inodes=66001200,mode=755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
tmpfs on /run/user/42 type tmpfs (rw,nosuid,nodev,relatime,size=52802808k,mode=700,uid=42,gid=42) ==> gdm
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=52802808k,mode=700) ==> root
tmpfs on /run/user/1001 type tmpfs (rw,nosuid,nodev,relatime,size=52802808k,mode=700,uid=1001,gid=1001) ==> some user, ~51G tmpfs (new feature?)
tmpfs on /run/user/6301 type tmpfs (rw,nosuid,nodev,relatime,size=52802808k,mode=700,uid=6301,gid=10000) ==> some user

before mods:
$ df -h | grep tmpfs
devtmpfs  252G     0  252G   0% /dev
tmpfs     252G   84K  252G   1% /dev/shm
tmpfs     252G  492M  252G   1% /run
tmpfs     252G     0  252G   0% /sys/fs/cgroup
tmpfs      51G     0   51G   0% /run/user/42
tmpfs      51G     0   51G   0% /run/user/0

after mods:
$ mount | grep tmpfs
devtmpfs on /dev type devtmpfs (rw,nosuid,size=264004800k,nr_inodes=66001200,mode=755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
tmpfs on /run/user/42 type tmpfs (rw,nosuid,nodev,relatime,size=5280284k,mode=700,uid=42,gid=42)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=5280284k,mode=700)
tmpfs on /run/user/1001 type tmpfs (rw,nosuid,nodev,relatime,size=5280284k,mode=700,uid=1001,gid=1001)

after mods:
$ df -h | grep tmpfs
devtmpfs  252G     0  252G   0% /dev
tmpfs     252G   88K  252G   1% /dev/shm
tmpfs     252G   19M  252G   1% /run
tmpfs     252G     0  252G   0% /sys/fs/cgroup
tmpfs     5.1G   12K  5.1G   1% /run/user/42
tmpfs     5.1G     0  5.1G   0% /run/user/0

Setting RemoveIPC to no works: with it disabled, the
/dev/shm/PostgreSQL.NNNN file appears to stay intact.
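A minimal sketch of how the logind.conf change described in this message could be scripted. The helper name set_logind_option is hypothetical, and restarting systemd-logind afterwards is an assumption; this message does not say how the change was activated.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): set KEY=VALUE in a
# logind.conf-style file, replacing a commented-out or active entry.
set_logind_option() {
    file=$1; key=$2; value=$3
    tmp=$(mktemp) || return 1
    if grep -q -E "^#?${key}=" "$file"; then
        # Rewrite the existing line, whether commented out or active.
        sed -E "s|^#?${key}=.*|${key}=${value}|" "$file" > "$tmp"
    else
        # Key absent: append it. Assumes [Login] is the only section.
        cat "$file" > "$tmp"
        printf '%s=%s\n' "$key" "$value" >> "$tmp"
    fi
    # Copy back over the original so its ownership and mode are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Usage (as root), left commented out so the sketch has no side effects:
# set_logind_option /etc/systemd/logind.conf RemoveIPC no
# set_logind_option /etc/systemd/logind.conf RuntimeDirectorySize 1%
# systemctl restart systemd-logind   # assumption: needed to re-read the file
```

Trying the helper on a copy of the file first is a cheap way to confirm the result before touching /etc/systemd/logind.conf.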

This is the mailing list post I found that seems to be related:
http://lists.freedesktop.org/archives/systemd-devel/2014-April/018373.html

--

regards

marie gezeala bacuño II
