Re: PANIC: could not flush dirty data: Cannot allocate memory

From: klaus(dot)mailinglists(at)pernau(dot)at
To: pgsql-general(at)postgresql(dot)org
Subject: Re: PANIC: could not flush dirty data: Cannot allocate memory
Date: 2022-11-29 10:45:03
Message-ID: 4eeb184a1f907c0deab774429602568b@pernau.at
Lists: pgsql-general

Hello all!

Thanks for the many hints on what to look for. We did some tuning and
further debugging; here are the outcomes, answering all questions in a
single email.

> In the meantime, you could experiment with setting
> checkpoint_flush_after to 0
We did this:
# SHOW checkpoint_flush_after;
checkpoint_flush_after
------------------------
0
(1 row)

But we STILL have PANICs. I tried to understand the code but failed. My
guess is that there are code paths which call pg_flush_data() without
checking this setting, or that the check does not work.
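
For completeness, the related writeback settings can be checked the same
way. As far as I understand the code, bgwriter_flush_after and
backend_flush_after also end up in pg_flush_data(), so they would have to
be zeroed as well to fully rule out that path; the query below just lists
all of them at once:

# SELECT name, setting FROM pg_settings WHERE name LIKE '%flush_after';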

> Did this start after upgrading to 22.04? Or after a certain kernel
> upgrade?

It definitely only started with Ubuntu 22.04. We did not have, and still
do not have, any issues on servers with Ubuntu 20.04 and 18.04.

> I would believe that the kernel would raise
> a bunch of printks if it hit ENOMEM in the commonly used paths, so
> you would see something in dmesg or wherever you collect your kernel
> log if it happened where it was expected.

There is nothing in the kernel logs (dmesg).
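
For anyone who wants to reproduce the check, something like the following
should catch memory-related kernel messages (the grep pattern is only a
guess at the wording such printks would use):

# dmesg -T | grep -iE 'oom|out of memory|enomem|page allocation failure'
# journalctl -k -b | grep -iE 'oom|out of memory|enomem|page allocation failure'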

> Do you use cgroups or such to limit memory usage of postgres?

No.
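
For reference, this is easy to double-check on a 22.04 host with cgroup
v2; the unit name below is the stock Ubuntu one and may differ:

# systemctl show postgresql.service -p MemoryMax -p MemoryHigh
# cat /proc/$(pidof -s postgres)/cgroup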

> Any uncommon options on the filesystem or the mount point?
No. Also no antivirus software:
/dev/xvda2 / ext4 noatime,nodiratime,errors=remount-ro 0 1
or
LABEL=cloudimg-rootfs / ext4 discard,errors=remount-ro 0 1
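
(Those are the fstab entries; the effective mount options as the kernel
sees them can be listed with findmnt, e.g. for the data directory, where
the path below assumes the default Ubuntu layout:)

# findmnt -T /var/lib/postgresql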

> does this happen on all the hosts, or is it limited to one host or one
> technology?

It happens on XEN VMs, KVM VMs and VMware VMs, on both Intel and AMD
platforms.
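
(In case anyone wants to correlate with a specific hypervisor:
systemd-detect-virt reports the type directly, e.g. xen, kvm or vmware.)

# systemd-detect-virt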

> Another interesting thing would be to know the mount and file system
> options
> for the FS that triggers the failures. E.g.

# tune2fs -l /dev/sda1
tune2fs 1.46.5 (30-Dec-2021)
Filesystem volume name: cloudimg-rootfs
Last mounted on: /
Filesystem UUID: 0522e6b3-8d40-4754-a87e-5678a6921e37
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index
filetype needs_recovery extent 64bit flex_bg encrypt sparse_super
large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 12902400
Block count: 26185979
Reserved block count: 0
Overhead clusters: 35096
Free blocks: 18451033
Free inodes: 12789946
First block: 0
Block size: 4096
Fragment size: 4096
Group descriptor size: 64
Reserved GDT blocks: 243
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 16128
Inode blocks per group: 1008
Flex block group size: 16
Filesystem created: Wed Apr 20 18:31:24 2022
Last mount time: Thu Nov 10 09:49:34 2022
Last write time: Thu Nov 10 09:49:34 2022
Mount count: 7
Maximum mount count: -1
Last checked: Wed Apr 20 18:31:24 2022
Check interval: 0 (<none>)
Lifetime writes: 252 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 32
Desired extra isize: 32
Journal inode: 8
First orphan inode: 42571
Default directory hash: half_md4
Directory Hash Seed: c5ef129b-fbee-4f35-8f28-ad7cc93c1c43
Journal backup: inode blocks
Checksum type: crc32c
Checksum: 0xb74ebbc3

Thanks
Klaus
