From: Odin Ugedal <odin(at)ugedal(dot)com>
To: Fan Liu <fan(dot)liu(at)ericsson(dot)com>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: [Bus error] huge_pages default value (try) not fall back
Date: 2020-06-09 15:22:58
Message-ID: CAFpoUr1ggmGs8qpoKvYxNBO3h-T-n+MNh+JnLRYsYhHurVOiGQ@mail.gmail.com
Lists: pgsql-bugs

Hi,

I stumbled upon this issue while working on the related Kubernetes issue
referenced a few mails back. From what I understand, this issue is (or
may be) a result of how the hugetlb cgroup enforces the
"limit_in_bytes" limit for huge pages. Under normal circumstances, a
process should in theory not crash like this when using memory obtained
from a successful mmap. The value set in "limit_in_bytes" is only
enforced during page allocation (i.e. at fault time), and _not_ when
mapping pages with mmap. As a result, an mmap of n huge pages succeeds
as long as the system has n free huge pages, even when the requested
size is bigger than "limit_in_bytes". The process then reserves the
huge page memory, making it inaccessible to other processes.

The real problem surfaces when postgres tries to write to the memory it
received from mmap and the kernel tries to allocate the reserved huge
page. Since the cgroup does not allow the allocation, the process is
killed with a bus error (SIGBUS).

This issue has been fixed in Linux by this patch:
https://lkml.org/lkml/2020/2/3/1153, which adds a new cgroup control
that addresses it. However, no container runtimes use it yet, and only
5.7+ kernels (afaik.) support it; the progress can be tracked here:
https://github.com/opencontainers/runtime-spec/issues/1050. The fix
for the upstream Kubernetes issue
(https://github.com/opencontainers/runtime-spec/issues/1050), which
made Kubernetes set the wrong value for the top-level
"limit_in_bytes" when the pre-allocated page count increased after
Kubernetes (kubelet) startup, will hopefully land in Kubernetes 1.19
(or 1.20). Fingers crossed!
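As a rough illustration of the difference the patch makes, here is a
cgroup v1 sketch, assuming 2MB huge pages and a hypothetical cgroup
named "mygroup". The "rsvd" file name is taken from the linked kernel
patch and may differ by kernel version; this requires root and a 5.7+
kernel, so it is a configuration sketch rather than something to run
verbatim:

```shell
# Old control: charges pages only at fault time, so an over-limit mmap
# succeeds and the process later dies with SIGBUS on first write.
cat /sys/fs/cgroup/hugetlb/mygroup/hugetlb.2MB.limit_in_bytes

# New control added by the patch: charges the *reservation* at mmap()
# time, so an over-limit mmap fails cleanly with ENOMEM instead of a
# delayed SIGBUS. Here: allow reservations for 64 x 2MB pages.
echo $((64 * 2 * 1024 * 1024)) \
    > /sys/fs/cgroup/hugetlb/mygroup/hugetlb.2MB.rsvd.limit_in_bytes
```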

Hopefully this makes some sense, and gives some insights into the issue...

Best regards,
Odin Ugedal
