Re: Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, amul sul <sul_amul(at)yahoo(dot)co(dot)in>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().
Date: 2016-08-22 22:19:35
Message-ID: CAEepm=2jDPLE+51VfKrjy+iNpY_xsHs6=W=bh7Sc7S5uuO4OBg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 23, 2016 at 8:41 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Aug 16, 2016 at 7:41 PM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> I still think it's worth thinking about something along these lines on
>> Linux only, where holey Swiss tmpfs files can bite you. Otherwise
>> disabling overcommit on your OS isn't enough to prevent something
>> which is really a kind of deferred overcommit with a surprising
>> failure mode (SIGBUS rather than OOM SIGKILL).
>
> Yeah, I am inclined to agree. I mean, creating a DSM is fairly
> heavyweight already, so one extra system call isn't (I hope) a crazy
> overhead. We could test to see how much it slows things down. But it
> may be worth paying the cost even if it ends up being kinda expensive.
> We don't really have any way of knowing whether the caller's request
> is reasonable relative to the amount of virtual memory available, and
> converting a possible SIGBUS into an ereport(ERROR, ...) is a big win.
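To make the "holey" tmpfs behaviour concrete, here is a small standalone sketch (not taken from the patch, and assuming Linux with a tmpfs-backed /dev/shm). It creates a POSIX shared memory object, sizes it with ftruncate(), and reports how much backing storage the kernel has actually allocated, with and without a follow-up fallocate() call. The helper name shm_allocated_bytes is hypothetical.

```c
#define _GNU_SOURCE          /* for fallocate() */
#include <assert.h>
#include <fcntl.h>           /* O_* constants, fallocate() */
#include <sys/mman.h>        /* shm_open(), shm_unlink() */
#include <sys/stat.h>        /* fstat(), struct stat, mode constants */
#include <sys/types.h>
#include <unistd.h>          /* ftruncate(), close() */

/*
 * Hypothetical helper: create a POSIX shm object of `size` bytes and return
 * how many bytes of backing storage the kernel has allocated for it
 * (st_blocks counts 512-byte units). If `reserve` is nonzero, fallocate()
 * is called after ftruncate(). Returns -1 on any error.
 */
static long long shm_allocated_bytes(const char *name, off_t size, int reserve)
{
    assert(size > 0);
    int fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    shm_unlink(name);        /* the open fd keeps the object alive */

    long long allocated = -1;
    struct stat st;
    if (ftruncate(fd, size) == 0 &&
        (!reserve || fallocate(fd, 0, 0, size) == 0) &&
        fstat(fd, &st) == 0)
        allocated = (long long) st.st_blocks * 512;

    close(fd);
    return allocated;
}
```

On a typical Linux system, calling it with reserve = 0 should report 0 bytes for a 1 MiB segment (ftruncate() only sets the file size, leaving one big hole), while reserve = 1 should report at least 1 MiB. Pages for the hole are only allocated when first touched, which is exactly where a SIGBUS can appear if tmpfs has run out of space.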

Here's a version of the patch that only does something special if the
following planets are aligned:

* Linux only: for now, there doesn't seem to be any reason to assume
that other operating systems share this file-with-holes implementation
quirk, or that posix_fallocate would work on such an fd, or which errno
values to tolerate if it doesn't. From what I can tell, Solaris,
FreeBSD, etc. either don't overcommit or do normal non-stealth
overcommit with the usual out-of-swap failure mode for shm_open
memory, with a way to turn overcommit off. So I put in a preprocessor
test to do this only for __linux__, and I used "fallocate" (a
non-standard Linux syscall) instead of "posix_fallocate".

* glibc version >= 2.10: ancient versions and other libc
implementations don't have fallocate, so I put a test into the
configure script.

* Kernel version >= 2.6.23: the man page says that ancient kernels
don't provide the syscall, and that glibc sets errno to ENOSYS in that
case, so I put a check in to keep calm and carry on.
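A minimal sketch of the guarded reservation step described in the list above. It assumes a configure-provided macro called HAVE_FALLOCATE (a hypothetical name here, not necessarily what the patch defines); reserve_shm_space and create_reserved_shm are illustrative helpers, not PostgreSQL functions. A nonzero result is where the real code would ereport(ERROR, ...) instead of leaving a later SIGBUS.

```c
#define _GNU_SOURCE          /* for fallocate() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>           /* O_* constants, fallocate() */
#include <sys/mman.h>        /* shm_open(), shm_unlink() */
#include <sys/stat.h>        /* mode constants */
#include <sys/types.h>
#include <unistd.h>          /* ftruncate(), close() */

/* Hypothetical macro: a configure check would normally define this when
 * libc (glibc >= 2.10) provides fallocate(); assumed present here. */
#ifndef HAVE_FALLOCATE
#define HAVE_FALLOCATE 1
#endif

/*
 * Ask the kernel to allocate real pages for an already-sized shm fd.
 * Returns 0 on success, or when fallocate() is unavailable (non-Linux, no
 * libc support, or ENOSYS from a pre-2.6.23 kernel); otherwise returns the
 * errno value (e.g. ENOSPC) so the caller can raise an ordinary error.
 */
static int reserve_shm_space(int fd, off_t size)
{
#if defined(__linux__) && HAVE_FALLOCATE
    if (fallocate(fd, 0, 0, size) != 0)
        return errno == ENOSYS ? 0 : errno;   /* keep calm and carry on */
#else
    (void) fd;
    (void) size;
#endif
    return 0;
}

/* Create a shm object of `size` bytes with its pages reserved. Returns the
 * fd, or -1 with errno set; the object is removed again on failure. */
static int create_reserved_shm(const char *name, off_t size)
{
    assert(name[0] == '/');  /* portable POSIX shm names start with '/' */
    int fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    int rc = ftruncate(fd, size) != 0 ? errno : reserve_shm_space(fd, size);
    if (rc != 0)
    {
        close(fd);
        shm_unlink(name);
        errno = rc;          /* report the original failure, not close()'s */
        return -1;
    }
    return fd;
}
```

Only ENOSYS is treated as "not supported, carry on"; every other failure is surfaced, since an error at creation time is far easier to handle than a bus error on first access.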

I don't know if any distros ever shipped with an old enough kernel and
new enough glibc for ENOSYS to happen in the wild; for example RHEL5
had neither kernel nor glibc support, and RHEL6 had both. I haven't
personally tested that path.

Maybe it would be worth thinking about whether this is a condition
that should cause dsm_create to return NULL rather than ereporting,
depending on a flag along the lines of the existing
DSM_CREATE_NULL_IF_MAXSEGMENTS. But that could be a separate patch if
it turns out to be useful.
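The NULL-versus-error idea could look roughly like the following standalone sketch. Everything in it is hypothetical: SKETCH_CREATE_NULL_IF_NOSPACE is an invented flag in the spirit of DSM_CREATE_NULL_IF_MAXSEGMENTS, sketch_create is not dsm_create, and setjmp/longjmp merely stands in for the way ereport(ERROR) transfers control to an error handler instead of returning.

```c
#include <assert.h>
#include <errno.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag; the value is arbitrary for this sketch. */
#define SKETCH_CREATE_NULL_IF_NOSPACE 0x0002

/* Stand-in for the error machinery: raising an error never returns. */
static jmp_buf error_handler;
static int reported_errno;

static _Noreturn void report_error(int err)
{
    reported_errno = err;
    longjmp(error_handler, 1);
}

typedef struct sketch_segment
{
    size_t size;
} sketch_segment;

/*
 * reserve_rc simulates the result of the fallocate step (0 or an errno).
 * With the flag, a reservation failure becomes a NULL return the caller
 * can handle (e.g. by falling back to a smaller request); without it, the
 * failure is raised as an error.
 */
static sketch_segment *sketch_create(size_t size, int flags, int reserve_rc)
{
    assert(size > 0);
    if (reserve_rc != 0)
    {
        if (flags & SKETCH_CREATE_NULL_IF_NOSPACE)
            return NULL;
        report_error(reserve_rc);
    }
    sketch_segment *seg = malloc(sizeof *seg);
    if (seg == NULL)
        report_error(ENOMEM);
    seg->size = size;
    return seg;
}
```

The flag lets callers that have a sensible fallback opt in to checking for NULL, while existing callers keep the simpler "fail loudly" behaviour.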

--
Thomas Munro
http://www.enterprisedb.com

Attachment: fallocate.patch (application/octet-stream, 3.8 KB)
