Re: Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, amul sul <sul_amul(at)yahoo(dot)co(dot)in>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().
Date: 2017-06-29 00:24:23
Message-ID: CAEepm=00bvrcNGaHWv7=9F_SE9YKfW1_DuFbAjFh41ycg8mLxw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 29, 2017 at 11:04 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> diff --git a/configure.in b/configure.in
>> index 11eb9c8acfc..47452bbac43 100644
>> --- a/configure.in
>> +++ b/configure.in
>> @@ -1429,7 +1429,7 @@ PGAC_FUNC_WCSTOMBS_L
>> LIBS_including_readline="$LIBS"
>> LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
>>
>> -AC_CHECK_FUNCS([cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat pthread_is_threaded_np readlink setproctitle setsid shm_open symlink sync_file_range towlower utime utimes wcstombs wcstombs_l])
>> +AC_CHECK_FUNCS([cbrt clock_gettime dlopen fallocate fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat pthread_is_threaded_np readlink setproctitle setsid shm_open symlink sync_file_range towlower utime utimes wcstombs wcstombs_l])
>
> Why are we going for fallocate rather than posix_fallocate()? There's
> plenty use cases that can only be done with the former, but this doesn't
> seem like one of them?
>
> Currently this is a linux specific path - but I don't actually see any
> reason to keep it that way? It seems far from unlikely that this is just
> an issue with linux...

POSIX says that posix_fallocate() only works on regular files. It's an
implementation detail that shm_open() returns an fd for a tmpfs file
on Linux. posix_fallocate() on an fd returned by shm_open() fails
under FreeBSD, and I suspect on other Unices too (but I haven't tried),
because there is no requirement for POSIX shm_open() objects to be so
file-ish. My take is that this is a Linux-specific problem requiring
a Linux-specific solution. Considering the above, and the advice on
LKML from a senior kernel hacker[1] to use fallocate() to solve this
problem, I figured that was the right way to go. But yeah, it also
seems to work with posix_fallocate() *on Linux*.
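
One practical difference between the two calls: fallocate() returns -1 and sets errno, while posix_fallocate() returns the error number directly and leaves errno alone. A minimal sketch of the resize step written with posix_fallocate(), assuming a Linux fd for which the call is supported. The name resize_with_posix_fallocate is hypothetical and not part of the patch:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): set the size of fd and ask the
 * OS to reserve backing storage for the whole range.
 *
 * Returns 0 on success, or -1 with errno set, like ftruncate().
 */
static int
resize_with_posix_fallocate(int fd, off_t size)
{
    int rc;

    /* Truncate (or extend) to the requested size. */
    if (ftruncate(fd, size) != 0)
        return -1;

    /* posix_fallocate() rejects a zero length with EINVAL. */
    if (size == 0)
        return 0;

    /* The error number is the return value; errno is not set. */
    do
    {
        rc = posix_fallocate(fd, 0, size);
    } while (rc == EINTR);

    if (rc != 0)
    {
        errno = rc;
        return -1;
    }
    return 0;
}
```

A caller can then treat ENOSPC from this helper as an ordinary out-of-memory condition at creation time.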

>>
>> #ifdef USE_DSM_POSIX
>> /*
>> + * Set the size of a virtual memory region associated with a file descriptor.
>> + * On Linux, also ensure that virtual memory is actually allocated by the
>> + * operating system to avoid nasty surprises later.
>> + *
>> + * Returns non-zero if either truncation or allocation fails, and sets errno.
>> + */
>> +static int
>> +dsm_impl_posix_resize(int fd, off_t size)
>> +{
>> +    int rc;
>> +
>> +    /* Truncate (or extend) the file to the requested size. */
>> +    rc = ftruncate(fd, size);
>> +
>> +#ifdef HAVE_FALLOCATE
>> +#ifdef __linux__
>> +    /*
>> +     * On Linux, a shm_open fd is backed by a tmpfs file. After resizing
>> +     * with ftruncate it may contain a hole. Accessing memory backed by a
>> +     * hole causes tmpfs to allocate pages, which fails with SIGBUS if
>> +     * there is no virtual memory available. So we ask tmpfs to allocate
>> +     * pages here, so we can fail gracefully with ENOSPC now rather than
>> +     * risking SIGBUS later.
>> +     */
>> +    if (rc == 0)
>> +    {
>> +        do
>> +        {
>> +            rc = fallocate(fd, 0, 0, size);
>> +        } while (rc == -1 && errno == EINTR);
>> +        if (rc != 0 && errno == ENOSYS)
>> +        {
>> +            /* Kernel too old (< 2.6.23). */
>> +            rc = 0;
>> +        }
>> +    }
>> +#endif
>> +#endif
>
> I'd rather fall-back to just write() initializing the buffer, and do
> either of those in all implementations rather than just linux.

You can't write() to a shm_open() fd on FreeBSD. I can't find wording
in POSIX that either allows or forbids it, but it's not portable.
That's not too surprising if you take the spirit of POSIX shm_open()
to be something like "you might want to implement this as a special
kind of file, or not", with the intention being to mmap it, not to
read/write it.
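
The access pattern described here is to size the object and then reach it only through a mapping. A minimal sketch of that pattern. map_shared_fd is a hypothetical helper, and fd is assumed to come from shm_open(), although any fd that supports ftruncate() and mmap() will do for experimentation:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: size the object behind fd and map it read/write.
 * No read() or write() is issued against fd; all access goes through the
 * returned mapping.
 *
 * Returns the mapped address, or NULL on failure with errno set.
 *
 * Note: this does not preallocate anything. On Linux the first store into
 * the mapping is what makes tmpfs allocate a page.
 */
static void *
map_shared_fd(int fd, size_t size)
{
    void *addr;

    if (ftruncate(fd, (off_t) size) != 0)
        return NULL;

    addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED)
        return NULL;

    return addr;
}
```

The caller releases the mapping with munmap(addr, size) when finished.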

Note that I'm specifically addressing the freaky Linux SIGBUS problem
here, not virtual memory overcommit. Overcommit may or may not happen
on your particular OS, leading to failures independent of this one; it
does happen on FreeBSD, for example, and the way to avoid it there is
probably to disable overcommit (I haven't checked). The Linux SIGBUS
problem is not regular overcommit: it's caused by tmpfs file holes,
which is a different failure mode, and AFAIK it can bite you even if
you disable overcommit on Linux like TFM tells you to.
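
A hole of this kind can be observed from user space, because ftruncate() grows the logical size without necessarily allocating storage. A rough sketch for inspecting that with fstat(). unallocated_bytes is a hypothetical helper, and it assumes the Linux convention that st_blocks counts 512-byte units:

```c
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: report how many bytes of the file's logical size
 * are not yet backed by allocated storage, judging by st_blocks.
 *
 * Returns 0 if the allocation covers the whole size, a positive byte
 * count if the file appears to contain holes, or -1 if fstat() fails.
 */
static off_t
unallocated_bytes(int fd)
{
    struct stat st;
    off_t allocated;

    if (fstat(fd, &st) != 0)
        return -1;

    /* Assumption: st_blocks is in 512-byte units, as on Linux. */
    allocated = (off_t) st.st_blocks * 512;

    if (allocated >= st.st_size)
        return 0;
    return st.st_size - allocated;
}
```

Calling this right after ftruncate() and again after a successful allocation call should show the unallocated count going down, though the exact numbers depend on the filesystem.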

>> diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
>> index 7a05c7e5b85..dcb9a846c7b 100644
>> --- a/src/include/pg_config.h.in
>> +++ b/src/include/pg_config.h.in
>> @@ -167,6 +167,9 @@
>> /* Define to 1 if you have the <editline/readline.h> header file. */
>> #undef HAVE_EDITLINE_READLINE_H
>>
>> +/* Define to 1 if you have the `fallocate' function. */
>> +#undef HAVE_FALLOCATE
>> +
>> /* Define to 1 if you have the `fdatasync' function. */
>> #undef HAVE_FDATASYNC
>
> Needs pg_config.h.win32 adjustment...

Oops, didn't notice that we maintain that thing by hand. New version attached.

[1] https://lkml.org/lkml/2013/7/31/64

--
Thomas Munro
http://www.enterprisedb.com

Attachment: fallocate-v5.patch (application/octet-stream, 4.4 KB)