Re: Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().

From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, amul sul <sul_amul(at)yahoo(dot)co(dot)in>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().
Date: 2017-06-28 23:04:58
Message-ID: 20170628230458.n5ehizmvhoerr5yq@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017-06-28 19:07:50 +1200, Thomas Munro wrote:
> I think this line is saying that it won't restart automatically:
>
> https://github.com/torvalds/linux/blob/590dce2d4934fb909b112cd80c80486362337744/mm/shmem.c#L2884

Indeed.

> So I think we either need to mask signals with or put in an explicit
> retry loop, as shown in the attached version of the patch. With the
> v3 patch I posted earlier, I see interrupted system call failures in
> the select_parallel regression test, but with the v4 it passes.
> Thoughts?

I think a retry loop is a lot better than blocking signals.

> > Unfounded speculation: fallocate() might actually *improve*
> > performance of DSM segments if your access pattern involves random
> > access (just to pick an example out of the air, something like...
> > building a hash table), since it's surely easier to allocate a big
> > contiguous chunk than a squillion random pages most of which divide an
> > existing hole into two smaller holes...
>
> Bleugh... I retract this, of course we initialise the hash table in
> order anyway so this doesn't make any sense.

It's still faster to create larger mappings at once, rather than through
subsequent page faults...

> diff --git a/configure.in b/configure.in
> index 11eb9c8acfc..47452bbac43 100644
> --- a/configure.in
> +++ b/configure.in
> @@ -1429,7 +1429,7 @@ PGAC_FUNC_WCSTOMBS_L
> LIBS_including_readline="$LIBS"
> LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
>
> -AC_CHECK_FUNCS([cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat pthread_is_threaded_np readlink setproctitle setsid shm_open symlink sync_file_range towlower utime utimes wcstombs wcstombs_l])
> +AC_CHECK_FUNCS([cbrt clock_gettime dlopen fallocate fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat pthread_is_threaded_np readlink setproctitle setsid shm_open symlink sync_file_range towlower utime utimes wcstombs wcstombs_l])

Why are we going for fallocate rather than posix_fallocate()? There's
plenty use cases that can only be done with the former, but this doesn't
seem like one of them?

Currently this is a linux specific path - but I don't actually see any
reason to keep it that way? It seems far from unlikely that this is just
an issue with linux...

>
> #ifdef USE_DSM_POSIX
> /*
> + * Set the size of a virtual memory region associate with a file descriptor.
> + * On Linux, also ensure that virtual memory is actually allocated by the
> + * operating system to avoid nasty surprises later.
> + *
> + * Returns non-zero if either truncation or allocation fails, and sets errno.
> + */
> +static int
> +dsm_impl_posix_resize(int fd, off_t size)
> +{
> + int rc;
> +
> + /* Truncate (or extend) the file to the requested size. */
> + rc = ftruncate(fd, size);
> +
> +#ifdef HAVE_FALLOCATE
> +#ifdef __linux__
> + /*
> + * On Linux, a shm_open fd is backed by a tmpfs file. After resizing
> + * with ftruncate it may contain a hole. Accessing memory backed by a
> + * hole causes tmpfs to allocate pages, which fails with SIGBUS if
> + * there is no virtual memory available. So we ask tmpfs to allocate
> + * pages here, so we can fail gracefully with ENOSPC now rather than
> + * risking SIGBUS later.
> + */
> + if (rc == 0)
> + {
> + do
> + {
> + rc = fallocate(fd, 0, 0, size);
> + } while (rc == -1 && errno == EINTR);
> + if (rc != 0 && errno == ENOSYS)
> + {
> + /* Kernel too old (< 2.6.23). */
> + rc = 0;
> + }
> + }
> +#endif
> +#endif

I'd rather fall-back to just write() initializing the buffer, and do
either of those in all implementations rather than just linux.

> diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
> index 7a05c7e5b85..dcb9a846c7b 100644
> --- a/src/include/pg_config.h.in
> +++ b/src/include/pg_config.h.in
> @@ -167,6 +167,9 @@
> /* Define to 1 if you have the <editline/readline.h> header file. */
> #undef HAVE_EDITLINE_READLINE_H
>
> +/* Define to 1 if you have the `fallocate' function. */
> +#undef HAVE_FALLOCATE
> +
> /* Define to 1 if you have the `fdatasync' function. */
> #undef HAVE_FDATASYNC

Needs pg_config.h.win32 adjustment...

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2017-06-29 00:24:23 Re: Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().
Previous Message Alvaro Herrera 2017-06-28 22:52:14 Re: Race conditions with WAL sender PID lookups