Re: Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: amul sul <sul_amul(at)yahoo(dot)co(dot)in>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().
Date: 2016-08-12 20:26:11
Message-ID: CAEepm=24jFz-Bm6awX-U=qfvcxrbQVS1ZU6uUry07r=aGyePgg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Aug 13, 2016 at 2:08 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> amul sul <sul_amul(at)yahoo(dot)co(dot)in> writes:
>> When I am calling dsm_create on Linux using the POSIX DSM implementation can succeed, but result in SIGBUS when later try to access the memory. This happens because of my system does not have enough shm space & current allocation in dsm_impl_posix does not allocate disk blocks[1]. I wonder can we use fallocate system call (i.e. Zero-fill the file) to ensure that all the file space has really been allocated, so that we don't later seg fault when accessing the memory mapping. But here we will endup by loop calling ‘write’ squillions of times.
>
> Wouldn't that just result in a segfault during dsm_create?
>
> I think probably what you are describing here is kernel misbehavior
> akin to memory overcommit. Maybe it *is* memory overcommit and can
> be turned off the same way. If not, you have material for a kernel
> bug fix/enhancement request.

I think this may be different from overcommit.

In dsm_impl_posix we do shm_open, then ftruncate. That creates a file
with a hole. Based on an LKML discussion where someone tried to
address this with a patch that was rejected[1], I believe that Linux
implements POSIX shmem as a tmpfs file and in this case the file has a
hole, which is not the same phenomenon as unallocated virtual memory
pages resulting from overcommit policy.
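
To make the hole concrete, here is a minimal sketch, not PostgreSQL code. The
helper name blocks_after_ftruncate is made up for illustration, and it takes
any file descriptor; in dsm_impl_posix the descriptor would come from
shm_open. After ftruncate alone, the file reports the requested size, but the
filesystem will typically report few or no blocks backing it.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of PostgreSQL): extend fd to "size" bytes
 * using ftruncate() only, then return the number of 512-byte blocks the
 * filesystem reports as allocated for it. Returns -1 on error.
 *
 * On a file with a hole, st_size equals "size" while st_blocks stays small,
 * which is why a later access through mmap() can still fail.
 */
long blocks_after_ftruncate(int fd, off_t size)
{
    struct stat st;

    if (ftruncate(fd, size) != 0)
        return -1;
    if (fstat(fd, &st) != 0)
        return -1;
    return (long) st.st_blocks;
}
```

Comparing the returned block count (times 512) against st_size is one way to
see that the space has been promised but not yet allocated.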

In dsm_impl_mmap it looks like we have code to deal with the same
problem: we do open, then ftruncate, and then we explicitly write a
bunch of zeros to the file, with this comment:

/*
 * Zero-fill the file. We have to do this the hard way to ensure that
 * all the file space has really been allocated, so that we don't
 * later seg fault when accessing the memory mapping. This is pretty
 * pessimal.
 */

Maybe we didn't do that for dsm_impl_posix because you can't
write to an fd created with shm_open like that; I don't know. But it
looks like if we used fallocate or posix_fallocate in the
dsm_impl_posix case we'd get a nice ENOSPC error, instead of
success-but-later-SIGBUS-on-access. Whether there is *also* the
possibility of overcommit biting you later I don't know, but I suspect
that's an independent problem. The OOM killer kills you with SIGKILL,
not SIGBUS.
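
As a rough sketch of that idea (not a proposed patch; the helper name
reserve_space is made up here), the allocation step could be wrapped like
this. Note that posix_fallocate returns the error number directly rather
than setting errno, so running out of space shows up as a return value of
ENOSPC at creation time.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of PostgreSQL): ask the filesystem to
 * really allocate bytes [0, size) of fd. Returns 0 on success or an error
 * number such as ENOSPC. posix_fallocate() reports errors through its
 * return value, not through errno.
 */
int reserve_space(int fd, off_t size)
{
    int rc;

    do
    {
        rc = posix_fallocate(fd, 0, size);
    } while (rc == EINTR);      /* retry if interrupted by a signal */

    return rc;
}
```

A caller would check for a nonzero result right after creating the segment
and report the failure there, instead of mapping the file and hoping.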

[1] https://lkml.org/lkml/2013/7/31/64

--
Thomas Munro
http://www.enterprisedb.com
