Re: [HACKERS] mmap and MAP_ANON

From: ocie(at)paracel(dot)com
To: tgl(at)sss(dot)pgh(dot)pa(dot)us (Tom Lane)
Cc: hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] mmap and MAP_ANON
Date: 1998-05-13 18:38:42
Message-ID: 9805131838.AA05684@dolomite.paracel.com
Lists: pgsql-hackers

Tom Lane wrote:
>
> "Göran Thyni" <goran(at)bildbasen(dot)se> writes:
> > Linux can only MAP_SHARED if the file is a *real* file,
> > devices or trick like MAP_ANON does only work with MAP_PRIVATE.
>
> Well, this makes some sense: MAP_SHARED implies that the shared memory
> will also be accessible to independently started processes, and
> to do that you have to have an openable filename to refer to the
> data segment by.
>
> MAP_PRIVATE will *not* work for our purposes: according to my copy
> of mmap(2):
>
> : If MAP_PRIVATE is set in flags:
> : o Modification to the mapped region by the calling process is
> : not visible to other processes which have mapped the same
> : region using either MAP_PRIVATE or MAP_SHARED.
> : Modifications are not visible to descendant processes that
> : have inherited the mapped region across a fork().
>
> so privately mapped segments are useless for interprocess communication,
> even after we get rid of exec().
>
> mmaping /dev/zero, as has been suggested earlier in this thread,
> seems like a really bad idea to me. Would that not imply that
> any process anywhere in the system that also decides to mmap /dev/zero
> would get its hands on the Postgres shared memory segment? You
> can't restrict permissions on /dev/zero to prevent it.

On some systems, a MAP_SHARED mapping of /dev/zero is shared with child
processes across fork(), as in this example:

#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    int fd;
    char *ma;
    pid_t pid;
    long pagesize = sysconf(_SC_PAGESIZE);

    fd = open("/dev/zero", O_RDWR);
    if (fd == -1) {
        perror("open");
        exit(1);
    }

    /* MAP_SHARED so that writes stay visible across fork() */
    ma = mmap(NULL,
              pagesize,
              PROT_READ | PROT_WRITE,
              MAP_SHARED,
              fd,
              0);

    if (ma == MAP_FAILED) {
        perror("mmap");
        exit(1);
    }

    memset(ma, 0, pagesize);

    pid = fork();

    if (pid == -1) {
        perror("fork");
        exit(1);
    }

    if (pid == 0) { /* child */
        ma[0] = 1;
        sleep(1);
        printf("child %d %d\n", ma[0], ma[1]);
        sleep(1);
        return 0;
    } else { /* parent */
        ma[1] = 1;
        sleep(1);
        printf("parent %d %d\n", ma[0], ma[1]);
    }

    wait(NULL);
    munmap(ma, pagesize); /* unmap exactly the length that was mapped */

    return 0;
}

This works on Solaris and, as expected, both the parent and the child
are able to write into the memory and see each other's changes (the
memory is truly shared between processes). We can certainly map a
real file, and this might even give us some interesting crash recovery
options. The nice thing about doing away with the exec is that the
memory mapped in the parent process is available at the same address
region in every process, so we don't have to do funky pointer tricks.
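To illustrate the point about pointers, here is a minimal sketch, assuming a system where a MAP_SHARED mapping of /dev/zero survives fork() as in the program above. The struct shared_hdr layout and the function name child_writes_through_pointer are made up for this illustration. The parent stores an absolute pointer inside the shared region before forking, and the child dereferences it unchanged.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical layout: a header holding a raw pointer into the same region. */
struct shared_hdr {
    int *slot;       /* absolute address, set by the parent before fork() */
    int  values[4];
};

/*
 * The child writes `value` through the pointer stored in shared memory;
 * the parent returns what it then reads from the slot, or -1 on error.
 */
int child_writes_through_pointer(int value)
{
    int fd = open("/dev/zero", O_RDWR);
    if (fd == -1)
        return -1;

    struct shared_hdr *hdr = mmap(NULL, sizeof *hdr, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
    close(fd);                       /* the mapping outlives the descriptor */
    if (hdr == MAP_FAILED)
        return -1;

    hdr->slot = &hdr->values[2];     /* pointer into the shared region */

    pid_t pid = fork();
    if (pid == -1) {
        munmap(hdr, sizeof *hdr);
        return -1;
    }
    if (pid == 0) {
        /* Child: the mapping sits at the same address, so no fix-up needed. */
        *hdr->slot = value;
        _exit(0);
    }

    int status;
    int result = -1;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status)
        && WEXITSTATUS(status) == 0)
        result = hdr->values[2];     /* parent sees the child's write */

    munmap(hdr, sizeof *hdr);
    return result;
}
```

If the backends were started with exec() and each mapped the segment on its own, the region could land at a different address in each process, and a stored pointer like slot would have to be replaced by an offset from the segment base.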

The only problem I see with mmap is that we don't know exactly when a
page will be written to disk, i.e., if you make two writes, the page
might get sync'ed between them, thus storing an inconsistent
intermediate state to the disk. Perhaps with proper transaction
control, this is not a problem.
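As a rough sketch of where the window lies, the following assumes a file-backed MAP_SHARED page and uses msync() with MS_SYNC to force a flush at a known point. The helper name update_and_flush and its arguments are hypothetical. Note that msync() only guarantees the page is on disk once it returns; it does nothing to stop the kernel from writing the page earlier, between the two stores.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: make two separate writes to the first page of
 * `path` through a shared mapping, then force the page to disk.
 * Returns 0 on success, -1 on error.
 */
int update_and_flush(const char *path, char first, char second)
{
    long pagesize = sysconf(_SC_PAGESIZE);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, pagesize) == -1) {   /* make sure page 0 exists */
        close(fd);
        return -1;
    }

    char *page = mmap(NULL, (size_t) pagesize, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    close(fd);                             /* mapping stays valid */
    if (page == MAP_FAILED)
        return -1;

    page[0] = first;
    /* The kernel is free to write the page back here, with only the
       first store applied: this is the inconsistent intermediate state. */
    page[1] = second;

    /* Block until the page, now holding both stores, is on disk. */
    int rc = msync(page, (size_t) pagesize, MS_SYNC);

    munmap(page, (size_t) pagesize);
    return rc;
}
```

So an explicit flush gives a lower bound on when the data is durable, but consistency of what is on disk at any earlier moment still has to come from the transaction machinery.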

The question is whether the individual database files should be mapped
into memory, or whether one "pgmem" file should be mapped, with pages
from different files read into it. The first option would allow different
backend processes to map different pages of different files as they
are needed. The postmaster could "pre-map" pages on behalf of the
backend processes as a sort of intelligent read-ahead mechanism.

I'll try to write this separately from Postgres just to see how it works.

Ocie
