mmap vs read/write

From: Huw Rogers <count0(at)fsj(dot)co(dot)jp>
To: hackers(at)postgreSQL(dot)org
Subject: mmap vs read/write
Date: 1998-05-15 20:26:53
Message-ID: 355CA50D.3CD4@fsj.co.jp
Lists: pgsql-hackers

Someone posted a (readonly) benchtest of mmap vs
read/write I/O using the following code:

for (off = 0; 1; off += MMAP_SIZE)
{
    addr = mmap(0, MMAP_SIZE, PROT_READ, 0, fd, off);
    assert(addr != NULL);

    for (j = 0; j < MMAP_SIZE; j++)
        if (*(addr + j) != ' ')
            spaces++;
    munmap(addr,MMAP_SIZE);
}

This is unfair to mmap, since mmap is called once
per page. It is better to mmap large regions (many
pages at once), then use msync() to force out
any modified pages. For access purely in
memory, mmap'd I/O is _many_ times faster than
read/write under Solaris, or under Linux later
than 2.1.99 (prior to 2.1.99, Linux had
slow mmap performance).
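
As a rough illustration of mapping many pages at once,
here is a minimal sketch of the same read-only scan done
in large chunks. The function name count_spaces_mmap and
the 64MB CHUNK_SIZE are made up for this example, and it
counts bytes equal to ' ' on the assumption that this was
the intent of the posted loop. It also passes MAP_SHARED
and checks for MAP_FAILED, since mmap needs one of
MAP_SHARED or MAP_PRIVATE and reports failure with
MAP_FAILED rather than NULL.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX declarations */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical chunk size (64MB); must be a multiple of the page size. */
#define CHUNK_SIZE ((off_t)64 * 1024 * 1024)

/* Count ' ' bytes in a file by mapping it in large chunks.
 * Returns the count, or -1 on error. */
long count_spaces_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    long spaces = 0;
    for (off_t off = 0; off < st.st_size; off += CHUNK_SIZE) {
        /* The last chunk may be shorter than CHUNK_SIZE. */
        off_t remaining = st.st_size - off;
        size_t len = (size_t)(remaining < CHUNK_SIZE ? remaining : CHUNK_SIZE);

        char *addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, off);
        if (addr == MAP_FAILED) {
            close(fd);
            return -1;
        }

        for (size_t j = 0; j < len; j++)
            if (addr[j] == ' ')
                spaces++;

        munmap(addr, len);
    }

    close(fd);
    return spaces;
}
```

With chunks this size, the mmap/munmap system call pair is
paid once per 64MB rather than once per page, so the scan
itself dominates the run time.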

The main limitation of mmap is that you
can't map more than 2GB of data at once
under most existing OSes (and that limit
includes heap and stack), so simplistic mapping
of entire DBMS data files doesn't
scale to large databases, and you
need to cache region mappings to
avoid running out of PTEs.
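
One way to picture caching region mappings is a window
that stays mapped until an access falls outside it. The
sketch below is the simplest possible case, a cache with a
single entry; the names struct window, window_at and
window_release and the 4MB WINDOW_SIZE are hypothetical. A
real cache would presumably keep several windows and evict
the least recently used one. The caller is assumed to open
the file O_RDWR and to keep offsets within the file, since
touching a mapped page that lies wholly beyond end-of-file
raises SIGBUS.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX declarations */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical window size (4MB); must be a multiple of the page size. */
#define WINDOW_SIZE ((off_t)4 * 1024 * 1024)

struct window {
    int   fd;    /* file descriptor, opened O_RDWR by the caller */
    off_t base;  /* file offset where the mapped window starts */
    char *addr;  /* NULL when nothing is mapped */
};

/* Write back modified pages with msync(), then drop the mapping.
 * Returns 0 on success, -1 if msync failed. */
int window_release(struct window *w)
{
    int rc = 0;
    if (w->addr != NULL) {
        if (msync(w->addr, (size_t)WINDOW_SIZE, MS_SYNC) < 0)
            rc = -1;
        munmap(w->addr, (size_t)WINDOW_SIZE);
        w->addr = NULL;
    }
    return rc;
}

/* Return a pointer to the byte at file offset `off`. The file is
 * remapped only when `off` lies outside the window already mapped.
 * Returns NULL on error. */
char *window_at(struct window *w, off_t off)
{
    off_t base = off - (off % WINDOW_SIZE);

    if (w->addr == NULL || base != w->base) {
        if (window_release(w) < 0)
            return NULL;
        char *p = mmap(NULL, (size_t)WINDOW_SIZE, PROT_READ | PROT_WRITE,
                       MAP_SHARED, w->fd, base);
        if (p == MAP_FAILED)
            return NULL;
        w->addr = p;
        w->base = base;
    }
    return w->addr + (off - base);
}
```

Accesses that stay within one window cost no system calls
at all, and only WINDOW_SIZE bytes of address space are
in use at any time, however large the file is.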

The need to collocate information in
adjacent pages could be why Informix has
clustered indexes, the internal structure
of which I'd like to know more about.

-Huw
