Re: [COMMITTERS] pgsql-server/ /configure /configure.in rc/incl ...

From: Sean Chittenden <sean(at)chittenden(dot)org>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql-server/ /configure /configure.in rc/incl ...
Date: 2003-03-07 06:04:12
Message-ID: 20030307060412.GA19138@perrin.int.nxad.com
Lists: pgsql-committers pgsql-performance

> > I don't have my copy of Stevens handy (it's some 700mi away atm
> > otherwise I'd cite it), but if Tom or someone else has it handy, look
> > up the example re: the performance gain from read()'ing an mmap()'ed
> > file versus a non-mmap()'ed file. The difference is non-trivial and
> > _WELL_ worth the time given the speed increase.
>
> Can anyone confirm this? If so, one easy step we could take in this
> direction would be adapting COPY FROM to use mmap().

Weeee! Alright, so I got to have some fun writing out some simple
tests with mmap() and friends tonight. Are the results interesting?
Absolutely! Is this a simple benchmark? Yup. Do I think it
simulates PostgreSQL? Eh, not particularly. Does it demonstrate that
mmap() is a win and something worth implementing? I sure hope so. Is
this a test program to demonstrate the ideal use of mmap() in
PostgreSQL? No. Is it a place to start a factual discussion? I hope
so.

I have here four tests, each selected at compile time with cpp macros.

# The first one uses read() and write() but with the buffer size set
# to the same size as the file.
gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -o test-mmap test-mmap.c
/usr/bin/time ./test-mmap > /dev/null
Beginning tests with file: services

Page size: 4096
File read size is the same as the file size
Number of iterations: 100000
Start time: 1047013002.412516
Time: 82.88178

Completed tests
82.09 real 2.13 user 68.98 sys

# The second one uses read() and write() with the default buffer size:
# 65536
gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -DDEFAULT_READSIZE=1 -o test-mmap test-mmap.c
/usr/bin/time ./test-mmap > /dev/null
Beginning tests with file: services

Page size: 4096
File read size is default read size: 65536
Number of iterations: 100000
Start time: 1047013085.16204
Time: 18.155511

Completed tests
18.16 real 0.90 user 14.79 sys
# Please note this is significantly faster, but that's expected
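
For readers without the test program at hand, the read()/write() pattern
exercised by the first two tests can be sketched roughly as below. This is
not the code from test-mmap.c; the function name copy_with_read and its
bufsize parameter are hypothetical, chosen only to show how the buffer
size (file-sized versus 65536) enters the loop.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper, not taken from test-mmap.c.
 * Copies everything readable from in_fd to out_fd through a heap
 * buffer of bufsize bytes. Returns bytes copied, or -1 on error. */
ssize_t copy_with_read(int in_fd, int out_fd, size_t bufsize)
{
    char *buf = malloc(bufsize);
    ssize_t total = 0;

    if (buf == NULL)
        return -1;

    for (;;) {
        ssize_t n = read(in_fd, buf, bufsize);
        if (n == 0)
            break;                      /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, retry the read */
            free(buf);
            return -1;
        }

        /* write() may accept fewer bytes than asked; drain the chunk. */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                free(buf);
                return -1;
            }
            off += w;
        }
        total += n;
    }

    free(buf);
    return total;
}
```

Every chunk here is copied from the kernel into buf by read() and back
into the kernel by write(), which is the copying the mmap() tests avoid.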

# The third test uses mmap() + madvise() + write()
gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -DDEFAULT_READSIZE=1 -DDO_MMAP=1 -o test-mmap test-mmap.c
/usr/bin/time ./test-mmap > /dev/null
Beginning tests with file: services

Page size: 4096
File read size is the same as the file size
Number of iterations: 100000
Start time: 1047013103.859818
Time: 8.4294203644

Completed tests
7.24 real 0.41 user 5.92 sys
# Faster still, and more than twice as fast as the read() case with
# the default buffer size

# The last test calls mmap() only once, when the file is opened, and
# calls msync(), munmap(), and close() only once, at exit.
gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -DDEFAULT_READSIZE=1 -DDO_MMAP=1 -DDO_MMAP_ONCE=1 -o test-mmap test-mmap.c
/usr/bin/time ./test-mmap > /dev/null
Beginning tests with file: services

Page size: 4096
File read size is the same as the file size
Number of iterations: 100000
Start time: 1047013111.623712
Time: 1.174076

Completed tests
1.18 real 0.09 user 0.92 sys
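# Substantially faster: going by the "real" times above, roughly 6x
# faster than the third test and 15x faster than the default read() case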
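
The map-once pattern of the last test might look roughly like the sketch
below: the mapping is set up a single time, reused for every iteration,
and torn down with msync(), munmap(), and close() at the end. The struct
and function names are hypothetical and are not taken from test-mmap.c.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical handle for a file that stays mapped between uses. */
struct mapped_file {
    int    fd;
    char  *addr;
    size_t len;
};

/* Open and map the file once. Returns 0 on success, -1 on error
 * (an empty file counts as an error, since it cannot be mapped). */
int mapped_file_open(struct mapped_file *mf, const char *path)
{
    struct stat st;

    mf->fd = open(path, O_RDONLY);
    if (mf->fd < 0)
        return -1;
    if (fstat(mf->fd, &st) < 0 || st.st_size == 0) {
        close(mf->fd);
        return -1;
    }
    mf->len = (size_t)st.st_size;
    mf->addr = mmap(NULL, mf->len, PROT_READ, MAP_SHARED, mf->fd, 0);
    if (mf->addr == MAP_FAILED) {
        close(mf->fd);
        return -1;
    }
    return 0;
}

/* One iteration: write the already-mapped bytes to out_fd. */
ssize_t mapped_file_emit(const struct mapped_file *mf, int out_fd)
{
    size_t off = 0;

    while (off < mf->len) {
        ssize_t w = write(out_fd, mf->addr + off, mf->len - off);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        off += (size_t)w;
    }
    return (ssize_t)mf->len;
}

/* Tear down once at exit: msync(), munmap(), close(). */
int mapped_file_close(struct mapped_file *mf)
{
    int rc = 0;

    if (msync(mf->addr, mf->len, MS_SYNC) < 0)
        rc = -1;
    if (munmap(mf->addr, mf->len) < 0)
        rc = -1;
    if (close(mf->fd) < 0)
        rc = -1;
    return rc;
}
```

A caller would invoke mapped_file_open() once, call mapped_file_emit() in
the iteration loop, and finish with mapped_file_close().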

Obviously this isn't perfect, but it shows that reading and writing
data (specifically, moving pages through the VM/OS) is faster with
mmap(). Doing partial writes from mmap()'ed data should be faster, as
should scanning through partially or completely mmap()'ed files,
because the pages are already loaded in the VM. PostgreSQL's LRU file
descriptor cache could easily be adjusted to add mmap()'ing of
frequently accessed files (system catalogs come to mind). It's not
hard to figure out how often particular files are accessed and to
either _avoid_ mmap()'ing a file that isn't accessed often, or to
mmap() files that _are_ accessed often. mmap() does have a cost, but
I'd wager that mmap()'ing the same file a second or third time from a
different process would be more efficient. However, the speedup of
searching through an mmap()'ed file may make it worth mmap()'ing all
files, as long as the system stays under a tunable resource limit
(max_mmaped_bytes?).
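
To make the policy idea above concrete, here is one way the decision could
be expressed. This is purely an illustrative sketch and not PostgreSQL
code: apart from max_mmaped_bytes, which is the tunable floated above,
every name (mmap_budget, file_stats, min_accesses, should_mmap,
account_mapped) is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical global budget for mapped file data. */
struct mmap_budget {
    size_t   max_mmaped_bytes;  /* tunable ceiling on mapped bytes */
    size_t   mapped_bytes;      /* bytes currently mapped */
    unsigned min_accesses;      /* accesses needed before mapping */
};

/* Hypothetical per-file bookkeeping kept next to the fd cache entry. */
struct file_stats {
    size_t   size;              /* file size in bytes */
    unsigned accesses;          /* how often the file has been used */
    bool     mapped;            /* already mmap()'ed? */
};

/* Decide whether a file is hot enough, and the budget roomy enough,
 * to justify mapping it. */
bool should_mmap(const struct mmap_budget *b, const struct file_stats *f)
{
    if (f->mapped || f->size == 0)
        return false;           /* nothing to do, or nothing to map */
    if (f->accesses < b->min_accesses)
        return false;           /* rarely used: avoid the mmap() cost */
    if (b->mapped_bytes >= b->max_mmaped_bytes)
        return false;           /* budget already exhausted */
    /* Subtraction is safe: mapped_bytes < max_mmaped_bytes here. */
    return f->size <= b->max_mmaped_bytes - b->mapped_bytes;
}

/* Record that a file has been mapped against the budget. */
void account_mapped(struct mmap_budget *b, struct file_stats *f)
{
    f->mapped = true;
    b->mapped_bytes += f->size;
}
```

Setting min_accesses to zero in this sketch corresponds to the "mmap()
everything while under the limit" variant.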

If someone is so inclined or there's enough interest, I can reverse
this test case so that data is written to an mmap()'ed file, but the
same performance difference should hold true (assuming this isn't a
write to a tape drive ::grin::).
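
For reference, the reversed case (writing data into an mmap()'ed file)
could be sketched as follows. The function name write_via_mmap is
hypothetical, and no timings are claimed for it; it only shows the
sequence of calls such a test would need.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: replace the contents of path with len bytes
 * of data, storing through a shared mapping instead of write().
 * Returns 0 on success, -1 on error. */
int write_via_mmap(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (len == 0)
        return close(fd);       /* nothing to map for an empty file */

    /* The file must be grown to its final size before it is mapped;
     * storing past end-of-file through a mapping raises SIGBUS. */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(map, data, len);     /* the "write" is a plain memory copy */

    int rc = 0;
    if (msync(map, len, MS_SYNC) < 0)   /* flush dirty pages to the file */
        rc = -1;
    if (munmap(map, len) < 0)
        rc = -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

MAP_SHARED matters here: with MAP_PRIVATE the stores would stay in
process-private copies of the pages and never reach the file.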

The program used to generate the above tests is available at:

http://people.freebsd.org/~seanc/mmap_test/

Please ask if you have questions. -sc

--
Sean Chittenden
