Sounds good. Keep us posted. This will probably not make it into 7.2
but can be added to 7.3. We can perhaps conditionally use your code in
place of what is there. I have also looked at reducing the write() size
for WAL secondary writes. That will have to wait for 7.3 too because we
are so near beta.
> I have just completed functional testing of the WAL using mmap, and it
> is working fine. I tested it by commenting out the "CreateCheckPoint"
> functionality, so that when I kill postgres and restart it, it redoes
> all the records from the WAL log file, which is updated using mmap.
> I just need to clean up the code and do some stress testing.
> By the end of this week I should be able to complete the stress test
> and generate the patch file.
> As Tom Lane mentioned, I see the problem with portability to all
> platforms. What I propose is to use mmap only for WAL, and only on some
> platforms such as Linux and FreeBSD. For other platforms we can use the
> existing method, slightly modifying the write() routine to write only
> the modified part of the page.
> > OK, I have talked to Tom Lane about this on the phone and we have a few
> > ideas.
> > Historically, we have avoided mmap() because of portability problems,
> > and because using mmap() to write to large tables could consume lots of
> > address space with little benefit. However, I perhaps can see WAL as
> > being a good use of mmap.
> > First, there is the issue of using mmap(). For OS's that have the
> > mmap() MAP_SHARED flag, different backends could mmap the same file and
> > each see the changes. However, keep in mind we still have to fsync()
> > WAL, so we need to use msync().
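A minimal sketch of the MAP_SHARED point above, written in Python for brevity rather than the C a backend would use. Two independent mappings of the same file stand in for two backends. The names `make_segment` and `shared_write` are hypothetical, and the 8192-byte size is only borrowed from the 8k page figure in this thread. On Unix, Python's `mmap.flush()` calls msync().

```python
import mmap
import os
import tempfile

PAGE_SIZE = 8192  # assumption: one 8k WAL page, as discussed in the thread


def make_segment():
    """Create a zero-filled temporary file of PAGE_SIZE bytes (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * PAGE_SIZE)
    finally:
        os.close(fd)
    return path


def shared_write(path, record):
    """Write `record` through one MAP_SHARED mapping and read it back
    through a second, independent mapping of the same file."""
    fd_a = os.open(path, os.O_RDWR)
    fd_b = os.open(path, os.O_RDWR)
    try:
        map_a = mmap.mmap(fd_a, PAGE_SIZE, flags=mmap.MAP_SHARED)
        map_b = mmap.mmap(fd_b, PAGE_SIZE, flags=mmap.MAP_SHARED)
        try:
            map_a[0:len(record)] = record        # store through the first mapping
            seen = bytes(map_b[0:len(record)])   # visible through the second one
            map_a.flush()                        # msync(): push the page to disk
            return seen
        finally:
            map_a.close()
            map_b.close()
    finally:
        os.close(fd_a)
        os.close(fd_b)
```

The store is visible to the second mapping before `flush()` is called; the flush only matters for durability, which is the fsync()/msync() requirement raised above.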
> > So, looking at the benefits of using mmap(), we have overhead of
> > different backends having to mmap something that now sits quite easily
> > in shared memory. Now, I can see mmap reducing the copy from user to
> > kernel, but there are other ways to fix that. We could modify the
> > write() routines to write() 8k on first WAL page write and later write
> > only the modified part of the page to the kernel buffers. The old
> > kernel buffer is probably still around so it is unlikely to require a
> > read from the file system to read in the rest of the page. This reduces
> > the write from 8k to something probably less than 4k which is better
> > than we can do with mmap.
> > I will add a TODO item to this effect.
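The write() idea above could be sketched roughly as follows, again in Python rather than C. This is not the PostgreSQL code: `WalPageWriter` and `demo` are hypothetical names, and the sketch assumes a WAL page is filled append-only, so the "modified part" is simply the bytes added since the previous write.

```python
import os
import tempfile

PAGE_SIZE = 8192  # assumption: 8k WAL page, as in the thread


class WalPageWriter:
    """Hypothetical helper: full-page write first, tail-only writes afterwards."""

    def __init__(self, fd, page_no):
        self.fd = fd
        self.page_offset = page_no * PAGE_SIZE
        self.first_write_done = False
        self.written_upto = 0  # bytes of the page already handed to the kernel

    def write(self, page, used):
        """`page` is the full in-memory page, `used` the number of valid bytes.
        Returns the number of bytes passed to the kernel."""
        if not self.first_write_done:
            # First write of this page: send all 8k.
            n = os.pwrite(self.fd, page, self.page_offset)
            self.first_write_done = True
        else:
            # Secondary write: send only what was appended since last time.
            tail = memoryview(page)[self.written_upto:used]
            n = os.pwrite(self.fd, tail, self.page_offset + self.written_upto)
        self.written_upto = used
        return n


def demo():
    """Write one page twice and report the size of each transfer."""
    fd, path = tempfile.mkstemp()
    try:
        page = bytearray(PAGE_SIZE)
        writer = WalPageWriter(fd, page_no=0)
        page[0:100] = b"a" * 100
        first = writer.write(page, 100)      # full page
        page[100:150] = b"b" * 50
        second = writer.write(page, 150)     # only the 50 new bytes
        on_disk_matches = os.pread(fd, PAGE_SIZE, 0) == bytes(page)
        return first, second, on_disk_matches
    finally:
        os.close(fd)
        os.remove(path)
```

A real implementation would also have to handle short writes and still call fsync() before reporting a commit; both are left out here to keep the sketch small.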
> > As far as reducing the write to disk from 8k to 4k, if we have to
> > fsync/msync, we have to wait for the disk to spin to the proper location
> > and at that point writing 4k or 8k doesn't seem like much of a win.
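To put rough numbers on the 4k versus 8k point, here is a back-of-the-envelope sketch. The drive parameters are illustrative assumptions only (7200 rpm, 20 MB/s sustained transfer), not measurements from this thread, and seek time is ignored.

```python
def rotational_latency_ms(rpm):
    """Average wait for the target sector: half of one revolution."""
    return 0.5 * 60_000.0 / rpm


def transfer_ms(nbytes, mb_per_s):
    """Time to move nbytes at the given sustained rate (1 MB = 10**6 bytes)."""
    return nbytes / (mb_per_s * 1_000_000.0) * 1000.0


def sync_write_ms(nbytes, rpm=7200, mb_per_s=20.0):
    """Estimated cost of one synchronous write under the assumed drive model."""
    return rotational_latency_ms(rpm) + transfer_ms(nbytes, mb_per_s)


if __name__ == "__main__":
    for size in (4096, 8192):
        print(size, round(sync_write_ms(size), 2), "ms")
```

Under these assumptions the two cases come out at roughly 4.4 ms and 4.6 ms, so the rotational wait dominates and halving the transfer saves only about 0.2 ms per sync.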
> > In summary, I think it would be nice to reduce the 8k transfer from user
> > to kernel on secondary page writes to only the modified part of the
> > page. I am uncertain if mmap() or anything else will help the physical
> > write to the disk.
> > --
> > Bruce Momjian | http://candle.pha.pa.us
> > pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
> > + If your life is a hard drive, | 830 Blythe Avenue
> > + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026