From: Janardhana Reddy <jana-reddy(at)mediaring(dot)com(dot)sg>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, janareddy <jana-reddy(at)mediaring(dot)com(dot)sg>
Date: 2001-10-01 09:57:04
Lists: pgsql-hackers
I have just completed functional testing of the WAL using mmap, and it is
working fine. I tested it by commenting out the "CreateCheckPoint"
functionality, so that when I kill postgres and restart it, it redoes all
the records from the WAL log file, which is updated using mmap.
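
To make the mmap approach concrete, here is a minimal sketch in C. It is not
the patch itself: the names map_segment, append_record and SEG_SIZE are
hypothetical, and the segment size is an arbitrary assumption. It only shows
the general pattern of mapping a log file with MAP_SHARED, copying a record
into the mapping, and calling msync() so the data reaches the disk.

```c
#define _XOPEN_SOURCE 700       /* expose mmap, msync, ftruncate, pread */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define SEG_SIZE (16 * 8192)    /* hypothetical segment size, an assumption */

/* Map a log segment shared, so every process mapping the file sees updates. */
static char *map_segment(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, SEG_SIZE) != 0) {
        close(fd);
        return NULL;
    }
    char *base = mmap(NULL, SEG_SIZE, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    return base == MAP_FAILED ? NULL : base;
}

/* Copy one record into the mapping and force it to stable storage. */
static int append_record(char *base, size_t offset, const void *rec, size_t len)
{
    assert(base != NULL);
    if (offset + len > SEG_SIZE)
        return -1;
    memcpy(base + offset, rec, len);

    /* msync() wants a page-aligned start address, so round down. */
    size_t pagesz = (size_t) sysconf(_SC_PAGESIZE);
    size_t start = offset - (offset % pagesz);
    return msync(base + start, (offset - start) + len, MS_SYNC);
}
```

After append_record() returns 0, the record can be read back from the file
with an ordinary read(), which is what a redo pass after a kill relies on.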

I just need to clean up the code and do some stress testing. By the end of
this week I should be able to complete the stress test and generate the
patch file.

As Tom Lane mentioned, I see the problem with portability to all platforms.
What I propose is to use mmap only for WAL, and only on some platforms such
as Linux and FreeBSD. For other platforms we can use the existing method,
slightly modifying the write() routine to write only the modified part of
the page.
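
For the non-mmap platforms, the modified write() path could look roughly
like the sketch below. This is only an illustration under assumptions: the
LogPage struct, page_append, page_flush and LOG_PAGE_SIZE are hypothetical
names, not the existing PostgreSQL routines. The first flush of a page hands
the full 8k to the kernel; later flushes of the same page pass only the
bytes added since the previous flush.

```c
#define _XOPEN_SOURCE 700       /* expose pwrite and pread */
#include <assert.h>
#include <fcntl.h>              /* open(), for callers that create the file */
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define LOG_PAGE_SIZE 8192      /* the 8k page size from the discussion */

typedef struct {
    char   data[LOG_PAGE_SIZE];
    off_t  file_off;            /* where this page starts in the log file */
    size_t filled;              /* bytes of the page that hold log data */
    size_t flushed;             /* bytes already handed to the kernel */
    bool   written_once;        /* has the whole page been written before? */
} LogPage;

/* Add a record to the in-memory page. */
static void page_append(LogPage *p, const void *rec, size_t len)
{
    assert(p->filled + len <= LOG_PAGE_SIZE);
    memcpy(p->data + p->filled, rec, len);
    p->filled += len;
}

/* Returns the number of bytes passed to the kernel, or -1 on error. */
static ssize_t page_flush(int fd, LogPage *p)
{
    /* First flush: the whole page. Later flushes: only the new tail. */
    size_t start = p->written_once ? p->flushed : 0;
    size_t len = p->written_once ? p->filled - p->flushed : LOG_PAGE_SIZE;

    if (len == 0)
        return 0;               /* nothing new since the last flush */

    ssize_t n = pwrite(fd, p->data + start, len, p->file_off + (off_t) start);
    if (n != (ssize_t) len)
        return -1;

    p->written_once = true;
    p->flushed = p->filled;
    return n;
}
```

The caller would still fsync() the descriptor after page_flush() when the
records have to be durable; the sketch only reduces the user-to-kernel copy.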


> OK, I have talked to Tom Lane about this on the phone and we have a few
> ideas.
> Historically, we have avoided mmap() because of portability problems,
> and because using mmap() to write to large tables could consume lots of
> address space with little benefit.  However, I perhaps can see WAL as
> being a good use of mmap.
> First, there is the issue of using mmap().  For OS's that have the
> mmap() MAP_SHARED flag, different backends could mmap the same file and
> each see the changes.  However, keep in mind we still have to fsync()
> WAL, so we need to use msync().
> So, looking at the benefits of using mmap(), we have overhead of
> different backends having to mmap something that now sits quite easily
> in shared memory.  Now, I can see mmap reducing the copy from user to
> kernel, but there are other ways to fix that.  We could modify the
> write() routines to write() 8k on first WAL page write and later write
> only the modified part of the page to the kernel buffers.  The old
> kernel buffer is probably still around so it is unlikely to require a
> read from the file system to read in the rest of the page.  This reduces
> the write from 8k to something probably less than 4k which is better
> than we can do with mmap.
> I will add a TODO item to this effect.
> As far as reducing the write to disk from 8k to 4k, if we have to
> fsync/msync, we have to wait for the disk to spin to the proper location
> and at that point writing 4k or 8k doesn't seem like much of a win.
> In summary, I think it would be nice to reduce the 8k transfer from user
> to kernel on secondary page writes to only the modified part of the
> page.  I am uncertain if mmap() or anything else will help the physical
> write to the disk.
> --
>   Bruce Momjian                        |
>   pgman(at)candle(dot)pha(dot)pa(dot)us               |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
