Re: PERFORMANCE IMPROVEMENT by mapping WAL FILES

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Janardhana Reddy <jana-reddy(at)mediaring(dot)com(dot)sg>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PERFORMANCE IMPROVEMENT by mapping WAL FILES
Date: 2001-10-12 17:35:25
Message-ID: 200110121735.f9CHZPN09243@candle.pha.pa.us
Lists: pgsql-hackers


I have added this to TODO.detail/mmap.

> I have just completed functional testing of WAL using mmap, and it is
> working fine. I tested by commenting out the "CreateCheckPoint"
> functionality, so that when I kill postgres and restart it, it redoes all
> the records from the WAL log file, which is updated using mmap.
> I just need to clean up the code and do some stress testing.
> By the end of this week I should be able to complete the stress test and
> generate the patch file.
> As Tom Lane mentioned, I see the problem of portability to all platforms.
>
> What I propose is to use mmap only for WAL, and only on some platforms such
> as Linux and FreeBSD. On other platforms we can use the existing method,
> slightly modifying the write() routine to write only the modified part of
> the page.
>
> Regards
> jana
>
> >
> >
> > OK, I have talked to Tom Lane about this on the phone and we have a few
> > ideas.
> >
> > Historically, we have avoided mmap() because of portability problems,
> > and because using mmap() to write to large tables could consume lots of
> > address space with little benefit. However, I can perhaps see WAL as
> > being a good use of mmap().
> >
> > First, there is the issue of using mmap(). For OSes that have the
> > mmap() MAP_SHARED flag, different backends could mmap the same file and
> > each see the others' changes. However, keep in mind we still have to
> > fsync() WAL, so we would need to use msync().
> >
> > So, looking at the benefits of using mmap(), we have the overhead of
> > different backends having to mmap something that now sits quite easily
> > in shared memory. Now, I can see mmap reducing the copy from user to
> > kernel, but there are other ways to fix that. We could modify the
> > write() routines to write() the full 8k on the first write of a WAL page
> > and later write only the modified part of the page to the kernel
> > buffers. The old kernel buffer is probably still around, so it is
> > unlikely to require a read from the file system to read in the rest of
> > the page. This reduces the write from 8k to something probably less
> > than 4k, which is better than we can do with mmap.
> >
> > I will add a TODO item to this effect.
> >
> > As far as reducing the write to disk from 8k to 4k, if we have to
> > fsync/msync, we have to wait for the disk to spin to the proper location
> > and at that point writing 4k or 8k doesn't seem like much of a win.
> >
> > In summary, I think it would be nice to reduce the 8k transfer from user
> > to kernel on secondary page writes to only the modified part of the
> > page. I am uncertain if mmap() or anything else will help the physical
> > write to the disk.
> >
> > --
> > Bruce Momjian | http://candle.pha.pa.us
> > pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
> > + If your life is a hard drive, | 830 Blythe Avenue
> > + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
>
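To make the MAP_SHARED plus msync() point above concrete, here is a rough
sketch of what mapping a WAL segment and forcing a record to disk might look
like. This is not the code from the patch under discussion. The names
wal_map_segment and wal_append_sync are made up for illustration, and the
sketch assumes the segment file already exists at its full size.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map an existing, pre-sized segment file with MAP_SHARED, so every process
 * that maps the same file sees stores made by the others.
 * Hypothetical helper; returns NULL on failure.
 */
static char *wal_map_segment(const char *path, size_t size)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    return (p == MAP_FAILED) ? NULL : (char *) p;
}

/*
 * Copy a record into the mapping at byte offset 'off' and force it to
 * stable storage. msync() plays the role fsync() plays for write().
 * Hypothetical helper; returns 0 on success, -1 on failure.
 */
static int wal_append_sync(char *base, size_t size, size_t off,
                           const void *rec, size_t len)
{
    assert(off <= size && len <= size - off);

    memcpy(base + off, rec, len);

    /* msync() wants a page-aligned start address, so round 'off' down. */
    size_t pagesz = (size_t) sysconf(_SC_PAGESIZE);
    size_t start = off & ~(pagesz - 1);

    return msync(base + start, (off - start) + len, MS_SYNC);
}
```

Note that the sync is still a blocking call, so this only removes the
user-to-kernel copy, not the wait for the disk.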

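The write()-based alternative quoted above (full 8k on the first write of a
page, then only the modified part) could be sketched roughly as follows. This
is only an illustration under assumptions: WalPageState, wal_open_segment and
wal_flush_page are hypothetical names, not existing backend routines, and the
sketch assumes a WAL page is only ever appended to, so the modified part is
the tail added since the last flush.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

#define WAL_PAGE_SIZE 8192      /* the 8k page size from the discussion */

/* Per-page bookkeeping (hypothetical). */
typedef struct WalPageState
{
    off_t   file_off;           /* where this page starts in the file */
    size_t  flushed;            /* bytes already handed to the kernel */
    bool    written_once;       /* has the full page been written yet? */
} WalPageState;

/* Open (creating if needed) a segment file for positional writes. */
static int wal_open_segment(const char *path)
{
    return open(path, O_RDWR | O_CREAT, 0600);
}

/*
 * Hand page contents to the kernel. The first call writes the whole 8k
 * page; later calls write only the bytes appended since the previous call.
 * 'used' is the number of valid bytes currently in the page.
 * Returns the number of bytes written, or -1 on error.
 */
static ssize_t wal_flush_page(int fd, const char *page, size_t used,
                              WalPageState *st)
{
    size_t start, len;

    assert(used <= WAL_PAGE_SIZE && st->flushed <= used);

    if (!st->written_once)
    {
        start = 0;
        len = WAL_PAGE_SIZE;
    }
    else
    {
        start = st->flushed;
        len = used - st->flushed;
    }

    if (len == 0)
        return 0;               /* nothing new since the last flush */

    ssize_t n = pwrite(fd, page + start, len, st->file_off + (off_t) start);
    if (n == (ssize_t) len)
    {
        st->flushed = used;
        st->written_once = true;
    }
    return n;
}
```

The caller would still need fsync() at commit time. The sketch only shrinks
the user-to-kernel transfer on secondary writes of the same page.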
--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
