Re: 2nd Level Buffer Cache

From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 2nd Level Buffer Cache
Date: 2011-03-22 21:28:02
Message-ID: 201103222228.03265.rsmogura@softperience.eu
Lists: pgsql-hackers

Merlin Moncure <mmoncure(at)gmail(dot)com> Monday 21 March 2011 20:58:16
> On Mon, Mar 21, 2011 at 2:08 PM, Greg Stark <gsstark(at)mit(dot)edu> wrote:
> > On Mon, Mar 21, 2011 at 3:54 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> >> Can't you make just one large mapping and lock it in 8k regions? I
> >> thought the problem with mmap was not being able to detect other
> >> processes
> >> (http://www.mail-archive.com/pgsql-general(at)postgresql(dot)org/msg122301.html),
> >> compatibility issues (possibly obsolete), etc.
> >
> > I was assuming that locking part of a mapping would force the kernel
> > to split the mapping. It has to record the locked state somewhere so
> > it needs a data structure that represents the size of the locked
> > section and that would, I assume, be the mapping.
> >
> > It's possible the kernel would not in fact fall over too badly doing
> > this. At some point I'll go ahead and do experiments on it. It's a bit
> > fraught though as the performance may depend on the memory
> > management features of the chipset.
> >
> > That said, that's only part of the battle. On 32bit you can't map the
> > whole database as your database could easily be larger than your
> > address space. I have some ideas on how to tackle that but the
> > simplest test would be to just mmap 8kB chunks everywhere.
>
> Even on 64 bit systems you only have 48 bit address space which is not
> a theoretical limitation. However, at least on linux you can map in
> and map out pretty quick (10 microseconds paired on my linux vm) so
> that's not so big of a deal. Dealing with rapidly growing files is a
> problem. That said, probably you are not going to want to reserve
> multiple gigabytes in 8k non contiguous chunks.
>
> > But it's worse than that. Since you're not responsible for flushing
> > blocks to disk any longer you need some way to *unlock* a block when
> > it's possible to be flushed. That means when you flush the xlog you
> > have to somehow find all the blocks that might no longer need to be
> > locked and atomically unlock them. That would require new
> > infrastructure we don't have though it might not be too hard.
> >
> > What would be nice is a mlock_until() where you eventually issue a
> > call to tell the kernel what point in time you've reached and it
> > unlocks everything older than that time.
>
> I wonder if there is any reason to mlock at all...if you are going to
> 'do' mmap, can't you just hide under current lock architecture for
> actual locking and do direct memory access without mlock?
>
> merlin

Actually, after working with mmap and adding munmap, I found a crucial reason
not to use mmap:
You need to munmap, and for me this takes a lot of time. Even when I map with
MAP_SHARED and PROT_READ, it looks like Linux does a flush or something else;
the same happens with MAP_FIXED, MAP_PRIVATE, etc.
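
To put a number on the map/unmap cost discussed above, something like the
following could be used. It is only a minimal sketch, assuming Linux/POSIX and
a test file of at least 8kB; time_map_unmap_pairs is a hypothetical helper
name, not part of PostgreSQL or of any patch. It maps one 8kB chunk read-only
with MAP_SHARED, touches it, unmaps it, and reports the average time per pair.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

#define CHUNK_SIZE 8192 /* 8kB, the block size discussed in this thread */

/*
 * Hypothetical helper (an assumption for illustration only).
 * Maps and unmaps the first 8kB of `path` `iterations` times and returns
 * the average number of nanoseconds per mmap/munmap pair, or -1.0 on error.
 */
double time_map_unmap_pairs(const char *path, int iterations)
{
    assert(path != NULL);
    if (iterations <= 0)
        return -1.0;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1.0;

    /* Refuse files shorter than one chunk: touching a page past EOF raises SIGBUS. */
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size < CHUNK_SIZE) {
        close(fd);
        return -1.0;
    }

    volatile unsigned char sink = 0;
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    for (int i = 0; i < iterations; i++) {
        void *p = mmap(NULL, CHUNK_SIZE, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            close(fd);
            return -1.0;
        }
        /* Read one byte so the page is actually faulted in before unmapping. */
        sink ^= *(volatile unsigned char *)p;
        if (munmap(p, CHUNK_SIZE) != 0) {
            close(fd);
            return -1.0;
        }
    }

    clock_gettime(CLOCK_MONOTONIC, &end);
    close(fd);
    (void)sink;

    double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                      + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / iterations;
}
```

Calling time_map_unmap_pairs("some_relation_file", 100000) and printing the
result with printf("%.0f ns\n", ...) gives a rough per-pair figure. The file
name here is a placeholder, and the result will depend on kernel version,
hardware, and whether the page is already in the page cache.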

Regards,
Radek
