Re: BBU Cache vs. spindles

From: Aidan Van Dyk <aidan(at)highrise(dot)ca>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, James Mansion <james(at)mansionfamily(dot)plus(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Bruce Momjian <bruce(at)momjian(dot)us>, jd(at)commandprompt(dot)com, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Steve Crawford <scrawford(at)pinpointresearch(dot)com>, pgsql-performance(at)postgresql(dot)org, Ben Chobot <bench(at)silentmedia(dot)com>
Subject: Re: BBU Cache vs. spindles
Date: 2010-10-29 15:56:09
Message-ID: AANLkTi=ttpLEgYHTViATjbtS_dm-B5iRz3vOmaVDrw__@mail.gmail.com
Lists: pgsql-performance pgsql-www

On Fri, Oct 29, 2010 at 11:43 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> Well, we COULD keep the data in shared buffers, and then copy it into
> an mmap()'d region rather than calling write(), but I'm not sure
> there's any advantage to it.  Managing address space mappings is a
> pain in the butt.

I could see this being a *theoretical* benefit in the case that the
background writer gains the ability to write out all blocks associated
with a file in order. In that case, you might get a win because you
could do a single mmap of the entire file, wholesale memcpy the blocks
across, then sync/unmap it.
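A minimal C sketch of that per-file path, under stated assumptions: the functions write_blocks_mmap(), write_blocks_pwrite() and open_sized_file() are hypothetical, the block size is fixed at 8192 bytes (PostgreSQL's default BLCKSZ), and the file is assumed to be pre-sized, since a shared mapping cannot grow a file. This is not PostgreSQL code; it only illustrates one mmap() per file, one memcpy() per block, then msync()/munmap(), next to the plain pwrite()/fsync() path for comparison.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed block size; PostgreSQL's default BLCKSZ is 8192. */
#define SKETCH_BLCKSZ 8192

/* Hypothetical helper: open (creating if needed) a file and size it to
 * nblocks blocks, because the mmap path below cannot extend a file. */
int open_sized_file(const char *path, unsigned nblocks)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t) nblocks * SKETCH_BLCKSZ) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper: write n blocks into an open file by mapping the
 * whole file once, memcpy()ing each block into place, then msync()ing and
 * unmapping. Refuses blocks past EOF, since touching a shared mapping
 * beyond the end of the file raises SIGBUS. Returns 0 on success, -1 on error.
 */
int write_blocks_mmap(int fd, const char *const *blocks,
                      const unsigned *blocknums, size_t n)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    size_t len = (size_t) st.st_size;
    if (len == 0)                      /* mmap() rejects a zero length */
        return n == 0 ? 0 : -1;

    /* One mapping for the whole file instead of one write() per block. */
    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    for (size_t i = 0; i < n; i++) {
        size_t off = (size_t) blocknums[i] * SKETCH_BLCKSZ;
        if (off + SKETCH_BLCKSZ > len) {
            munmap(map, len);
            return -1;
        }
        /* If the target page is not already cached, this memcpy() faults
         * it in (possibly reading it from disk) before overwriting it. */
        memcpy(map + off, blocks[i], SKETCH_BLCKSZ);
    }

    /* Write back the dirty pages and wait, then drop the mapping. */
    int rc = msync(map, len, MS_SYNC);
    if (munmap(map, len) != 0)
        rc = -1;
    return rc;
}

/* The plain write() path: one pwrite() per block, then fsync().
 * A short write is treated as an error for simplicity. */
int write_blocks_pwrite(int fd, const char *const *blocks,
                        const unsigned *blocknums, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        off_t off = (off_t) blocknums[i] * SKETCH_BLCKSZ;
        if (pwrite(fd, blocks[i], SKETCH_BLCKSZ, off) != SKETCH_BLCKSZ)
            return -1;
    }
    return fsync(fd);
}
```

The pwrite() variant never needs the overwritten page in the cache when it writes whole aligned blocks, which is exactly the contrast drawn in point 1) below.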

This, of course, assumes a few things that must hold for it to be performant:
0) a list of blocks to be written grouped by files is readily available.
1) The pages you write to must be in the page cache, or your memcpy is
going to fault them in. With a plain write, you don't need the
over-written page in the cache.
2) Now, instead of the torn-page problem being bounded by the FS
block/sector size, you can actually have a possibly arbitrary portion
of the block's memory written out when the kernel writes back the page
(e.g. if writeback happens in the middle of the memcpy). You *really*
need full-page-writes.
3) The mmap overhead required for the kernel to set up the mappings is
less than the cost of the repeated syscalls of a simple write().
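Point 0) could look something like the following C sketch, assuming a hypothetical PendingWrite descriptor (file id plus block number) for each dirty buffer: sort by (file, block) so each file's blocks are contiguous and ascending, and count the distinct files, i.e. how many mmap()/msync()/munmap() cycles a per-file writer would need. The struct and function names are illustrative, not PostgreSQL's.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical descriptor for one dirty buffer awaiting write-out. */
typedef struct {
    unsigned file_id;   /* which relation segment file */
    unsigned blocknum;  /* block number within that file */
} PendingWrite;

/* Order by file first, then by block number within the file. */
static int cmp_pending(const void *pa, const void *pb)
{
    const PendingWrite *a = pa, *b = pb;
    if (a->file_id != b->file_id)
        return a->file_id < b->file_id ? -1 : 1;
    if (a->blocknum != b->blocknum)
        return a->blocknum < b->blocknum ? -1 : 1;
    return 0;
}

/* Sort pending writes in place and return the number of distinct files,
 * i.e. the number of per-file mapping cycles needed to write them all. */
size_t group_pending_by_file(PendingWrite *w, size_t n)
{
    if (n == 0)
        return 0;
    qsort(w, n, sizeof *w, cmp_pending);
    size_t runs = 1;
    for (size_t i = 1; i < n; i++)
        if (w[i].file_id != w[i - 1].file_id)
            runs++;
    return runs;
}
```

Sorting also yields ascending block order within each file, which matches the "write out all blocks associated with a file in order" precondition above.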

All of those seem like things somebody could synthetically benchmark
to prove value before even trying to bolt this into PostgreSQL.

a.

--
Aidan Van Dyk                                             Create like a god,
aidan(at)highrise(dot)ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2010-10-29 15:57:06 Re: BBU Cache vs. spindles
Previous Message Robert Haas 2010-10-29 15:43:58 Re: BBU Cache vs. spindles