Re: BBU Cache vs. spindles

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: James Mansion <james(at)mansionfamily(dot)plus(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Bruce Momjian <bruce(at)momjian(dot)us>, jd(at)commandprompt(dot)com, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Steve Crawford <scrawford(at)pinpointresearch(dot)com>, pgsql-performance(at)postgresql(dot)org, Ben Chobot <bench(at)silentmedia(dot)com>
Subject: Re: BBU Cache vs. spindles
Date: 2010-10-29 15:43:58
Message-ID: AANLkTimFfBZJ5YCn6tbGbQU_G4GdsQhXbtQ2kTEKfCVx@mail.gmail.com
Lists: pgsql-performance pgsql-www

On Thu, Oct 28, 2010 at 5:26 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> James Mansion <james(at)mansionfamily(dot)plus(dot)com> writes:
>> Tom Lane wrote:
>>> The other and probably worse problem is that there's no application
>>> control over how soon changes to mmap'd pages get to disk.  An msync
>>> will flush them out, but the kernel is free to write dirty pages sooner.
>>> So if they're depending for consistency on writes not happening until
>>> msync, it's broken by design.  (This is one of the big reasons we don't
>>> use mmap'd space for Postgres disk buffers.)
>
>> Well, I agree that it sucks for the reason you give - but you use
>> write and that's *exactly* the same in terms of when it gets written,
>> as when you update a byte on an mmap'd page.
>
> Uh, no, it is not.  The difference is that we can update a byte in a
> shared buffer, and know that it *isn't* getting written out before we
> say so.  If the buffer were mmap'd then we'd have no control over that,
> which makes it mighty hard to obey the WAL "write log before data"
> paradigm.
>
> It's true that we don't know whether write() causes an immediate or
> delayed disk write, but we generally don't care that much.  What we do
> care about is being able to ensure that a WAL write happens before the
> data write, and with mmap we don't have control over that.

Well, we COULD keep the data in shared buffers, and then copy it into
an mmap()'d region rather than calling write(), but I'm not sure
there's any advantage to it. Managing address space mappings is a
pain in the butt.
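
For concreteness, here is a rough sketch of that ordering. It is not PostgreSQL code: the 8 kB PAGE_SIZE, the file names, and the functions flush_page_via_mmap and demo are all made up for illustration. The page is modified in a private buffer, the log record is written and fsync'd, and only then is the page copied into the mmap()'d region, so the kernel has nothing dirty to write before the log is safe.

```python
import mmap
import os
import tempfile

PAGE_SIZE = 8192  # assumed page size, for illustration only


def flush_page_via_mmap(wal_fd, data_fd, page_no, wal_record, page):
    """Hypothetical helper: log first, then copy the page into a mapping."""
    # 1. The log record goes out first and is forced to disk.
    os.write(wal_fd, wal_record)
    os.fsync(wal_fd)

    # 2. Only now touch the mapping. Before this copy there is no dirty
    #    mapped page, so the kernel cannot write data ahead of the log.
    with mmap.mmap(data_fd, 0) as m:  # length 0 maps the whole file
        off = page_no * PAGE_SIZE
        m[off:off + PAGE_SIZE] = page
        m.flush()  # msync(); the kernel may already have written it


def demo():
    with tempfile.TemporaryDirectory() as d:
        wal_path = os.path.join(d, "wal")
        data_path = os.path.join(d, "data")
        # The data file must be pre-sized; a mapping cannot extend it.
        with open(data_path, "wb") as f:
            f.write(b"\0" * (2 * PAGE_SIZE))
        wal_fd = os.open(wal_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        data_fd = os.open(data_path, os.O_RDWR)
        try:
            page = bytearray(PAGE_SIZE)  # stands in for a shared buffer
            page[0:5] = b"hello"         # private change, nothing mapped is dirty
            flush_page_via_mmap(wal_fd, data_fd, 1, b"set page 1\n", bytes(page))
        finally:
            os.close(wal_fd)
            os.close(data_fd)
        with open(data_path, "rb") as f:
            data = f.read()
        with open(wal_path, "rb") as f:
            wal = f.read()
        return wal, data
```

Note that the copy in step 2 does the same job write() would, and the sketch still has to create and tear down a mapping to do it, which is the bookkeeping I'd rather avoid.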

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
