Re: [PATCH] 2PC state files on shared memory

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] 2PC state files on shared memory
Date: 2009-08-08 13:44:36
Message-ID: 603c8f070908080644n5f5417bbve956d4ac73472eff@mail.gmail.com
Lists: pgsql-hackers

On Sat, Aug 8, 2009 at 9:31 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Tom Lane wrote:
>> Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
>>> Based on an idea of Heikki Linnakangas, here is a patch in order to
>>> improve 2PC by sending the state files of prepared transactions to
>>> shared memory instead of disk.
>>
>> I don't understand how this can possibly work.  The entire point of
>> 2PC is that the state file is guaranteed to be on disk so it will
>> survive a crash.  What good is it if it's in shared memory?
>
> The state files are not fsync'd when they're written, but a copy is
> written to WAL so that it can be replayed on crash. With this patch,
> it's still written to WAL, but the write to a file on disk is skipped,
> and it's stored in shared memory instead.
>
>> Quite aside from that, the fixed size of shared memory makes this seem
>> pretty impractical.
>
> Most state files are small. If one doesn't fit in the area reserved for
> this, it's written to disk as usual. It's just an optimization.
>
> I'm a bit disappointed by the performance gains. I would've expected
> more, given a decent battery-backed-up cache to buffer the WAL fsyncs.
> But it looks like they're still causing the most overhead, even with a
> battery-backed-up cache.

It doesn't seem that surprising to me that a write to shared memory
and a write to an un-fsync'd file would be about the same speed. The
file write will eventually generate some I/O when it goes to disk, but
at the time you make the system call it's basically just a memory
copy.
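
A rough way to check this on a given machine is a small timing
comparison. The sketch below is only an illustration, not anything from
the patch: it uses an anonymous mmap region as a stand-in for shared
memory, a temporary file as a stand-in for a state file, and the
function names, payload size, and iteration count are all made up for
the example. It times three cases: copying into memory, write() with no
fsync, and write() followed by fsync() each time.

```python
import mmap
import os
import tempfile
import time


def time_file_writes(payload: bytes, count: int, fsync_each: bool) -> float:
    """Write `payload` to a temp file `count` times; return elapsed seconds."""
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, payload)      # lands in the OS page cache
            if fsync_each:
                os.fsync(fd)           # force the data out to stable storage
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)


def time_memory_copies(payload: bytes, count: int) -> float:
    """Copy `payload` into an anonymous mmap region `count` times."""
    size = len(payload)
    region = mmap.mmap(-1, size * count)   # stand-in for shared memory
    try:
        start = time.perf_counter()
        for i in range(count):
            region[i * size:(i + 1) * size] = payload
        return time.perf_counter() - start
    finally:
        region.close()


def compare(payload_size: int = 512, count: int = 200) -> dict:
    """Return elapsed seconds for each of the three write strategies."""
    payload = b"x" * payload_size
    return {
        "memory_copy": time_memory_copies(payload, count),
        "file_no_fsync": time_file_writes(payload, count, fsync_each=False),
        "file_fsync": time_file_writes(payload, count, fsync_each=True),
    }


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name:>14}: {seconds * 1e3:.3f} ms")
```

The actual numbers depend heavily on the filesystem, the storage, and
whether there is a write cache, so treat any result as indicative only.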

...Robert
