Re: Maximum transaction rate

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Marco Colombo <pgsql(at)esiway(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Maximum transaction rate
Date: 2009-03-18 01:56:24
Message-ID: alpine.GSO.2.01.0903172118150.14497@westnet.com
Lists: pgsql-general

On Wed, 18 Mar 2009, Marco Colombo wrote:

> If you fsync() after each write you want ordered, there can't be any
> "subsequent I/O" (unless there are many different processes concurrently
> writing to the file w/o synchronization).

Inside PostgreSQL, each of the database backend processes ends up writing
blocks to the database disk, if they need to allocate a new buffer and the
one they are handed is dirty. You can easily have several of those
writing to the same 1GB underlying file on disk. So that prerequisite is
there. The main potential for a problem here would be if a stray
unsynchronized write from one of those backends happened in a way that
wasn't accounted for by the WAL+checkpoint design. What I was suggesting
is that the way that synchronization happens in the database provides some
defense from running into problems in this area.

The way backends handle writes themselves is also why your suggestion
about the database being able to utilize barriers isn't really helpful.
Those writes trickle out all the time, and normally you don't even have to
care about ordering them. The only time you do need to care is at
checkpoint time, and there only a hard line is really practical: all
writes up to that point, period.
Trying to implement ordered writes for everything that happened before
then would complicate the code base, which isn't going to happen for such
a platform+filesystem specific feature, one that really doesn't offer much
acceleration from the database's perspective.
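
To make the "hard line" idea concrete, here is a toy sketch in Python. It
is not PostgreSQL's code; the file name, the 8192-byte block size, and the
helpers write_block() and checkpoint() are all made up for illustration.
Writes land in the OS cache in whatever order they arrive, and the only
ordering point is an fsync() covering everything written so far.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # assumed block size, for illustration only


def write_block(fd, block_no, data):
    """Backend-style write: goes to the OS cache, no fsync, no ordering."""
    os.pwrite(fd, data.ljust(BLOCK_SIZE, b"\0"), block_no * BLOCK_SIZE)


def checkpoint(fds):
    """Hard line: flush everything written to these files up to now."""
    for fd in fds:
        os.fsync(fd)
    return len(fds)


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "relation_segment")  # hypothetical file name
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Between checkpoints, blocks are written in arbitrary order.
        for block_no in (3, 0, 2, 1):
            write_block(fd, block_no, b"block %d" % block_no)
        # At checkpoint time a single fsync covers all of them at once.
        flushed = checkpoint([fd])
        size = os.fstat(fd).st_size
    finally:
        os.close(fd)
```

Nothing here tracks which write came first, which is the point: only the
state at the moment of the fsync() matters.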

> only when the journal wraps around there's a (extremely) small window of
> vulnerability. You need to write a carefully crafted torture program to
> get any chance to observe that... such program exists, and triggers the
> problem

Yeah, I've been following all that. The PostgreSQL WAL design works on
ext2 filesystems with no journal at all. Some people even put their
pg_xlog directory onto ext2 filesystems for best performance, relying on
the WAL to be the journal. As long as fsync is honored correctly, the WAL
writes should be re-writing already allocated space, which makes this
category of journal mayhem not so much of a problem. But when I read
about fsync doing unexpected things, that gets me more concerned.
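
A rough sketch of what "re-writing already allocated space" means, again
in Python and again hypothetical: the 64 KiB segment size, the file name,
and the helpers preallocate() and append_record() are assumptions for the
example, not PostgreSQL's actual sizes or functions. The file is
zero-filled up front, so later record writes overwrite existing blocks and
never change the file's size.

```python
import os
import tempfile

SEGMENT_SIZE = 64 * 1024  # illustrative size, not PostgreSQL's real one


def preallocate(path, size):
    """Create the segment and zero-fill it so its blocks are allocated."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    os.write(fd, b"\0" * size)
    os.fsync(fd)
    return fd


def append_record(fd, offset, record):
    """Overwrite already allocated space in place, then force it to disk."""
    os.pwrite(fd, record, offset)
    os.fsync(fd)
    return offset + len(record)


with tempfile.TemporaryDirectory() as d:
    fd = preallocate(os.path.join(d, "wal_segment"), SEGMENT_SIZE)
    try:
        offset = 0
        for rec in (b"commit-1;", b"commit-2;"):
            offset = append_record(fd, offset, rec)
        final_size = os.fstat(fd).st_size  # unchanged by the record writes
        head = os.pread(fd, offset, 0)
    finally:
        os.close(fd)
```

The sketch skips syncing the containing directory after creating the file,
which a real implementation would also need to think about.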

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
