Re: Maximum transaction rate

From: Marco Colombo <pgsql(at)esiway(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Ben Chobot <bench(at)silentmedia(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, Jack Orenstein <jack(dot)orenstein(at)hds(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Maximum transaction rate
Date: 2009-03-13 17:44:21
Message-ID: 49BA9B75.3090304@esiway.net
Lists: pgsql-general

Tom Lane wrote:
> Marco Colombo <pgsql(at)esiway(dot)net> writes:
>> And I'm still wondering. The problem with LVM, AFAIK, is missing support
>> for write barriers. Once you disable the write-back cache on the disk,
>> you no longer need write barriers. So I'm missing something, what else
>> does LVM do to break fsync()?
>
> I think you're imagining that the disk hardware is the only source of
> write reordering, which isn't the case ... various layers in the kernel
> can reorder operations before they get sent to the disk.
>
> regards, tom lane

You mean some layer (LVM) is lying about the fsync()?

write(A);
fsync();
...
write(B);
fsync();
...
write(C);
fsync();

You mean that the process may be awakened after the first fsync() while
A is still somewhere in the OS buffers and not yet sent to the disk, so
B can reach the disk BEFORE A. And if the system crashes, A never hits
the platters while B (possibly) does. Is this what you mean by
"write reordering"?
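
For concreteness, the sequence above can be sketched as follows. This is only an illustration, not code from any of the programs discussed: the helper names append_durably and write_in_order are hypothetical, and it assumes the OS honors fsync(). It shows the contract the application relies on (each record is flushed before the next one is written); it cannot show whether LVM or any other layer actually keeps that promise.

```python
import os


def append_durably(fd, payload):
    """Write all of payload to fd, then wait for fsync() to return.

    Hypothetical helper: os.write() may write fewer bytes than asked,
    so loop until everything has been handed to the kernel.
    """
    offset = 0
    while offset < len(payload):
        offset += os.write(fd, payload[offset:])
    # Returns only once the OS reports the data as flushed to the device.
    # Whether every layer below really did so is the open question here.
    os.fsync(fd)


def write_in_order(path, records):
    """Append records one at a time, syncing after each one.

    If fsync() is honest, record N is on stable storage before
    record N+1 is even submitted, so no reordering is observable.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for record in records:
            append_durably(fd, record)
    finally:
        os.close(fd)
```

Calling write_in_order("journal.dat", [b"A", b"B", b"C"]) (the file name is arbitrary) corresponds to the write(A)/write(B)/write(C) sequence: if a layer returns from fsync() early, B or C could reach the platters while A is still in a buffer.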

But doesn't this break any application with transaction-like behavior,
such as sendmail? The problem is third parties: if sendmail tells the
SMTP client "ok, I saved the message" (*after* an fsync()), it's
actually lying, because the message hasn't hit the platters yet.
The same applies to an IMAP/POP server, say. Well, it applies to
anything using fsync().

I mean, all this with the disk caches in write-through mode? It's the OS
lying, not the disks?

Wait, this breaks all journaled FSes as well. A DM device is just
a block device to them; if it's lying about synchronous writes, the
whole purpose of the journal is defeated... I find it hard to
believe, I have to say.

.TM.
