Re: Maximum transaction rate

From: Marco Colombo <pgsql(at)esiway(dot)net>
To: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Maximum transaction rate
Date: 2009-03-18 00:59:18
Message-ID: 49C04766.1060503@esiway.net
Lists: pgsql-general

Ron Mayer wrote:
> Greg Smith wrote:
>> There are some known limitations to Linux fsync that I remain somewhat
>> concerned about, independently of LVM, like "ext3 fsync() only does a
>> journal commit when the inode has changed" (see
>> http://kerneltrap.org/mailarchive/linux-kernel/2008/2/26/990504 ). The
>> way files are preallocated, the PostgreSQL WAL is supposed to function
>> just fine even if you're using fdatasync after WAL writes, which also
>> wouldn't touch the journal (last time I checked fdatasync was
>> implemented as a full fsync on Linux). Since the new ext4 is more
>
> Indeed it does.
>
> I wonder if there should be an optional fsync mode
> in postgres that would turn fsync() into
> fchmod (fd, 0644); fchmod (fd, 0664);
> to work around this issue.

Question is... why do you care if the journal is not flushed on fsync?
Only the file data blocks need to be, if the inode is unchanged.

> For example this program below will show one write
> per disk revolution if you leave the fchmod() in there,
> and run many times faster (i.e. lying) if you remove it.
> This with ext3 on a standard IDE drive with the write
> cache enabled, and no LVM or anything between them.
>
> ==========================================================
> /*
> ** based on http://article.gmane.org/gmane.linux.file-systems/21373
> ** http://thread.gmane.org/gmane.linux.kernel/646040
> */
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <stdio.h>
> #include <stdlib.h>
>
> int main(int argc, char *argv[]) {
>     int fd, i;
>     char byte = 0;  /* initialized so the written value is defined */
>
>     if (argc < 2) {
>         printf("usage: fs <filename>\n");
>         exit(1);
>     }
>     fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC, 0666);
>     if (fd < 0) {
>         perror("open");
>         exit(1);
>     }
>     for (i = 0; i < 100; i++) {
>         pwrite(fd, &byte, 1, 0);  /* rewrite the same byte in place */
>         /* change the inode so that fsync() also commits the journal */
>         fchmod(fd, 0644); fchmod(fd, 0664);
>         fsync(fd);
>     }
>     close(fd);
>     return 0;
> }
> ==========================================================
>

I ran the program above, w/o the fchmod()s.

$ time ./test2 testfile

real 0m0.056s
user 0m0.001s
sys 0m0.008s

This is with ext3+LVM+raid1+sata disks with hdparm -W1.
With -W0 I get:

$ time ./test2 testfile

real 0m1.014s
user 0m0.000s
sys 0m0.008s

Big difference. The fsync() there does its job.

The same program runs with a 3x slowdown with the fchmod()s in place,
but that's expected: it's doing twice the writes, and in different places.

.TM.
