ext4 finally doing the right thing

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: ext4 finally doing the right thing
Date: 2010-01-16 03:05:49
Message-ID: 4B512D0D.4030909@2ndquadrant.com
Lists: pgsql-performance

A few months ago the worst of the bugs in the ext4 fsync code started
getting fixed, and
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5f3481e9a80c240f169b36ea886e2325b9aeb745
was a particularly painful one. That fix made it into the 2.6.32 kernel,
released last month. Some interesting benchmark news today suggests that
a version of ext4 that might actually work for databases is showing up
in early packaged distributions:

http://www.phoronix.com/scan.php?page=article&item=ubuntu_lucid_alpha2&num=3

It arrives along with the massive performance drop that comes from fsync
actually working.
See
http://www.phoronix.com/scan.php?page=article&item=linux_perf_regressions&num=2
for background about this topic from when the issue was discovered:

"[This change] is required for safe behavior with volatile write caches
on drives. You could mount with -o nobarrier and [the performance drop]
would go away, but a sequence like write->fsync->lose power->reboot may
well find your file without the data that you synced, if the drive had
write caches enabled. If you know you have no write cache, or that it
is safely battery backed, then you can mount with -o nobarrier, and not
incur this penalty."
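The write->fsync sequence in that quote is the pattern a database commit
depends on. Below is a minimal sketch of it in Python, assuming a POSIX
system. `durable_write` is a hypothetical helper for illustration, not
PostgreSQL code. Note that fsync returning only means the kernel believes
the data is stable; whether it really reached the platter depends on the
barrier and write-cache behavior described above.

```python
import os

def durable_write(path: str, data: bytes) -> None:
    """Append data to path and ask the OS to make it durable before returning.

    Hypothetical helper illustrating the write->fsync commit pattern.
    """
    # O_APPEND so repeated calls add records, much like appending to a log.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        # os.write may write fewer bytes than requested; loop until done.
        view = memoryview(data)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # fsync blocks until the kernel reports the data as on stable storage.
        # With barriers on, ext4 also flushes the drive's write cache; with
        # -o nobarrier a drive with write cache enabled may still hold it in RAM.
        os.fsync(fd)
    finally:
        os.close(fd)
```

If power is lost after `durable_write` returns, the data should survive
only when every layer below the kernel honors the flush.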

The pgbench TPS figure Phoronix has been reporting has always been a
fictitious one, resulting from unsafe write caching. Now that 2.6.32 is
out with ext4 defaulting to proper fsync behavior, that's going to make
for a very interesting change. On one side, we might finally be able to
use regular drives safely with their caches turned on, taking advantage
of the cache for other writes while doing the right thing with the
database writes. On the other, anyone who believed the fictitious
numbers before is in for a rude surprise and will think there's a
massive regression here. There's some potential for this to show
PostgreSQL in a bad light when people discover they can really only get
~100 commits/second out of cheap hard drives (roughly one commit per
platter rotation, so about 120/second at 7200 RPM) and assume the
database is to blame. Interesting times.
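Where a figure like ~100 commits/second comes from can be sketched with
some back-of-the-envelope arithmetic. This sketch assumes each commit's
fsync must wait for the log sector to come back under the drive head,
giving at most one serial commit per platter rotation. It ignores group
commit, which can batch several transactions into one flush. The
function names are hypothetical, and the timing loop is only a rough
probe, not a proper benchmark.

```python
import os
import tempfile
import time

def rotation_bound(rpm: float) -> float:
    """Upper bound on serial commits/second if each fsync costs one full rotation."""
    return rpm / 60.0

def measure_fsync_rate(path: str, iterations: int = 200) -> float:
    """Time repeated small write+fsync pairs and return fsyncs per second."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        block = b"\0" * 8192
        start = time.perf_counter()
        for _ in range(iterations):
            # Rewrite the same 8 kB block each time, like repeatedly flushing one log page.
            os.pwrite(fd, block, 0)
            os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return iterations / elapsed

if __name__ == "__main__":
    for rpm in (5400, 7200, 10000, 15000):
        print(f"{rpm:>6} RPM -> at most {rotation_bound(rpm):.0f} commits/s")
    with tempfile.TemporaryDirectory() as d:
        rate = measure_fsync_rate(os.path.join(d, "fsync_test"))
        print(f"measured: {rate:.0f} fsyncs/s")
```

On a single plain hard drive with no battery-backed cache, a measured
rate far above the rotation bound suggests fsync is being acknowledged
from a volatile write cache rather than from the platter.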

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.co
