Re: Shared buffers, db transactions commited, and write IO on Solaris

From:	Dimitri <dimitrik(dot)fr(at)gmail(dot)com>
To:	"Erik Jones" <erik(at)myemma(dot)com>
Cc:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "PostgreSQL Performance" <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: Shared buffers, db transactions commited, and write IO on Solaris
Date:	2007-03-29 22:15:11
Message-ID:	5482c80a0703291515i7c05ba28l6f7240ebb5d2b837@mail.gmail.com
Lists:	pgsql-performance

> >>
> > Erik,
> >
> > using 'forcedirectio' simply brings your write operations to the
> > *real* volume - means while you need to write 10 bytes you'll write 10
> > bytes (instead of UFS block size (8K)). So it explains me why your
> > write volume became slower.

I mean 'lower' (not slower)

>
> Sorry, that's not true. Google "ufs forcedirectio" go to the first
> link and you will find:
>
> "forcedirectio
>
> The forcedirectio (read "force direct IO") UFS option causes data to
> be buffered in kernel address whenever data is transferred between
> user address space and the disk. In other words, it bypasses the file
> system cache. For certain types of applications -- primarily database
> systems -- this option can dramatically improve performance. In fact,
> some database experts have argued that a file using the forcedirectio
> option will outperform a raw partition, though this opinion seems
> fairly controversial.
>
> The forcedirectio improves file system performance by eliminating
> double buffering, providing a small, efficient code path for file
> system reads and writes and removing pressure on memory."

Erik, please don't take me wrong, but reading Google (or better, man pages)
doesn't replace brain and basic practice... The direct I/O option is not a silver
bullet which will solve all your problems (try to do 'cp' on a filesystem mounted
with 'forcedirectio', or use your mailbox on it - you'll quickly
understand the impact)...

>
> However, what this does mean is that writes will be at the actual
> filesystem block size and not the cache block size (8K v. 512K).

When a UFS filesystem is mounted normally, it uses its own cache for all
operations (read and write) and saves data modifications on a per-page
basis, meaning: when a process writes 200 bytes, 200 bytes are modified
in the cache, then the whole page (8K) is written once the data is
requested to be flushed (and WAL is flushed on each commit)...

Now, mounted with the 'forcedirectio' option, UFS is free of the page size constraint
and, like a raw device, will write exactly the amount of data requested, meaning:
when a process writes 200 bytes, it'll write exactly 200 bytes to the disk. For
WAL this may be very beneficial, because you'll be able to perform more I/O
operations/sec, meaning more commits/sec. But at the same time it may
dramatically slow down all your read operations (no more data prefetch
or dynamic cache)... The best solution is probably to separate WAL from
data (BTW, it would be nice to have an option such as WAL_PATH in the conf file);
this can be done simply by using a tablespace, or at least directory links, etc.
But if your major activity is writing, it's probably already OK for you.
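
To put rough numbers on the two cases described above, here is a minimal sketch. It assumes every write is flushed on its own (as with WAL on each commit) and takes the "exactly 200 bytes" claim at face value; real direct I/O may still round up to a device sector size. The function names and the 1000-write workload are made up for illustration.

```python
import math

PAGE_SIZE = 8 * 1024  # UFS page size quoted in this message (8K)


def bytes_written(write_size, n_writes, granularity):
    """Total bytes reaching disk when each write is rounded up to `granularity`."""
    units = math.ceil(write_size / granularity)
    return units * granularity * n_writes


def compare(write_size=200, n_writes=1000):
    """Return (buffered_bytes, direct_bytes, ratio) for the simplified model."""
    # Normal mount: every flushed write costs at least one whole page.
    buffered = bytes_written(write_size, n_writes, PAGE_SIZE)
    # forcedirectio: assumption taken from the text, the exact size is written.
    direct = bytes_written(write_size, n_writes, 1)
    return buffered, direct, buffered / direct


if __name__ == "__main__":
    buffered, direct, ratio = compare()
    print(f"buffered: {buffered} bytes, direct: {direct} bytes, ratio: {ratio:.2f}")
```

Under these assumptions, 1000 commits of 200 bytes cost about 8 MB of writes through the page cache against 200 KB with direct I/O, which is the kind of drop in write volume discussed in this thread. It says nothing about the TX counters.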

However, to understand the TX number mystery, I think the only possible solution
is to run a small live test:

(I'm sure you're aware you can mount/unmount forcedirectio dynamically?)

During a stable workload, do:

# mount -o remount,logging /path_to_your_filesystem

and check whether the I/O volume increases along with the TX numbers,
then switch back:

# mount -o remount,forcedirectio /path_to_your_filesystem

and see whether the I/O volume decreases along with the TX numbers...
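
To compare TX numbers between the two mount modes, one possible approach is to sample the commit counter before and after each phase and turn the difference into a rate. The sketch below assumes the counter in question is `xact_commit` from `pg_stat_database`; the helper name and the sample values are hypothetical, and fetching the samples from the database is left out.

```python
# Query one could run at the start and end of each test phase
# (assumption: xact_commit is the TX counter discussed in this thread).
SAMPLE_QUERY = (
    "SELECT xact_commit FROM pg_stat_database "
    "WHERE datname = current_database();"
)


def commit_rate(sample_before, sample_after):
    """Commits per second between two (timestamp_seconds, xact_commit) samples."""
    (t0, c0), (t1, c1) = sample_before, sample_after
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (c1 - c0) / (t1 - t0)


if __name__ == "__main__":
    # Made-up numbers: 500 commits observed over 10 seconds.
    print(commit_rate((0, 1000), (10, 1500)))
```

Running this once per mount mode under the same workload gives two rates that can be compared directly with the I/O volume seen in each phase.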

Best regards!
-Dimitri

>
> >
> > Now, why TX number is reduced - is a small mystery :)
> >
> > Options:
> > - you really do 10 times less commits, means you work 10 times
> > slower? ;)
> > what about users? how do you measure your work performance?
>
> We are an email marketing service provider with a web front end
> application. We measure work performance via web requests (counts,
> types, etc...), mailer activity and the resulting database activity.
> We are doing as much or more work now than previously, and faster.
>
> >
> > - TX reported in pg_* tables are not exact, but I don't believe
> > at all :)
>
> Even if they aren't exact, being off by a factor of 10 wouldn't be
> believable. the forcedirectio mount setting for ufs can definitely
> explain the drop in data written volume, but doesn't do much to
> explain the difference in xact commits.
>
> erik jones <erik(at)myemma(dot)com>
> software developer
> 615-296-0838
> emma(r)
>
>
>
>

In response to

Re: Shared buffers, db transactions commited, and write IO on Solaris at 2007-03-29 18:58:13 from Erik Jones

Responses

Re: Shared buffers, db transactions commited, and write IO on Solaris at 2007-03-30 00:23:23 from Josh Berkus
Re: Shared buffers, db transactions commited, and write IO on Solaris at 2007-03-30 05:22:52 from Erik Jones
