Re: [HACKERS] request for tuning suggestions from PC Week Labs

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Timothy Dyck <Timothy_Dyck(at)zd(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] request for tuning suggestions from PC Week Labs
Date: 2000-02-01 02:37:32
Message-ID: 12313.949372652@sss.pgh.pa.us
Lists: pgsql-hackers

Timothy Dyck <Timothy_Dyck(at)zd(dot)com> writes:
> 1. The biggest performance item I've seen in looking through the mailing
> lists is the fsync option. I want to leave this enabled as I don't think a
> transactional database should ever lose data. My understanding is that
> with it on PG checkpoints after every commit. Is there a way to let the
> log grow to a certain size before checkpointing? When fsync is off, how is
> data loss possible?

With fsync on, pgsql does fsync() after every write, which essentially
means you get zero overlap of computation and I/O. Horribly
inefficient.

With fsync off, we don't do the fsync() call. The data is still
pushed out to the Unix kernel at the same times, but the kernel's disk
scheduler has discretion about what order the disk pages actually get
sent to disk in. Also, you get fewer physical writes when several
successive transactions modify the same disk page. On most Unixes this
makes for a vast performance improvement.
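If you want to see the cost for yourself, here is a minimal standalone
C sketch (illustrative only: this is not backend code, and the scratch
filename and page count are made up) that writes a series of 8K pages,
fsync()ing after each write only when asked to, roughly mimicking the
backend's behavior with the fsync option on versus off:

    /*
     * Illustrative only: not PostgreSQL source.  Writes PAGES 8K pages
     * to a scratch file, calling fsync() after each write only if
     * "fsync" is given on the command line.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define PAGES    1000
    #define PAGESIZE 8192

    int main(int argc, char **argv)
    {
        int   do_fsync = (argc > 1 && strcmp(argv[1], "fsync") == 0);
        char  page[PAGESIZE];
        int   fd, i;

        fd = open("scratchfile", O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0)
        {
            perror("open");
            return 1;
        }
        memset(page, 'x', sizeof(page));
        for (i = 0; i < PAGES; i++)
        {
            if (write(fd, page, sizeof(page)) != sizeof(page))
            {
                perror("write");
                return 1;
            }
            if (do_fsync)
                fsync(fd);  /* wait for the page to reach the platter */
        }
        close(fd);
        return 0;
    }

Timing the two modes against each other (say, "time ./writetest" vs.
"time ./writetest fsync", if you call the binary writetest) should make
the lack of I/O overlap painfully obvious on most platforms.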

The risk scenario here is that the pg_log update saying that your
transaction has committed might get physically written out before
the data pages that contain the actual tuples do. We write the
pg_log page last, of course, but the kernel might reorder the physical
writes.

If the pg_log update gets written, and some but not all of the updated
data pages have been written, and you suffer a system crash, then after
reboot it appears that some but not all of the changes made by your
transaction have "stuck". That counts as data corruption for most
applications.

Note that I'm talking about an actual system crash: power failure,
hardware failure, or kernel failure. A crash of the Postgres backend
does *not* create this hazard. Also note that a crash does not create
a corruption hazard unless pg_log says that the incomplete transaction
committed.

My feeling is that if you have a UPS and a reliable kernel, there is
no meaningful reliability benefit from keeping fsync on --- certainly
not enough to justify the performance hit. The above risk analysis
considers only software, when in fact there are big risks at the
hardware level as well. One of the more obvious ones is that modern
disk drives do a certain amount of write reordering themselves.
If your drive acts like that, I don't see that fsync buys anything
at all, except perhaps protection against kernel crashes. My experience
(on HPUX) is that the kernel's MTBF is hardly any worse than the disk
drive's, so I don't use fsync. YMMV.

> 2. Can I move the log to a different spindle from the disks the
> database data is on? The manuals seem to indicate the log is actually
> part of the datafile itself, which would imply it can't be moved
> elsewhere.

You could move pg_log to a different drive but it probably wouldn't buy
much. pg_log only contains a commit/no commit flag for each
transaction, not copies of data, so there's not that much traffic there.

People have reported seeing wins from moving indexes on big tables to
separate drives. We don't currently have any nice neat GUI for that,
but you can kluge it with a few symbolic links. Better methods are
under discussion...
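For the record, the kluge looks something like this (the database and
index names here are made up, and you must shut down the postmaster
before moving anything):

    # postmaster must be stopped first!
    mv $PGDATA/base/mydb/my_big_index /otherdrive/my_big_index
    ln -s /otherdrive/my_big_index $PGDATA/base/mydb/my_big_index

The backend just open()s the file by its usual path and follows the
symlink, so it never knows the difference.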

regards, tom lane
