Re: beta testing version

From: Ian Lance Taylor <ian(at)airs(dot)com>
To: alex(at)pilosoft(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: beta testing version
Date: 2000-12-01 08:30:57
Message-ID: 20001201083057.19944.qmail@daffy.airs.com
Lists: pgsql-hackers

    Date: Fri, 1 Dec 2000 01:54:23 -0500 (EST)
    From: Alex Pilosov <alex(at)pilosoft(dot)com>

    On Thu, 30 Nov 2000, Nathan Myers wrote:

    > After a power outage on an active database, you may have corruption
    > at low levels of the system, and unless you have enormous redundancy
    > (and actually use it to verify everything) the corruption may go
    > undetected and result in (subtly) wrong answers at any future time.
    Nathan, why are you so hostile against postgres? Is there an ax to grind?

I don't think he is being hostile (I work with him, so I know that he
is generally pro-postgres).

    The conditions under which WAL will completely recover your database:
    1) OS guarantees complete ordering of fsync()'d writes. (i.e. having two
    blocks A and B, A is fsync'd before B, it could NOT happen that B is on
    disk but A is not).
    2) on boot recovery, OS must not corrupt anything that was fsync'd.

    Rule 1) is met by all unixish OSes in existence. Rule 2) is met by some
    filesystems, such as reiserfs, tux2, and softupdates.

I think you are missing his main point, which he stated before, which
is that modern disk hardware is both smarter and stupider than most
people realize.

Some disks cleverly accept writes into a RAM cache, and return a
completion signal as soon as they have done that. They then feel free
to reorder the writes to magnetic media as they see fit. This
significantly helps performance. However, it means that all bets are
off on a sudden power loss.
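
To make the failure mode concrete, here is a toy Python model of such a
drive. It is only a sketch: the CachingDrive class, its methods, and the
block numbers are invented for illustration and do not correspond to any
real drive interface. It assumes the drive flushes the lowest-numbered
block first, which is just one of many orders a real drive might choose.

```python
class CachingDrive:
    """Toy model of a drive with a volatile write cache (hypothetical)."""

    def __init__(self):
        self.cache = {}   # block number -> data, lost on power failure
        self.media = {}   # block number -> data, survives power failure

    def write(self, block, data):
        # The drive reports completion as soon as the data is in its RAM.
        self.cache[block] = data
        return "done"

    def flush_some(self, count):
        # The drive picks its own order: here, lowest block number first,
        # regardless of the order in which the writes arrived.
        for block in sorted(self.cache)[:count]:
            self.media[block] = self.cache.pop(block)

    def power_loss(self):
        # Anything still sitting in the cache is gone.
        self.cache.clear()


drive = CachingDrive()
drive.write(900, "A")   # issued first, e.g. a log record
drive.write(100, "B")   # issued second, e.g. the data page it describes
drive.flush_some(1)     # the drive gets one block out before the power fails
drive.power_loss()
print(drive.media)      # {100: 'B'}
```

Block B reaches the media while block A, which was written first and
acknowledged as complete, is lost.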

Your rule 1 is met at the OS level, but it is not met at the physical
drive level. The fact that the OS guarantees ordering of fsync()'d
writes means little since the drive is capable of reordering writes
behind the back of the OS.
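
A minimal Python sketch of what rule 1 looks like from the application
side, assuming a POSIX-like system. The helper names write_then_sync and
write_a_before_b are hypothetical. os.fsync() returns once the OS has
handed the data to the device; whether the drive has actually put it on
the media by then depends on the drive's cache, which is the gap
described above.

```python
import os


def write_then_sync(path, data):
    """Write data to path, then ask the OS to push it to the device."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Returns when the OS considers the data written; a caching
        # drive may still be holding it in volatile RAM at this point.
        os.fsync(fd)
    finally:
        os.close(fd)


def write_a_before_b(path_a, data_a, path_b, data_b):
    """Issue block A and sync it before block B is even written."""
    write_then_sync(path_a, data_a)
    write_then_sync(path_b, data_b)
```

From the OS's point of view A is always durable before B here. Nothing
in this code can tell whether the drive honored that order.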

At least with IDE, it is possible to tell the drive to disable this
sort of caching and reordering. However, GNU/Linux, at least, does
not do this. After all, doing it would hurt performance, and would
move us back to the old days when operating systems had to care a
great deal about disk geometry.

I expect that careful attention to the physical disks you purchase can
help you avoid these problems. For example, I would hope that EMC
disk systems handle power loss gracefully. But if you buy ordinary
off-the-shelf PC hardware, you really do need to arrange for a UPS,
and some sort of automatic shutdown if the UPS is running low.
Otherwise, although the odds are certainly with you, there is no 100%
guarantee that a busy database will survive a sudden power outage.

Ian
