Re: [mail] Re: Windows Build System

From: Lamar Owen <lamar(dot)owen(at)wgcr(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Dave Page" <dpage(at)vale-housing(dot)co(dot)uk>, "Vince Vielhaber" <vev(at)michvhf(dot)com>, "Ron Mayer" <ron(at)intervideo(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [mail] Re: Windows Build System
Date: 2003-01-30 21:29:15
Message-ID: 200301301629.15141.lamar.owen@wgcr.org
Lists: pgsql-hackers

On Thursday 30 January 2003 15:29, Tom Lane wrote:
> Lamar Owen <lamar(dot)owen(at)wgcr(dot)org> writes:
> > While I understand (and agree with) your (and Vince's) reasoning on why
> > Windows should be considered less reliable, neither of you have provided

> Windows shares none of that heritage. It is the first truly new port,
> onto a system without any Unix background, that we have ever done AFAIK.
> Claiming that it doesn't require an increased level of testing is
> somewhere between ridiculous and irresponsible.

I am saying that as we mature we need increased testing across the board. And
it is a very low percentage of code that is tied into the OS API, right? The
majority of the code (the vast majority) isn't touched by it.

> that we suspect there will be problems. And if you don't suspect
> there will be problems on Windows, you are being way too naive :-(

Reread my statement above. I _agree_ with the rationale -- but I fear it will
have the opposite impact. And I am not convinced that a good history with the
unixoid ports means that we can slack on them -- Linux, *BSD, etc. all change.
The strftime(3) breakage with Red Hat of a cycle ago should show us that much.

I suspect there will be problems on Win32 -- it is, after all, a new port.
But if we're going to immediately throw pathological test cases at it that
we're not even bothering to test against now, that immediately throws up a
flag to me. And TESTING IS BEING DONE on the Win32 port; nobody is trying to
put the PGDG blessing on it as yet, and progress is being made by
those who wish to see it made. It is still being touted as beta software,
right? The patches from Jan are very preliminary still, correct? Katie
hasn't issued a press release saying that it's not beta, right?

<hyperbole>
I don't see what the uproar is about, other than 'Win32 is so unstable that it
can't possibly work as well as you are seeing it work -- you must be doing
something wrong. Test it harder. Pull the plug repeatedly!! Test it until
it breaks! HA! Told you it would break! (yeah, firing up the old
oxyacetylene torch and hitting the hard drive with a 6,000 degree flame did
the trick -- this has got to be a bad operating system!)'
</hyperbole>

And, by the way, who in their right mind tests a database server by repeated
yanking of the AC power? To go to that extreme for Win32 when we caution
against something as mundane as a kill -9 of postmaster on Unix is absurd.
And, yes, I know the difference. I also know that the AC power pull has
nothing to do with PostgreSQL, but it has to do with the OS under it.
That said, from the point of view of the running process, a kill -9 is
identical to a power failure: it simply dies (unless it becomes a zombie, in
which case it is undead) either way. The effects of a kill -9 shouldn't be
as severe as a power fail, though, since the OS can still flush written
buffers even after the process writing them has died.
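
To make the kill -9 point concrete, here is a minimal sketch (POSIX only, and
not taken from the PostgreSQL sources; the function name and file path are
made up for illustration). A child process hands data to the kernel with
write(), never calls fsync or close, and is then killed with SIGKILL. The
parent should still be able to read the data back, because the kernel owns the
buffer once write() has returned. A power failure at the same instant gives no
such guarantee.

```python
import os
import signal
import tempfile
import time


def write_then_get_killed(path, payload):
    """Fork a child that write()s payload (no fsync), then SIGKILL it.

    Returns the bytes the parent reads back from path afterwards.
    Hypothetical helper for illustration; assumes a small payload that
    a single write() call accepts in full.
    """
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: hand the data to the kernel, but never fsync or close.
        try:
            os.close(ready_r)
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
            os.write(fd, payload)
            os.write(ready_w, b"x")  # tell the parent that write() returned
            while True:
                time.sleep(60)       # sit here until we are killed
        finally:
            os._exit(1)              # never fall back into the parent's code

    # Parent: wait for the child's write(), then kill it the hard way.
    os.close(ready_w)
    os.read(ready_r, 1)
    os.close(ready_r)
    os.kill(pid, signal.SIGKILL)
    _, status = os.waitpid(pid, 0)
    assert os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL

    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(write_then_get_killed(os.path.join(d, "demo.dat"), b"survived\n"))
```

Note that this only covers data already passed to the kernel. Anything still
sitting in a user-space buffer inside the killed process is gone either way.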

And I also can point the finger at some Unix swervers (spelling intentional)
that would fail that test in a miserable way. I can also point at a few VMS
machines that couldn't pass that test. I've even seen machines blow up due
to improper power cycling.

And I've seen Win2k machines come right up after repeated power blips (I've
also seen them not come up).

It really depends upon what the hard disk is doing at the instant the
regulators drop out the 5 and 12V supplies (and which supply goes out first,
which can depend upon the respective loads -- for modern Pentium 4 systems
the 12V will probably go down first since it is more heavily loaded than the
5V supply in these systems). Under certain conditions where the 12V goes
down before the 5V does, the head might still be writing as the servo spirals
towards park, causing all manner of damage (maybe even to servo information,
which normally cannot be written). So the power cycle becomes a test of
hardware, too, played Russian Roulette-style.

Talk about an unscientific test.

A database server that needs that kind of testing is going to be running on
hardened hardware behind a doubly redundant UPS anyway.

But then again, I've seen a Linux server survive a power cycle with no lost
data (ext3 filesystem -- I've seen lost data with ext2). And I've seen the
same server barf all over itself due to a single bit error in memory. Blew
out the entire root filesystem, which was journaled and residing on a RAID 1
partition (the corruption was perfectly mirrored, by the way). Serves me
right for not having ECC RAM installed at the time.

> If it passes the tests, good for it. I honestly do not expect that it
> will. My take on this is that we want to be able to document the
> problems in advance, rather than be blindsided.

I fully expect that Katie, Jan, Dave, and all the others working on this share
your concerns and want the Win32 port to be as solid as is possible on that
OS.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
