Re: [mail] Re: Windows Build System

From: Lamar Owen <lamar(dot)owen(at)wgcr(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [mail] Re: Windows Build System
Date: 2003-01-30 22:30:54
Message-ID: 200301301730.54409.lamar.owen@wgcr.org
Lists: pgsql-hackers

On Thursday 30 January 2003 16:54, Tom Lane wrote:
> Lamar Owen <lamar(dot)owen(at)wgcr(dot)org> writes:
> > And, by the way, who in their right mind tests a database server by
> > repeated yanking of the AC power?

> Anybody who would like their data to survive a power outage.

I don't buy that. That's why I have $36,000 worth of lead acid in the room
next door, with $5,000 of inverters and chargers in the server room. Until I
had to upgrade RAM I had 240+ days of uptime on one box. The longest power
interruption was 28 hours. The battery held the whole time. There was never
more than 30 days between interruptions. The last time I had the server
actually power down was during a maintenance run on the inverter/charger
system, when I had to move the servers' power feed onto another branch,
necessitating two power cycles, both of which were clean shutdown/reboots. I
haven't had an unscheduled dirty powerdown in two years.

We cannot, on any system, guarantee that data survives a sudden power outage.
Until we can be certain that the write-back cache on that high-performance
drive (or NAS array using iSCSI, perhaps) has flushed, we cannot know that the
data hit the disks.
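
A minimal sketch of the flushing point, assuming a POSIX-like system and a
hypothetical helper named durable_write (not anything from PostgreSQL). The
application can ask the OS to push its buffers to the device with fsync, but
whether the drive's own write-back cache has reached stable media at that
point depends on the hardware and OS configuration, which the program cannot
see:

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> int:
    """Write data to path and ask the OS to flush it to the storage device.

    Hypothetical helper for illustration. os.fsync() returns once the OS
    reports the flush as done; a drive with a volatile write-back cache may
    still be holding the data in RAM at that point.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = 0
        # os.write may perform a short write, so loop until all bytes are out.
        while written < len(data):
            written += os.write(fd, data[written:])
        # Request that OS-level buffers for this file be pushed to the device.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "record.bin")
        count = durable_write(target, b"committed row")
        print(f"wrote and fsynced {count} bytes")
```

The call sequence is all the software layer can do; it says nothing about
what the drive or array does after acknowledging the flush.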

> > To go to that extreme for Win32 when we caution
> > against something as mundane as a kill -9 of postmaster on Unix is
> > absurd. And, yes, I know the difference. I also know that the AC power
> > pull has nothing to do with PostgreSQL, but it has to do with the OS
> > under it. Although a kill -9, from the point of view of the running
> > process, is identical to a power failure.

> No, it is not. Did you not read my comments earlier today?

Of course I did -- I'm not daft. And that's why I specified 'from the point
of view of the running process' -- that is, the process you are SIGKILLing
cannot itself determine the difference between the power cycle and SIGKILL.
It just simply goes down, hard. Of course there is:

> I forgot to mention one of the biggest
> headaches, which is that kill -9 the postmaster doesn't kill the child
> backends.

This is a real difference, and one that I forgot as well. So SIGKILL is
different to the whole backend system, but not to the singular process that
is being SIGKILL'd. Suppose I issue a SIGKILL to postmaster and all forked
backends simultaneously? Where does SIGKILL differ from a power failure from
the point of view of the database system in that scenario? This is also
assuming that you clean reboot the OS after the SIGKILL to postmaster, as
there is that dynamic state you mentioned to worry about. I probably should
have mentioned that before.
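
The difference Tom points out can be seen in a small Unix-only sketch. The
process names and the function child_survives_parent_sigkill are hypothetical
stand-ins for illustration, not postmaster code: a parent forks a child, the
parent receives SIGKILL, and the child is checked afterwards. A power failure
would take both down at once; SIGKILL to the parent alone does not.

```python
import os
import signal
import subprocess
import sys
import time

# Source for a stand-in "parent" process: it starts one child, reports the
# child's PID on stdout, then idles. Both are plain sleepers.
PARENT_SRC = r"""
import subprocess, sys, time
child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(30)"],
    stdout=subprocess.DEVNULL,
)
print(child.pid, flush=True)
time.sleep(30)
"""


def child_survives_parent_sigkill() -> bool:
    """SIGKILL the parent only, then report whether its child still exists."""
    parent = subprocess.Popen(
        [sys.executable, "-c", PARENT_SRC],
        stdout=subprocess.PIPE,
        text=True,
    )
    child_pid = int(parent.stdout.readline())
    os.kill(parent.pid, signal.SIGKILL)  # the parent gets no chance to clean up
    parent.wait()
    parent.stdout.close()
    time.sleep(0.2)
    try:
        os.kill(child_pid, 0)  # signal 0 only checks that the PID exists
        alive = True
    except ProcessLookupError:
        alive = False
    if alive:
        os.kill(child_pid, signal.SIGKILL)  # tidy up the orphaned child
    return alive


if __name__ == "__main__":
    print("child survived parent SIGKILL:", child_survives_parent_sigkill())
```

Killing the parent and every child together, as proposed above, removes this
particular difference; the sketch only covers the parent-only case.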

> Windows
> is going to bring a whole new set of failure modes that we don't have
> defenses for. (Yet.) *That* is what we need extensive testing to learn
> about, and claiming that we are discriminating against Windows just
> because it's Windows misses the point completely.

And ISTM that an experienced Windows developer, such as Katie or Dave, would
know to do this, would know how to do this, and would know the best way of
doing this. And I wasn't singling you out, Tom. It was the whole thread and
the turns it took that got me rather upset.

> Or, if you prefer, we can ship Postgres 7.4 for Windows with no more
> testing than we need for any of the existing, long-since-well-tested
> ports. But I'll bet a great deal that our reputation will go down the
> drain (along with many people's data) if we do that.

We don't have a standard testing methodology for any of our ports. We need
one for all of our ports. I fully expect the Win32 port to need a different
methodology than the FreeBSD port or the Linux port. And I expect we have
enough experienced Win32 developers here (which I am not) who can provide
insight into how the methodologies should differ.

I prefer more extensive testing for all of our ports. You did read that when
I wrote it, right? (When I wrote it multiple times....) Just saying 'it
passed regression' shouldn't be enough -- but we should really spend some
cycles thinking about what the test suite really should be. For all
platforms.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
