Re: Server unreliability

From: "Marc G(dot) Fournier" <scrappy(at)postgresql(dot)org>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL www <pgsql-www(at)postgresql(dot)org>,PostgreSQL advocacy <pgsql-advocacy(at)postgresql(dot)org>
Subject: Re: Server unreliability
Date: 2004-09-29 17:52:40
Lists: pgsql-advocacy, pgsql-www
On Wed, 29 Sep 2004, Bruce Momjian wrote:

> It is my opinion that we have to make major changes in the way we
> provide hosting for our servers.  There are several problems:
> o  Location of servers
> The location of our servers in Panama is a problem.  They are too far
> for any PostgreSQL maintainers to access.  Changing hardware or
> diagnosing problems has been too hard.  I have had like 2 days of
> downtime on my home machine in the past 12 years.  We have had more than
> 2 days of downtime in the past 6 months.  My wife would not accept such
> a reliability level.

This is currently being worked on ... we are looking at various remote 
management solutions so that we don't have to deal with waiting for a 
technician to get 'on the scene' ...

> o  FreeBSD
> The use of FreeBSD jails can cause servers to take 8+ hours to fsck on a
> server crash or power failure.  Again, I would never accept such
> problems on my home server so it is hard to fathom how a project with
> thousands of users can accept that.  Either we need to find a fix, stop
> using jails, or get another operating system, but continuing to use a
> setup with a known problem is just asking for trouble.

Actually, again, this one is being addressed ... there is a solution in 
the pipeline to fix the cause of the 8+ hour fsck, but, since it is a fix 
to fsck itself, it hasn't been put into the mainstream code yet, due to 
*obvious* testing reasons ...

We've also added in hot failover as an option ... I've posted to -www 
asking about putting onto it, but so far, the only 
responses back have been along the lines of 'how are you doing it?' ...

The "risk" is that it's not real-time replication between the live and 
failover servers ... on our high-performance servers, the 'delay' is about 
5 minutes ...

... now, knowing that, if you feel comfortable with me adding this to 
the mailing lists/cvsroot as well, knowing that there is a possibility of 
something being written 'in the gap' before failover, I'll do that VM too ...

Note that although the replication has a gap, the heartbeat process runs 
every minute ... as soon as it can't ping anymore, it fails over ...

> o  Web site
> We have been talking about a new web page layout for years at this
> point.  I almost don't care if they just put a dancing bear up on the
> web site.  Let's do something!

What's wrong with the existing one?  Have you designed the dancing bear 
you'd like us to put up in place of what we have now?

> o  Archives
> The archives situation is a continual problem.  Again, maybe a dancing
> bear can help.  :-)

What is wrong with it now?  I'm cleaning up the code itself, but that is 
because it's a mess right now, not because of any problems reported to me. 
I've also removed one of the banner ads so that loading is a bit faster, and 
John has done, I think, a fantastic job on the search engine itself, 
including sending me changes for the archives themselves so that the 'time 
searches' should now work properly ...

So, do you have something specific you'd like to point out to us that 
we've overlooked and haven't fixed yet?

> Basically, with no money and no one offering servers

So far, I've had one person donate $10 ... in order to put a dedicated 
server onto the network, I'd need a lot more of those ... that would pretty 
much eliminate your second point about the fscks, since it's only our 
*loaded* servers that we have that problem with ... but, as I said, the 
fsck issue is being addressed as well ...

Marc G. Fournier           Hub.Org Networking Services (
Email: scrappy(at)hub(dot)org           Yahoo!: yscrappy              ICQ: 7615664

Copyright © 1996-2017 The PostgreSQL Global Development Group