Re: Server unreliability

From: "Marc G(dot) Fournier" <scrappy(at)postgresql(dot)org>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL www <pgsql-www(at)postgresql(dot)org>, PostgreSQL advocacy <pgsql-advocacy(at)postgresql(dot)org>
Subject: Re: Server unreliability
Date: 2004-09-29 17:52:40
Message-ID: 20040929143454.A93533@ganymede.hub.org
Lists: pgsql-advocacy pgsql-www

On Wed, 29 Sep 2004, Bruce Momjian wrote:

> It is my opinion that we have to make major changes in the way we
> provide hosting for our servers. There are several problems:
>
> o Location of servers
>
> The location of our servers in Panama is a problem. They are too far
> for any PostgreSQL maintainers to access. Changing hardware or
> diagnosing problems has been too hard. I have had like 2 days of
> downtime on my home machine in the past 12 years. We have had more than
> 2 days of downtime in the past 6 months. My wife would not accept such
> a reliability level.

This is currently being worked on ... we are looking at various remote
management solutions so that we don't have to deal with waiting for a
technician to get 'on the scene' ...

> o FreeBSD
>
> The use of FreeBSD jails can cause servers to take +8 hours to fsck on a
> server crash or power failure. Again, I would never accept such
> problems on my home server so it is hard to fathom how a project with
> thousands of users can accept that. Either we need to find a fix, stop
> using jails, or get another operating system, but continuing to use a
> setup with a known problem is just asking for trouble.

Actually, again, this one is being addressed ... there is a solution in
the pipeline to fix the cause of the 8+ hour fsck, but, since it is a fix
to fsck itself, it hasn't been put into the mainstream code yet, due to
*obvious* testing reasons ...

We've also added in hot failover as an option ... I've posted to -www
asking about putting www.postgresql.org onto it, but so far, the only
responses back have been along the lines of 'how are you doing it?' ...

The "risk" is that it's not real-time replication between the live and
failover server ... on our high performance servers, the 'delay' is about
5 minutes ...

... now, knowing that, if you feel comfortable with me putting this onto
the mailing lists/cvsroot as well, knowing that there is a possibility of
something being written 'in the gap' before failover, I'll do that VM also
...

Note that although the replication has a gap, the heartbeat process runs
every minute ... as soon as it can't ping anymore, it fails over ...
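
To make the trade-off above concrete, here is a minimal sketch in Python. It is
not the actual hub.org setup: the names `monitor`, `lost_writes`, `probe` and
`promote` are hypothetical, and only the two numbers (a one-minute heartbeat
and a roughly five-minute replication delay) come from this message. It shows
a heartbeat loop that fails over on the first missed ping, and which writes
would fall 'in the gap' given the time of the last completed sync.

```python
import time
from typing import Callable, List, Optional

HEARTBEAT_INTERVAL_S = 60      # heartbeat runs every minute (from the message)
REPLICATION_DELAY_S = 5 * 60   # replication lags by about 5 minutes (from the message)


def monitor(
    probe: Callable[[], bool],
    promote: Callable[[], None],
    max_checks: int,
    interval_s: float = HEARTBEAT_INTERVAL_S,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Ping the live server up to max_checks times.

    On the first failed ping, call promote() and return the 1-based
    number of that check. Return None if every ping succeeded.
    """
    for check in range(1, max_checks + 1):
        if not probe():
            promote()          # hypothetical hook: switch traffic to the standby
            return check
        if check < max_checks:
            sleep(interval_s)  # wait for the next heartbeat
    return None


def lost_writes(
    write_times_s: List[int], last_sync_s: int, failure_s: int
) -> List[int]:
    """Return the writes made after the last sync and up to the failure.

    These are the writes 'in the gap': they reached the live server but
    were never copied to the failover server.
    """
    return [t for t in write_times_s if last_sync_s < t <= failure_s]
```

Under these assumptions, detection takes at most about one heartbeat interval,
while the data at risk is bounded by the replication delay, so a shorter
heartbeat speeds up failover but does not shrink the gap.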

> o Web site
>
> We have been talking about a new web page layout for years at this
> point. I almost don't care if they just put a dancing bear up on the
> web site. Let's do something!

What's wrong with the existing one? Have you designed the dancing bear
you'd like us to put up in place of what we have now?

> o Archives
>
> The archives situation is a continual problem. Again, maybe a dancing
> bear can help. :-)

What is wrong with it now? I'm cleaning up the code itself, but that is
because it is a mess right now, not because of any problems reported to me
... I've removed one of the banner ads so that loading is a bit faster, and
John has done, I think, a fantastic job on the search engine itself, including
sending me changes for the archives themselves so that the 'time searches'
should now work properly ...

So, do you have something specific you'd like to point out to us that
we've overlooked and haven't fixed yet?

> Basically, with no money and no one offering servers

So far, I've had one person donate $10 ... in order to put a dedicated
server onto the network, I'd need a lot more of those ... that would pretty
much eliminate your second point about the fscks, since it's only our
*loaded* servers that we have that problem with ... but, as I said, the
fsck issue is being addressed as well ...

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy(at)hub(dot)org Yahoo!: yscrappy ICQ: 7615664
