Re: Yet another infrastructure problem

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Russell Smith <mr-russ(at)pws(dot)com(dot)au>
Cc: Greg Sabino Mullane <greg(at)turnstep(dot)com>, "pgsql-www(at)postgresql(dot)org" <pgsql-www(at)postgresql(dot)org>
Subject: Re: Yet another infrastructure problem
Date: 2008-10-26 09:30:02
Message-ID: 8555D6DC-5EA9-4F35-9AEA-97B09CF929CD@hagander.net
Lists: pgsql-www


On 26 Oct 2008, at 02:03, Russell Smith <mr-russ(at)pws(dot)com(dot)au> wrote:

> Magnus Hagander wrote:
>> Greg Sabino Mullane wrote:
>>
>>> People have been complaining on IRC that nothing can be
>>> downloaded from our site, as the mirror-picking script throws
>>> an internal error.
>>>
>>> When are we going to fix our infrastructure properly?
>>>
>>
>> As Stefan has already posted on this very list, he is performing
>> maintenance on that machine in order to move it to new hardware.
>>
>> //Magnus
>>
>>
> We are still missing the one important thing, "Notification". Lots and
> lots of people use the website who will never go near the lists, IRC or
> anything else. Notifying the email lists of downtime will stop the
> heavily involved community from complaining, but it does absolutely
> nothing for a general user trying to download something from the
> internet.

That is a very good point. And it actually goes to many other parts of
the project, and not just the infrastructure. Basically the
authoritative version of *all* important information is the lists.

>
> You can argue about replication, downtime and the like until you are
> blue in the face. There will always be some downtime. The question is
> how do people know about it, when is it and what do they do about it?

Agreed.

> Until reading this thread I had never even thought about how PostgreSQL
> does or doesn't notify people about downtime or potential downtime.
> Reading down thread this notification issue appears to have been
> ignored. To me it seems like relatively low hanging fruit to allow
> messages to be posted on the website about planned outages, and
> notifications of recent unplanned

So how do you deal with a case like the one discussed here, where the
web is what didn't work? The static frontends were up, but not the
master, which is used to update them...

> outages. Complaining on IRC is one of
> the only ways to find out what's going on at the moment for a casual
> user.

The casual user would be using the lists, certainly not IRC. People who
aren't deep in the project certainly will hit the lists first, because
that's what we say on our website.

Now what they really do is email webmaster, which a lot of people did.

That said, I agree a better way would be good to have.

> When Marc's hosting had trouble a couple of years back, the only
> way to find out anything was on irc.

That outlines one of the major problems. It must not be too hard to
deal with for the guy trying to fix the actual problem. Sending an
email is *easy*, and Stefan did so in this case. But as you also note,
even this is too much for some people.

We could publish a snapshot of our nagios data, but I doubt that would
actually be helpful to these people.

> I'd look into this, but I'd need a lot more knowledge about how the web
> stuff is set up, and I'm probably not going to be able to glean that from
> people in a couple of weeks. But if I can, great!
>

Hey, give it a shot. Just remember that the technical part is the easy
part. Creating a process and getting buy-in for that is going to be
the hard part.

/Magnus
