Re: Thoughts on the mirroring system etc

From: "Dave Page" <dpage(at)vale-housing(dot)co(dot)uk>
To: "Magnus Hagander" <mha(at)sollentuna(dot)net>, <pgsql-www(at)postgresql(dot)org>
Subject: Re: Thoughts on the mirroring system etc
Date: 2005-01-20 12:47:25
Message-ID: E7F85A1B5FF8D44C8A1AF6885BC9A0E452856F@ratbert.vale-housing.co.uk
Lists: pgsql-www

> -----Original Message-----
> From: pgsql-www-owner(at)postgresql(dot)org
> [mailto:pgsql-www-owner(at)postgresql(dot)org] On Behalf Of Magnus Hagander
> Sent: 20 January 2005 12:12
> To: pgsql-www(at)postgresql(dot)org
> Subject: [pgsql-www] Thoughts on the mirroring system etc
>
> And if I'm stepping on someone's toes here, let me apologize
> in advance.

Only really mine - and I knew you were writing this :-)

>
>
> Number of mirrors
> -----------------
> * There are currently almost 60 mirrors for the static web content.
>
> * At peak load (during a slashdotting etc.), the three
> servers serving up the static content totalled no more than a little
> over 6Mbit of traffic, at fewer than 500 requests / second.
>
> * During this time, wwwmaster pushed around 1.5Mbit
>
> * As long as www.postgresql.org is fast, people will *not* pick their
> local mirror for the web (ftp is a different thing, as it's more
> bandwidth intensive).
>
>
> This leads me to the conclusion that we do *not* in fact need the large
> mirror network to handle the bandwidth load. In fact, most of those
> sites probably use up more bandwidth syncing than they save.

Yes. I posted comments to this effect before Christmas. During the fun
yesterday morning, I also noticed that rsync connections were taking
significant amounts of CPU - in fact, 4 concurrent ones were taking
around 40% CPU between them on svr4 for at least a few minutes. Disk IO
was almost certainly equally high. I cannot believe that the bandwidth
saved by 60-odd mirrors justifies the CPU, network and disk IO required
to rsync.

As an example, I run www.uk.postgresql.org. On the 10th Jan, a date
picked pretty much at random, I logged 2448 http requests. Each hit on
the homepage results in about 30(!) httpd requests, so that represents
as few as 82 page hits!

Yesterday, release day, I only logged 2387 hits!!
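
A rough sketch of that arithmetic, assuming every request belongs to a
homepage hit of about 30 requests (the function and constant names are
made up for illustration):

```python
# Assumption: roughly 30 HTTP requests per homepage hit, as estimated above.
REQUESTS_PER_HOMEPAGE_HIT = 30


def estimated_page_hits(http_requests: int,
                        requests_per_hit: int = REQUESTS_PER_HOMEPAGE_HIT) -> int:
    """Convert a raw request count from the access log into approximate page hits."""
    return round(http_requests / requests_per_hit)


if __name__ == "__main__":
    # 10th Jan on www.uk.postgresql.org: 2448 logged requests.
    print(estimated_page_hits(2448))  # 82
```

This is only a lower-bound style estimate: visitors who load pages other
than the homepage generate fewer requests per hit, so the real number of
page views could be somewhat higher.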

> My suggestion for this is to limit the number of mirrors to around 5,
> give or take a few. But instead, put higher demands on these mirrors
> than we do now. Demand they sync every 30 minutes (or 60, but you get
> the point). Demand that they have a fast machine and a fast network
> connection. There have been enough offers of servers and networks that
> this should not be a major problem. Demand that they respond to
> www.postgresql.org - if it can have a dedicated IP, even better.
> Distributed across the world of course.

Yes - we are already planning to do this, and indeed some of the work
has been done. The mirror tracker checks whether or not a mirror will
respond to www.postgresql.org requests, and the backend database has a
flag to mark the 'primary' mirrors.
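
For illustration only, a check of that kind might look something like the
sketch below. This is not the actual mirror tracker code: the function
names, the HEAD request, and the "any 2xx/3xx counts as OK" rule are all
assumptions.

```python
import http.client

# The hostname every primary mirror would be expected to answer for.
CANONICAL_HOST = "www.postgresql.org"


def status_ok(status: int) -> bool:
    """Assumed rule: treat any 2xx or 3xx response as 'the mirror answers'."""
    return 200 <= status < 400


def answers_canonical_host(mirror_address: str, port: int = 80,
                           timeout: float = 10.0) -> bool:
    """Connect to the mirror directly but ask for the canonical virtual host."""
    conn = http.client.HTTPConnection(mirror_address, port, timeout=timeout)
    try:
        # Supplying our own Host header replaces the one http.client would
        # otherwise derive from mirror_address.
        conn.request("HEAD", "/", headers={"Host": CANONICAL_HOST})
        return status_ok(conn.getresponse().status)
    except (OSError, http.client.HTTPException):
        # Unreachable, timed out, or not speaking HTTP: count it as failing.
        return False
    finally:
        conn.close()
```

A tracker could run something like answers_canonical_host() against each
mirror's address and only set the 'primary' flag for mirrors that pass.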

>
> Then do some "DNS magic" to do the load balancing:

<snip DNS Magic>

Yes, the current mirror tracker could easily be adapted to do this.

>
> A similar solution for wwwmaster, of course.
>

The major problem with wwwmaster is that we need multimaster replication
to handle it properly, without having a single point of failure. Slony-I
will not resolve that basic issue.

> wwwmaster
> ---------
> If you hit the ftp browser (or a download link), and then click anything
> in the menu, you get the whole site served from wwwmaster. If the above
> is fixed, so mirrors are all referred to as www.postgresql.org, it
> should be as simple as sticking a <base href> in there or something. But
> until then, perhaps some creative coding in the framework can fix it so
> links that are hit on wwwmaster point back to www whereas the static
> site uses relative links only?

Yes, I need to think about this. At the moment, the flags on the mirror
pages have been hardcoded back to www, but a better solution is needed.

> Wow. That was a lot longer than initially intended. Hope someone has the
> patience to read it all ;-)

I did :-)

/D
