Quick Links

Re: idea for a geographically distributed database: how best to implement?

From:	Andy Ballingall <andy(at)areyoulocal(dot)co(dot)uk>
To:	"'Aidan Van Dyk'" <aidan(at)highrise(dot)ca>
Cc:	<pgsql-sql(at)postgresql(dot)org>
Subject:	Re: idea for a geographically distributed database: how best to implement?
Date:	2005-11-18 09:09:24
Message-ID:	ECOWS05M7gB4BwdbB7x00014f76@smtp-out5.blueyonder.co.uk
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-sql

Thanks Aidan,

That's very interesting! I've not used slony yet, so I'll setup some tests
and try exactly this mechanism.

What I like about it is that the changes that have to occur to either add a
cell or to split an existing cell into two to manage growing traffic have a
localised effect on the web of database interdependencies, making it more
straightforward to do maintenance without the website being taken offline
for most people. They'd only notice downtime if they happened to live in or
next to the cells being messed around with. Not having to touch the actual
web application is nice.

The other advantage over my manual hack is that the replicated data is just
that - replicated, so when each database does a backup, it's only the master
data that is getting backed up and restored. In my manual scheme, following
restoration after a node failure, I'd have to do a postprocessing step to
work out which restored data was actually data slaved from a neighbouring
cell, and refresh it in case the neighbouring data has been updated since
the node went down.

(I hadn't considered backup and restore properly up till now, but I'm going
to add that to the list of good reasons for taking advantage of my
'distantly disjoint' data set. If my national data is shared between 500
cells, then each database is going to be titchy, and backup will take 'no
time at all', as in parallel, 500 streams of database data flow first to
node local storage and thence to safer, remote storage)

Cheers,
Andy

---------

-----Original Message-----
From: Aidan Van Dyk [mailto:aidan(at)highrise(dot)ca]
Sent: 17 November 2005 19:54
To: andy(at)areyoulocal(dot)co(dot)uk; aidan(at)highrise(dot)ca
Subject: Re: idea for a geographically distributed database: how best to
implement?

Andy,

This is *very* interesting scenario. Definitely peaked my interest on this
one.

I haven't tried this scenario before, but I have used "inheritted" tables to
do "splitups". Basically, just partition your data, and replicate certain
partitions between various neighbours.
http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html

With Slony, you can get "master slave" replication of particular tables
"sets" (not necessarily whole databases).

So I haven't developped this idea, or tried it, but you might be able to
play with it and make it work.

On *each* database, you have a head table, with no records:
CREATE TABLE complete ();

And a bunch of "cell" tables that inherit the "complete" table:
CREATE TABLE cell_a () INHERITS "complete";
CREATE TABLE cell_b () INHERITS "complete";

Basically, you *always* work on the "complete" table, with the following
rules:

CREATE RULE cell_replicate_insert ON INSERT TO complete DO INSTEAD
INSERT INTO cell_a (...);
CREAT RULE cell_replicate_update ON UPDATE TO complete DO INSTEAD
UPDATE cell_a ...;

Use SlonyI to replecate cella to NODE B (master->slave), and on NODE B, do
something similar (except work on b, and replicate b -> NODE A).

This way, all your web/interface code everywhere is using the "complete"
table for any insert/update/select. When insert/updateing, it will always
go to the "local" table, which Slony "replicates" out to the neighbouring
nodes. And when selecting on it (the "complete" table), you get data from
your local table, and any "replicated" neighbouring node's tables which are
in the local schema inheriting from "complete" and SlonyI is updating.

Good luck, and if you get anywhere with this, let me know!

Andy Ballingall wrote:

> Hello,
>
> I've got a database for a website which is a variant of the 'show stuff
> near to me' sort of thing.
>
> Rather than host this database on a single server, I have a scheme in mind
> to break the database up geographically so that each one can run
> comfortably on a small server, but I'm not sure about the best way of
> implementing it.
>
> Here's the scheme:
>
> --------------------------------
> Imagine that the country is split into an array of square cells.
> Each cell contains a database that stores information about people who
> live in the area covered by the cell.
>
> There's one problem with this scheme. What happens if you live near the
> edge of a cell?
>
> My solution is that any inserted data which lies near to the edge of cell
> A is *also* inserted in the database of the relevant neighbouring cell -
> let's say cell B.
>
> Thus, if someone lives in cell B, but close to the border with cell A,
> they'll see the data that is geographically close to
> them, even if it lies in cell A.
>
> --------------------------------
>
> Is this a common pattern?
>
> I could, of course, simply find every insert, update and delete in the
> application and alter the code to explicitly update all the relevant
> databases, but is there a more elegant way of simply saying: "Do this
> transaction on both Database A and Database B" monotonically?
>
> I've had a look at some replication solutions, but they all seem to
> involve replicating an entire database. The advantage of my scheme is that
> if I can distribute my application over large numbers of small servers,
> I'll end up with more bangs for the buck, and it'll be much easier to
> manage growth by managing the number of servers, and number of cells
> hosted on each server.
>
> Thanks for any suggestions!
> Andy Ballingall
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq

--
No virus found in this incoming message.
Checked by AVG Free Edition.
Version: 7.1.362 / Virus Database: 267.13.3/173 - Release Date: 16/11/2005

Responses

Re: idea for a geographically distributed database: how best to implement? at 2005-11-18 15:06:05 from Andrew Sullivan

Browse pgsql-sql by date

	From	Date	Subject
Next Message	Andrew Sullivan	2005-11-18 15:06:05	Re: idea for a geographically distributed database: how best to implement?
Previous Message	A. Kretschmer	2005-11-18 06:15:48	Re: how to do a find and replace