This may be the wrong list to post to but I thought I'd post here
first since it is a performance related problem.
Essentially, I'm looking for the most efficient way to break a
database into two 'shards' based on a top level table's
primary key. For example, split a sales database into two using a
The database rows are constrained to have unique ownership, meaning
each row of data can be traced to one and only one shard, e.g., sales
territory. Multiple inheritance would make this a harder thing to do.
In essence, I'd like to do a cascading delete using one of these
territory's ids with the caveat that the data isn't just deleted but
deleted and COPYed out to disk. This COPYed data could then be loaded
into a fresh schema to bring up the second shard. Seemingly the data
would be in the right insert order, for referential integrity
purposes, as a result of this operation since it would be doing a
breadth first search for the data.
I can envision a couple different ways to do this:
a) Gather the relational tree up to but not including the leaves and
use it to parse out the shard from a db dump. Then do a cascading
delete to remove the data from the database.
b) Recursively COPY (query) to a file (breadth first COPY) while
crawling down the relational tree.
The complications I see are having to make sure the referential tree
is a DAG (directed acyclic graph) or unroll it to become one.
I know Live Journal, Skype, etc. have to do this sort of thing when
they need to scale and didn't want to reinvent the wheel or, more
importantly, step on the same land mines that others have stepped on.
Thanks for any and all feedback.
pgsql-performance by date
|Next:||From: Tyrrill, Ed||Date: 2007-05-18 23:16:20|
|Subject: Re: Slow queries on big table |
|Previous:||From: Steinar H. Gunderson||Date: 2007-05-18 22:35:29|
|Subject: Re: CPU Intensive query|