Quick Links

Efficient recursion

From:	C Storm <christian(dot)storm(at)gmail(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Efficient recursion
Date:	2007-05-18 22:41:24
Message-ID:	1179528084.715044.215690@e65g2000hsc.googlegroups.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

This may be the wrong list to post to but I thought I'd post here
first since it is a performance related problem.

Essentially, I'm looking for the most efficient way to break a
database into two 'shards' based on a top level table's
primary key. For example, split a sales database into two using a
territory.

The database rows are constrained to have unique ownership, meaning
each row of data can be traced to one and only one shard, e.g., sales
territory. Multiple inheritance would make this a harder thing to do.

In essence, I'd like to do a cascading delete using one of these
territory's ids with the caveat that the data isn't just deleted but
deleted and COPYed out to disk. This COPYed data could then be loaded
into a fresh schema to bring up the second shard. Seemingly the data
would be in the right insert order, for referential integrity
purposes, as a result of this operation since it would be doing a
breadth first search for the data.

I can envision a couple different ways to do this:
a) Gather the relational tree up to but not including the leaves and
use it to parse out the shard from a db dump. Then do a cascading
delete to remove the data from the database.

b) Recursively COPY (query) to a file (breadth first COPY) while
crawling down the relational tree.

The complications I see are having to make sure the referential tree
is a DAG (directed acyclic graph) or unroll it to become one.

I know Live Journal, Skype, etc. have to do this sort of thing when
they need to scale and didn't want to reinvent the wheel or, more
importantly, step on the same land mines that others have stepped on.

Thanks for any and all feedback.

Christian

Responses

Re: Efficient recursion at 2007-05-19 18:12:35 from Tom Lane

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Tyrrill, Ed	2007-05-18 23:16:20	Re: Slow queries on big table
Previous Message	Steinar H. Gunderson	2007-05-18 22:35:29	Re: CPU Intensive query