Re: Clustering, parallelised operating system, super-computing

From:	Brian Modra <brian(at)zwartberg(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	PGSQL Mailing List <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Clustering, parallelised operating system, super-computing
Date:	2010-05-14 06:51:08
Message-ID:	AANLkTimV_Wh1DKofePDbUZErIb4li1qY7F9Qdi3LgxA_@mail.gmail.com
Lists:	pgsql-general

On 14/05/2010, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Brian Modra wrote:
>> Hi,
>> I've been told that PostgreSQL and other similar databases don't work
>> well on a parallelised operating system because they make good use of
>> shared memory which does not cross the boundary between nodes in a
>> cluster.
>>
>> So I am wondering if any work is being done to make it possible to
>> have a single database schema that spans a number of hosts?
>>
>> For example, a table on one host/node that has a reference to a table
>> on another host/node with deletes cascading back.
>> e.g.
>
> Not currently. There are some prototypes in development, but those
> usually have the same database on all the machines and they share the
> load.

I'm trying to solve the problem of firstly distributing the volume of
data, and secondarily the load.

So far, I'm putting some bulky data onto different hosts, where there
is no need to ever do a join. I put a "reference" table onto a host
with the data that needs to be joined, then I can select the actual
data from the other host by unique IDs after the join has been
performed locally.
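
A minimal sketch of this "join locally, then fetch by ID" pattern, assuming the dblink extension is installed. The tables (doc_ref, category, doc_body), the function name, and the connection string are hypothetical and used only for illustration:

```sql
-- Hypothetical layout: doc_ref and category live on this host,
-- the bulky doc_body table lives on a second host ("node2").
CREATE OR REPLACE FUNCTION fetch_bodies(p_category text)
RETURNS TABLE (doc_id bigint, body text)
LANGUAGE plpgsql AS $$
DECLARE
    id_list text;
BEGIN
    -- Step 1: the join runs entirely on the local host and
    -- yields only the unique IDs of the rows we need.
    SELECT array_to_string(array_agg(r.id), ',')
      INTO id_list
      FROM doc_ref AS r
      JOIN category AS c ON c.id = r.category_id
     WHERE c.name = p_category;

    -- array_agg over zero rows gives NULL: nothing to fetch.
    IF id_list IS NULL THEN
        RETURN;
    END IF;

    -- Step 2: fetch the bulky rows from the remote host by ID.
    -- The IDs are bigint values, so concatenating them is safe here.
    RETURN QUERY
    SELECT t.id, t.body
      FROM dblink('host=node2 dbname=bulk',
                  'SELECT id, body FROM doc_body WHERE id IN ('
                  || id_list || ')')
           AS t(id bigint, body text);
END;
$$;
```

It could then be called as SELECT * FROM fetch_bodies('reports'); only the ID list and the matching rows cross the network.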

To create a reference with "on delete cascade" across hosts, I create
an AFTER DELETE trigger, and in its plpgsql function I call dblink to
do the remote delete.
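
Such a trigger might look roughly like the following sketch. The table names (doc_ref locally, doc_body remotely), the trigger and function names, and the connection string are assumptions made for illustration; format() requires PostgreSQL 9.1 or later, and on older versions quote_literal() with string concatenation would serve the same purpose:

```sql
-- Trigger function: runs on the local host after a row is deleted
-- and removes the dependent row on the remote host via dblink.
CREATE OR REPLACE FUNCTION cascade_delete_remote()
RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    PERFORM dblink_exec(
        'host=node2 dbname=bulk',                            -- hypothetical remote host
        format('DELETE FROM doc_body WHERE id = %L', OLD.id) -- %L quotes the value
    );
    RETURN NULL;  -- the return value is ignored for AFTER row triggers
END;
$$;

CREATE TRIGGER doc_ref_cascade_delete
AFTER DELETE ON doc_ref
FOR EACH ROW
EXECUTE PROCEDURE cascade_delete_remote();
```

One caveat with this approach: the remote statement commits in its own transaction on the other host, so if the local transaction later rolls back, the remote delete is not undone.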

Similarly, I can do joins in plpgsql with the help of dblink.
But, doing joins across hosts certainly does defeat the purpose of
"distributing the load".

I think that the schema design must be done carefully when distributing data.
So it really will be difficult to get this "supercomputer database" right.

Maybe the best way to solve this is not to do automatic distribution
of the data, but rather to provide tools for implementing distributed
references and joins.

I'm thinking of doing this as part of "The Karoo Project", an open
source project I'm working on, and would appreciate
comments/support/criticism.
Thanks

--
Brian Modra Land line: +27 23 5411 462
Mobile: +27 79 69 77 082
5 Jan Louw Str, Prince Albert, 6930
Postal: P.O. Box 2, Prince Albert 6930
South Africa
http://www.zwartberg.com/

In response to

Re: Clustering, parallelised operating system, super-computing at 2010-05-14 00:21:30 from Bruce Momjian

Responses

Re: Clustering, parallelised operating system, super-computing at 2010-08-18 23:39:50 from Benjamin Smith