Re: Database cluster?

From: "Gordan Bobic" <gordan(at)freeuk(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Database cluster?
Date: 2000-12-01 08:58:58
Message-ID: 005301c05b75$d928b9c0$8000000a@localdomain
Lists: pgsql-general

> I actually analyzed it once. I came to the conclusion that to do it right
> it would be easier to make an almost entirely new db but use the same
> external interfaces as PostgreSQL.

I admit that I am not really too up-to-date on database theory, but I am a
bit surprised at that...

> To do a kludge of it, one might just implement a tier that sits between
> the user and a bunch of standard PostgreSQL backends.

That is precisely what I was thinking about. There would have to be a
"master" node that controls what goes where and distributes the load. This
"shouldn't" be too difficult (although I am not totally sure what I mean by
that). The nasty bit would probably be hacking the optimizer, the SQL
command "CLUSTER", and VACUUM to take account of, and efficiently use, all
the extra room for improving performance.
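
To make the "master" node idea a little more concrete, here is a minimal
sketch in Python. Everything in it is an assumption for illustration only:
the class name MasterNode, the backend names pg1..pg3 and the placement
table are hypothetical, and no real PostgreSQL connection is made. It only
shows the bookkeeping such a tier would need: which backends hold which
table, and a round-robin choice to spread reads over the copies.

```python
import itertools


class MasterNode:
    """Toy 'master' tier: tracks which backends hold which table."""

    def __init__(self, placement):
        # placement maps a table name to the list of backend names that
        # hold a copy of it (a hypothetical layout, not a PostgreSQL feature).
        self.placement = placement
        # One round-robin iterator per table, to spread reads over its copies.
        self._cycles = {
            table: itertools.cycle(backends)
            for table, backends in placement.items()
        }

    def backend_for_read(self, table):
        # A read can go to any single copy; pick the next one in turn.
        return next(self._cycles[table])

    def backends_for_write(self, table):
        # A write has to reach every copy so that they stay in sync.
        return list(self.placement[table])


master = MasterNode({"orders": ["pg1", "pg2"], "customers": ["pg3"]})
```

Successive calls to master.backend_for_read("orders") alternate between the
two copies, while master.backends_for_write("orders") returns both. The
hard parts mentioned above (optimizer, CLUSTER, VACUUM) are not touched by
this sketch at all.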

Automating a "near-optimal" distribution of tables across machines could be
a bit of a difficult problem from a theory side, but it ought to be
possible. There are several options here.

One could just put one table on each server, which is unlikely to be all
that beneficial, although in a multi-table join, you'd want to search the
smallest tables first.

Then, there's the option of just splitting each table across multiple
machines. There is also the possibility of having some records overlap
between machines, if the on-line optimizer decides that this would be
useful for performance, and then sorting out the syncing somehow.
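
As a rough illustration of splitting one table across machines, with
optional overlap, the sketch below assigns each record to a server by
hashing its key. This is just one possible scheme and an assumption on my
part: the function names, the use of CRC32 as the hash, and the "next
server holds the extra copy" rule are all hypothetical choices, not
anything PostgreSQL provides.

```python
import zlib


def home_server(key, n_servers):
    # CRC32 is used only because it gives the same answer on every machine
    # and every run; any stable hash function would do here.
    return zlib.crc32(str(key).encode("utf-8")) % n_servers


def servers_for(key, n_servers, copies=1):
    # With copies > 1 the record also lives on the following servers,
    # which is the "overlap between machines" case.
    first = home_server(key, n_servers)
    return [(first + i) % n_servers for i in range(min(copies, n_servers))]
```

With copies=1 every record lives on exactly one machine; with copies=2 each
record is also stored on the next machine along, so a read can be served
from either, at the cost of having to keep the two in sync.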

Or, one could set up an even more sophisticated system where only the
tables and data that would benefit from being together would be on the same
server, so there could be a section of two tables on one server, the rest
of those two tables and a section of another table on another server, etc.
Basically, make both the table and record allocations completely dynamic
between the servers.

I am not sure how useful each of these splits would be, but it is certainly
something well worth exploring theoretically before the actual
implementation, because I reserve the right to be wrong in thinking that
any of these methods would produce an actual improvement in performance.

And, of course, there would be the bit of getting the optimizer and partial
replication to work properly across servers, which may not be an easy task.

> It'd make a neat companion project, though. Like PG/Enterprise or
> PG/Warehouse or something.

I agree. It would be really neat. Something like Mosix, but for databases.
And it just sounds like something that would be really useful for large
databases, especially as we start reaching the steep part of the
price/performance curve for database servers.

Regards.

Gordan

> At 04:02 PM 11/30/00 -0000, Gordan Bobic wrote:
> >> You're almost describing a Teradata DBM.
> >
> >I knew someone must have thought of it before. ;-)
> >
> >[snip]
> >
> >> The thing that impacted me the most about this architecture was that
> >> sorting was practically built in. So all the intermediary computers had
> >> to do was merge the sorted result sets from its lower level computers.
> >> Blazing!
> >
> >They effectively implemented a binary tree in hardware. One hell of an
> >indexing mechanism. :-)
> >
> >> I miss that old beast. But I certainly cannot afford the multimillion
> >> dollars required to get one for myself.
> >
> >I suppose it would depend on how many computers you want to have in this
> >cluster. The main reason why clusters are getting popular recently
> >(albeit not yet for databases, or so it would seem) is because it is
> >cheaper than anything else with similar performance.
> >
> >The main question remains - are there any plans to implement something
> >similar to this with PostgreSQL? I would volunteer to help with some
> >coding, if a "group" was formed to work on this "clustering" module.
> >
> >Regards.
> >
> >Gordan
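
The merge step described in the quoted message (each intermediary only
merges result sets that its lower-level nodes have already sorted) can be
sketched in a few lines of Python. This is an illustration of the general
idea using the standard library's heapq.merge, not a description of how
Teradata actually did it; the function name and the sample data are made
up.

```python
import heapq


def merge_sorted_results(*result_sets):
    # Each input must already be sorted. heapq.merge then only compares the
    # current head of each input, so nothing is re-sorted from scratch.
    return list(heapq.merge(*result_sets))


# Hypothetical sorted results coming back from three lower-level nodes.
node_a = [1, 4, 9]
node_b = [2, 3, 10]
node_c = [5, 6]

merged = merge_sorted_results(node_a, node_b, node_c)
```

Because the same function can be applied again to the outputs of several
intermediaries, the merging naturally stacks up into a tree.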
