Re: Database cluster?

From: Doug Semig <dougslist(at)semig(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Database cluster?
Date: 2000-11-30 15:37:37
Message-ID: 3.0.6.32.20001130103737.007d8ca0@sloth.c3net.net
Lists: pgsql-general

You're almost describing a Teradata DBM.

What an amazing machine! Last I heard, about six years ago, though, AT&T was
rewriting it as an NT application instead of running it on proprietary
hardware. The proprietary hardware was essentially a cluster of 80486
computers (at the time).

What they had done was implement a pyramid structure of 80486 computers.
The lowest level of computers had hard disks and stored the data. Two of
the lowest-level computers would "report" to a single higher-up computer.
Two of these higher-up computers would "report" to yet another single
higher-up computer, and so on, until there was only one computer left at
the top.

The thing that impressed me the most about this architecture was that
sorting was practically built in. So all the intermediary computers had to
do was merge the already-sorted result sets from their lower-level
computers. Blazing!
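
To make that concrete, here is a tiny sketch of what each intermediary
node's job boils down to; this is just illustrative Python with made-up
leaf data, not anything Teradata-specific:

import heapq

def merge_sorted_children(child_streams):
    # Each child already returns its rows in sorted order, so an
    # intermediary node only interleaves the streams; nothing gets
    # re-sorted on the way up the pyramid.
    return heapq.merge(*child_streams)

# Made-up example: two leaf nodes, each holding half the rows, sorted by key.
leaf_a = [(1, "alice"), (4, "dave"), (7, "grace")]
leaf_b = [(2, "bob"), (3, "carol"), (9, "ivan")]

for row in merge_sorted_children([iter(leaf_a), iter(leaf_b)]):
    print(row)

The same merge happens again at each level above, so the top of the pyramid
ends up with a fully sorted result without any single node doing the whole
sort.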

And each piece of data was stored on a couple of the leaf-level computers
for redundancy.

I miss that old beast. But I certainly cannot afford the millions of
dollars required to get one for myself. We lovingly called the one we
worked with the "Pteradactyl," which is the old name for that bird-like
dinosaur (evidently there's a new word for the bird-like dinosaur, the
Pteranodon or something?).

Doug

At 02:44 PM 11/30/00 -0000, Gordan Bobic wrote:
>Thanks.
>
>I have just had another thought. If all the tables are split across several
>computers, this would help as well.
>
>For example, if we have 100 records and 2 database servers, each server
>could have 50 of those 100 records on it. When a selection is required,
>each server would look through its much smaller database and report back
>the "hits". This would, effectively, provide a near-linear speedup in
>query time, while introducing only a minor network overhead (or a major
>one, depending on how much data is transferred).
>
>Some extra logic could then be implemented for related tables that would
>allow the most closely related records from the different tables to be
>"clustered" (remotely similar to the CLUSTER command) on the same server,
>for faster response times and minimized network usage. The "vacuum" or
>"cluster" features could be used overnight to re-optimize the
>distribution of records across the servers.
>
>In all this, a "master" node could be used to coordinate the whole
>operation. We could ask the master node to do a query, and it would
>automatically, knowing what slaves it has, fire off that query on them.
>Each slave would then, in parallel, execute the query and return a subset
>of the data we were looking for. This data would then be joined into one
>recordset before being returned to the client that requested it.
>
>As far as I can see, as long as the amounts of data shifted aren't large
>enough to cause problems with network congestion, and the query time
>dominates the data transfer time over the network, this should provide a
>rather scalable system. I understand that the form of database clustering
>I am describing here is fairly rudimentary and unsophisticated, but it
>would certainly be a very useful feature.
>
>Are there any plans to implement this sort of functionality in PostgreSQL?
>Or is this a lot more complicated than it seems...
>
>Regards.
>
>Gordan
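
Incidentally, the placement half of what Gordan describes (keeping related
records from different tables together on the same server) can be sketched
in a few lines of present-day Python. This is only an illustration, assuming
a shared key is hashed to pick a server; the server names are made up:

SERVERS = ["db1", "db2"]   # hypothetical slave nodes

def server_for(key):
    # Route any row by its shared key (e.g. customer_id), so that related
    # rows from different tables land on the same server.
    return SERVERS[hash(key) % len(SERVERS)]

# Rows from two related tables, both keyed by customer_id:
customers = [(17, "Gordan"), (42, "Doug")]
orders = [(17, "order A"), (42, "order B"), (17, "order C")]

for cid, data in customers + orders:
    print(data, "->", server_for(cid))

An overnight re-balancing pass of the kind Gordan mentions would amount to
recomputing server_for() against a new server list and moving whichever rows
change homes.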
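
And the master node's side of it, firing the same query at every slave at
once and then gluing the partial result sets back together, might look
roughly like this. Again, just a sketch: the slave data and helper functions
are stand-ins, not anything PostgreSQL actually provides:

from concurrent.futures import ThreadPoolExecutor

# Stand-ins for two slaves, each holding half of the rows.
SLAVE_DATA = {
    "db1": [(1, "red"), (3, "green")],
    "db2": [(2, "blue"), (4, "red")],
}

def query_slave(name, predicate):
    # Pretend to run the selection on one slave and return its "hits".
    return [row for row in SLAVE_DATA[name] if predicate(row)]

def scatter_gather(predicate):
    # The master fans the query out to all slaves in parallel, then joins
    # the partial result sets into one recordset for the client.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(query_slave, name, predicate)
                   for name in SLAVE_DATA]
        return [row for f in futures for row in f.result()]

print(scatter_gather(lambda row: row[1] == "red"))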
