Re: Horizontal Write Scaling

From: Eliot Gable <egable+pgsql-hackers(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Horizontal Write Scaling
Date: 2010-11-23 20:55:55
Message-ID: AANLkTikfX67NKrzPXxHzriSMXgQDV37+-xoGoUH4tT0v@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 23, 2010 at 3:43 PM, Eliot Gable
<egable+pgsql-hackers(at)gmail(dot)com> wrote:
<snip>

> Other than that, is there anything else I am missing? Wouldn't this type of
> setup be far simpler to implement and provide better scalability than trying
> to do multi-master replication using log shipping or binary object shipping
> or any other techniques? Wouldn't it also be far more efficient since you
> don't need to have a copy of your data on each master node and therefor also
> don't have to ship your data to each node and have each node process it?
>
> I am mostly asking for educational purposes, and I would appreciate
> technical (and hopefully specific) explanations as to what in Postgres would
> need to change to support this.
>
>
Now that I think about this more, it seems you would still need to ship the
transactions to your other nodes and have some form of processing system on
each that knew which node was supposed to be executing each transaction and
whether that node is currently online. It would also have to have designated
backup nodes to execute the transaction on. Otherwise, you could end up
waiting forever for a transaction to finish that was sent to one node right
before that node lost power. However, if a transaction manager on each node
can figure out the ordering of the transactions for itself based on some
globally incrementing transaction ID, and can also figure out which node
will be executing each transaction and which node is the backup if the first
one fails, then if the backup sees the primary for that transaction go
offline, it could execute the transaction instead.
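To make that concrete, here is a minimal sketch of the idea (the node names,
cluster membership, and failure-detection interface are all hypothetical):
each node derives, from the global transaction ID alone, which node is the
primary and which is the backup, so every node reaches the same conclusion
without any extra coordination, and the backup steps in only when it sees
the primary offline.

```python
# Hypothetical sketch: deterministic primary/backup assignment derived
# from a globally incrementing transaction ID. Every node runs the same
# pure function, so all nodes agree on who executes each transaction
# without exchanging any messages about it.

NODES = ["node-a", "node-b", "node-c"]  # assumed static cluster membership


def assign(txid, nodes=NODES):
    """Return (primary, backup) for a transaction, from txid alone."""
    primary = nodes[txid % len(nodes)]
    backup = nodes[(txid + 1) % len(nodes)]
    return primary, backup


def executor_for(txid, online, nodes=NODES):
    """Pick who actually runs the transaction: the designated backup
    takes over only when the primary is not in the online set."""
    primary, backup = assign(txid, nodes)
    return primary if primary in online else backup
```

For example, with all three nodes online, transaction 7 lands on node-b; if
node-b loses power, every surviving node independently concludes that
node-c, its designated backup, should execute it instead.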

Then, I suppose you would also need some mechanism in Postgres that allows
concurrent processing of transactions such that a transaction does not
process anything that depends on another transaction that has not yet
committed, but can proceed with everything else. So, evaluation of
deterministic functions could take place, but anything volatile could not
run until all previous transactions had finished. I assume Postgres already
has something like this in order to scale across multiple cores in a single
box. This setup would basically make all the master nodes for the database
look like just extra memory and CPU cores.
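As a rough illustration of that gating rule (all names here are made up,
and real dependency tracking would be far more involved than a single
counter): split each transaction's work into steps marked deterministic or
volatile, and release the volatile steps only once every transaction with a
lower global ID has committed.

```python
# Hypothetical sketch of the gating rule: deterministic steps of a
# transaction may run immediately, but volatile steps must wait until
# every transaction with a lower global ID has committed.

def runnable_steps(txid, steps, committed_up_to):
    """steps: list of (kind, work) pairs, kind being 'deterministic'
    or 'volatile'. committed_up_to: highest txid known to be fully
    committed. Returns the work items allowed to run right now."""
    all_prior_committed = committed_up_to >= txid - 1
    out = []
    for kind, work in steps:
        # Deterministic work is always safe to start; volatile work
        # is held back until the transaction's predecessors commit.
        if kind == "deterministic" or all_prior_committed:
            out.append(work)
    return out
```

So a transaction could hash its inputs or evaluate pure expressions while
its predecessors are still in flight, and only its reads and writes of
shared state would stall on the commit horizon.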
