Horizontal Write Scaling

From: Eliot Gable <egable+pgsql-hackers(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Horizontal Write Scaling
Date: 2010-11-23 20:43:23
Message-ID: AANLkTinxTiuuDy8Up-pZVh=1YDa89ga_cDarGZ3usg3L@mail.gmail.com
Lists: pgsql-hackers

I know there has been a lot of talk about replication getting built into
Postgres and I know of many projects that aim to fill the role. However, I
have not seen much in the way of a serious attempt at multi-master write
scaling. I understand that the fundamental problems with write scaling across
multiple nodes are disk I/O and inter-node communication latency, and that in
the conventional synchronous, multi-master replication type of setup you would
be limited to the speed of the slowest node, plus the communication protocol
overhead and latency. However, it occurs to me that if you had a shared disk
system via iSCSI, Fibre Channel, NFS, or whatever (one which also had higher
I/O capacity than a single server could utilize), and used a file system that
supported locks on a particular section (extent) of a file, it should
theoretically be possible for multiple Postgres instances on multiple systems
sharing the database to read from and write to the database without causing
corruption. Obviously, it would take some carefully designed
code in terms of file extent locking, but it seems like it would also
eliminate the need entirely for any type of replication system. Lustre seems
to support the type of file locking required. Of course, I am assuming the
disk system would be RAID 1, RAID 10, RAID 5, or RAID 6 for reliability
purposes and that it is sufficiently redundant that you don't have to worry
about an outage of your storage system.
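
To make the idea concrete, here is a minimal sketch in C of what locking one
8 kB "extent" of a shared data file before writing it might look like. It uses
plain POSIX advisory byte-range locks via fcntl() rather than Lustre's own lock
manager, and the file path and extent size are illustrative assumptions; this
is not how Postgres coordinates writers today.

/*
 * Hypothetical sketch only: advisory byte-range ("extent") locking with
 * POSIX fcntl(), as one way multiple processes on a shared file system
 * might serialize writes to the same region of a data file.  The path
 * and 8 kB extent size are illustrative assumptions, not Postgres code.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define EXTENT_SIZE 8192                 /* assume one 8 kB page per extent */

static int lock_extent(int fd, off_t extent_no, short lock_type)
{
    struct flock fl;

    fl.l_type = lock_type;               /* F_WRLCK, F_RDLCK or F_UNLCK */
    fl.l_whence = SEEK_SET;
    fl.l_start = extent_no * EXTENT_SIZE;
    fl.l_len = EXTENT_SIZE;              /* lock exactly one extent */

    /* F_SETLKW blocks until the byte range becomes available */
    return fcntl(fd, F_SETLKW, &fl);
}

int main(void)
{
    /* illustrative path on the shared volume */
    int fd = open("/shared/pgdata/base/16384/16385", O_RDWR);

    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    /* take an exclusive lock on extent 42, modify it, then release */
    if (lock_extent(fd, 42, F_WRLCK) < 0)
    {
        perror("lock_extent");
        return 1;
    }

    /* ... read/modify/write the 8 kB page here ... */

    lock_extent(fd, 42, F_UNLCK);
    close(fd);
    return 0;
}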

Has anyone put any thought into what it would take to do this in Postgres?
Is it simply a matter of making the database file interaction code aware of
extent locking, or is it considerably more involved than that? It also
occurs to me that you would probably need some form of transaction ordering
mechanism across the nodes based on synchronized timestamps, but it seems
Postgres-R has already written the code required for that portion. That may
not even be needed, since all nodes would actually be working on the same
database files and the locks would ensure a strict ordering of queries. The
only place I see that possibly causing a problem is if someone load balancing
across the servers sent a delete and an insert to different nodes very close
together: if the insert executed first, the delete would remove the new row
even though the intent was to have the delete run first. Timestamp ordering of
the queries, or just some shared transaction ID across the nodes, would
eliminate that possibility. Other than that, is there
anything else I am missing? Wouldn't this type of setup be far simpler to
implement and provide better scalability than trying to do multi-master
replication using log shipping or binary object shipping or any other
techniques? Wouldn't it also be far more efficient since you don't need to
have a copy of your data on each master node and therefore also don't have to
ship your data to each node and have each node process it?
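
For the ordering question, here is an equally hypothetical sketch of how the
nodes could hand out a strictly increasing global transaction ID from a single
counter file kept on the shared volume, serialized with the same kind of
advisory lock. The path and on-disk layout are made up for illustration; this
is not how Postgres assigns transaction IDs.

/*
 * Hypothetical sketch only: nodes agree on a strict transaction order by
 * allocating IDs from one counter file kept on the shared volume,
 * serialized with a whole-file advisory lock.  The path and on-disk
 * layout are illustrative assumptions, not how Postgres assigns XIDs.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static int64_t next_global_xid(const char *counter_path)
{
    int     fd = open(counter_path, O_RDWR | O_CREAT, 0600);
    int64_t xid = 0;

    if (fd < 0)
        return -1;

    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };

    if (fcntl(fd, F_SETLKW, &fl) < 0)     /* lock the whole counter file */
    {
        close(fd);
        return -1;
    }

    if (pread(fd, &xid, sizeof(xid), 0) <= 0)
        xid = 0;                          /* empty or unreadable: start at zero */
    xid++;
    pwrite(fd, &xid, sizeof(xid), 0);
    fsync(fd);                            /* make the new value crash-safe */

    fl.l_type = F_UNLCK;                  /* release the lock and clean up */
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return xid;
}

int main(void)
{
    int64_t xid = next_global_xid("/shared/pgdata/global_xid_counter");

    if (xid < 0)
    {
        perror("next_global_xid");
        return 1;
    }
    printf("allocated global transaction id %lld\n", (long long) xid);
    return 0;
}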

I am mostly asking for educational purposes, and I would appreciate
technical (and hopefully specific) explanations as to what in Postgres would
need to change to support this.
