Re: Horizontal Write Scaling

From: Markus Wanner <markus(at)bluegap(dot)ch>
To: Eliot Gable <egable+pgsql-hackers(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Horizontal Write Scaling
Date: 2010-11-25 09:03:53
Message-ID: 4CEE2679.1000803@bluegap.ch
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Eliot,

On 11/23/2010 09:43 PM, Eliot Gable wrote:
> I know there has been a lot of talk about replication getting built into
> Postgres and I know of many projects that aim to fill the role. However,
> I have not seen much in the way of a serious attempt at multi-master
> write scaling.

Postgres-XC and Postgres-R are two pretty serious projects, IMO.

> I understand the fundamental problem with write scaling
> across multiple nodes is Disk I/O and inter-node communication latency
> and that in the conventional synchronous, multi-master replication type
> setup you would be limited to the speed of the slowest node,

That's not necessarily true for Postgres-R, which is why I call it an
'eager' solution (as opposed to fully synchronous). While it guarantees
that all transactions that got committed *will* be committable on all
nodes at some time in the future, nodes may still lag behind others.

Thus, even a slower / busy node doesn't hold back the others, but may
serve stale data. Ideally, your load balancer accounts for that and
gives that node a break or at least reduces the amount of transactions
going to that node, so it can catch up again.

Anyway, that's pretty Postgres-R specific.

> plus the
> communication protocol overhead and latency. However, it occurs to me
> that if you had a shared disk system via either iSCSI, Fiber Channel,
> NFS, or whatever (which also had higher I/O capabilities than a single
> server could utilize), if you used a file system that supported locks on
> a particular section (extent) of a file, it should theoretically be
> possible for multiple Postgres instances on multiple systems sharing the
> database to read and write to the database without causing corruption.

Possible, yes. Worthwile to do, probably not.

> Has anyone put any thought into what it would take to do this in
> Postgres? Is it simply a matter of making the database file interaction
> code aware of extent locking, or is it considerably more involved than
> that? It also occurs to me that you probably need some form of
> transaction ordering mechanism across the nodes based on synchronized
> timestamps, but it seems Postgres-R has the required code to do that
> portion already written.

If you rely on such an ordering, why use additional locks. That seems
like a waste of resources compared to Postgres-R. Not to mention the
introduction of a SPOF with the SAN.

> Wouldn't this type of setup be far
> simpler to implement

That's certainly debatable, yes. I obviously think that the benefit per
cost ratio for Postgres-R is better :-)

> and provide better scalability than trying to do
> multi-master replication using log shipping or binary object shipping or
> any other techniques?

It's more similar to replication using two phase commit, which provably
doesn't scale (see for example [1]) And using a SAN for locking
certainly doesn't beat 2PC via an equally modern/expensive interconnect.

> Wouldn't it also be far more efficient since you
> don't need to have a copy of your data on each master node and therefor
> also don't have to ship your data to each node and have each node
> process it?

You have to ship it from the SAN to the node, so I definitely don't
think so, but see this as an argument against it. Each having a local
copy and only exchange locking information and transactional changes
sounds like much less traffic overall.

Regards

Markus Wanner

[1]: The Dangers of Replication and a Solution, Gray et al, In Proc. of
the SIGMOD Conf., 1996,
http://research.microsoft.com/apps/pubs/default.aspx?id=68247

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Radosław Smogura 2010-11-25 09:43:29 Re: Workarounds for getBinaryStream returning ByteArrayInputStream on bytea
Previous Message Shigeru HANADA 2010-11-25 08:51:11 Re: SQL/MED - file_fdw