Re: Horizontal Write Scaling

From: Koichi Suzuki <koichi(dot)szk(at)gmail(dot)com>
To: Markus Wanner <markus(at)bluegap(dot)ch>
Cc: Eliot Gable <egable+pgsql-hackers(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Horizontal Write Scaling
Date: 2010-11-25 09:45:45
Message-ID: AANLkTikf-S2H0OP_P+OOqFFGO=_yopuSG_PkeqS1_0i6@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

2010/11/25 Markus Wanner <markus(at)bluegap(dot)ch>:
> Eliot,
>
> On 11/23/2010 09:43 PM, Eliot Gable wrote:
>> I know there has been a lot of talk about replication getting built into
>> Postgres and I know of many projects that aim to fill the role. However,
>> I have not seen much in the way of a serious attempt at multi-master
>> write scaling.
>
> Postgres-XC and Postgres-R are two pretty serious projects, IMO.

Yes. Please visit http://postgres-xc.sourceforge.net/ for details.

>> I understand the fundamental problem with write scaling
>> across multiple nodes is Disk I/O and inter-node communication latency
>> and that in the conventional synchronous, multi-master replication type
>> setup you would be limited to the speed of the slowest node,
>
> That's not necessarily true for Postgres-R, which is why I call it an
> 'eager' solution (as opposed to fully synchronous). While it guarantees
> that all transactions that got committed *will* be committable on all
> nodes at some time in the future, nodes may still lag behind others.
>
> Thus, even a slower / busy node doesn't hold back the others, but may
> serve stale data. Ideally, your load balancer accounts for that and
> gives that node a break or at least reduces the amount of transactions
> going to that node, so it can catch up again.
>
> Anyway, that's pretty Postgres-R specific.

Right. In the case of Postgres-XC, tables can be partitioned (we
call "distributed") among cluster nodes so that writing can be done in
parallel.

>
>> plus the
>> communication protocol overhead and latency. However, it occurs to me
>> that if you had a shared disk system via either iSCSI, Fiber Channel,
>> NFS, or whatever (which also had higher I/O capabilities than a single
>> server could utilize), if you used a file system that supported locks on
>> a particular section (extent) of a file, it should theoretically be
>> possible for multiple Postgres instances on multiple systems sharing the
>> database to read and write to the database without causing corruption.
>
> Possible, yes. Worthwile to do, probably not.

We may be suffered from synchronizing cache on each database.

>
>> Has anyone put any thought into what it would take to do this in
>> Postgres? Is it simply a matter of making the database file interaction
>> code aware of extent locking, or is it considerably more involved than
>> that? It also occurs to me that you probably need some form of
>> transaction ordering mechanism across the nodes based on synchronized
>> timestamps, but it seems Postgres-R has the required code to do that
>> portion already written.
>
> If you rely on such an ordering, why use additional locks. That seems
> like a waste of resources compared to Postgres-R. Not to mention the
> introduction of a SPOF with the SAN.
>
>> Wouldn't this type of setup be far
>> simpler to implement
>
> That's certainly debatable, yes. I obviously think that the benefit per
> cost ratio for Postgres-R is better :-)
>
>> and provide better scalability than trying to do
>> multi-master replication using log shipping or binary object shipping or
>> any other techniques?

Postgres-XC uses combination of replicated table and distributed
(partitioned) table, not just simple replication.

>
> It's more similar to replication using two phase commit, which provably
> doesn't scale (see for example [1]) And using a SAN for locking
> certainly doesn't beat 2PC via an equally modern/expensive interconnect.
>
>> Wouldn't it also be far more efficient since you
>> don't need to have a copy of your data on each master node and therefor
>> also don't have to ship your data to each node and have each node
>> process it?
>
> You have to ship it from the SAN to the node, so I definitely don't
> think so, but see this as an argument against it. Each having a local
> copy and only exchange locking information and transactional changes
> sounds like much less traffic overall.
>
> Regards
>
> Markus Wanner
>
>
> [1]: The Dangers of Replication and a Solution, Gray et al, In Proc. of
> the SIGMOD Conf., 1996,
> http://research.microsoft.com/apps/pubs/default.aspx?id=68247
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Cheers;
---
Koichi Suzuki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Maciek Sakrejda 2010-11-25 09:56:02 Re: [JDBC] JDBC and Binary protocol error, for some statements
Previous Message Radosław Smogura 2010-11-25 09:43:29 Re: Workarounds for getBinaryStream returning ByteArrayInputStream on bytea