Re: Horizontal scalability/sharding

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Horizontal scalability/sharding
Date: 2015-09-02 19:30:14
Message-ID: CA+TgmoZXExYWvTq9001GKbL6OY47fj+XpBZh_oGfRrPxsvUkhQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 2, 2015 at 3:03 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> 4. Therefore, I think that we should instead use logical replication,
>> which might be either synchronous or asynchronous. When you modify
>> one copy of the data, that change will then be replicated to all other
>> nodes. If you are OK with eventual consistency, this replication can
>> be asynchronous, and nodes that are off-line will catch up when they
>> are on-line. If you are not OK with that, then you must replicate
>> synchronously to every node before transaction commit; or at least you
>> must replicate synchronously to every node that is currently on-line.
>> This presents some challenges: logical decoding currently can't
>> replicate transactions that are still in process - replication starts
>> when the transaction commits. Also, we don't have any way for
>> synchronous replication to wait for multiple nodes.
>
> Well, there is a WIP patch for that, which IMHO would be much improved
> by having a concrete use-case like this one. What nobody is working on
> -- and we've vetoed in the past -- is a way of automatically failing and
> removing from replication any node which repeatedly fails to sync, which
> would be a requirement for this model.

Yep. It's clear to me we need that in general, not just for sharding.
To me, the key is to make sure there's a way for the cluster-ware to
know about the state transitions. Currently, when the synchronous
standby changes, PostgreSQL doesn't tell anyone. That's a problem.

> You'd also need a way to let the connection nodes know when a replica
> has fallen behind so that they can be taken out of
> load-balancing/sharding for read queries. For the synchronous model,
> that would be "fallen behind at all"; for asynchronous it would be
> "fallen more than ### behind".

How is that different from the previous thing? Just that we'd treat
"lagging" as "down" beyond some threshold? That doesn't seem like a
mandatory feature.

>> But in theory
>> those seem like limitations that can be lifted. Also, the GTM needs
>> to be aware that this stuff is happening, or it will DTWT. That too
>> seems like a problem that can be solved.
>
> Yeah? I'd assume that a GTM would be antithetical to two-stage copying.

I don't think so. If transaction A writes data on X which is
replicated to Y and then commits, a new snapshot which shows A as
committed can't be used on Y until A's changes have been replicated
there. That could be enforced by having the commit of A wait for
replication, or by having an attempt by a later transaction to use the
snapshot on Y wait until replication completes, or some even more
sophisticated strategy that considers whether the replication backlog
touches the same data that the new transaction will read. It's
complicated, but it doesn't seem intractable.

> I'm not a big fan of a GTM at all, frankly; it makes clusters much
> harder to set up, and becomes a SPOF.

I partially agree. I think it's very important that the GTM is an
optional feature of whatever we end up with, rather than an
indispensable component. People who don't want it shouldn't have to
pay the price in performance and administrative complexity. But at
the same time, I think a lot of people will want it, because without
it, the fact that sharding is in use is much less transparent to the
application.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message dinesh kumar 2015-09-02 19:49:14 Re: [PATCH] SQL function to report log message
Previous Message Peter Geoghegan 2015-09-02 19:24:33 Re: Memory prefetching while sequentially fetching from SortTuple array, tuplestore