Quick Links

Re: Horizontal scalability/sharding

From:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Horizontal scalability/sharding
Date:	2015-09-02 10:19:29
Message-ID:	CAFjFpRfz8TE3xZ9KRDu7v3d59h_tG+zZEX0Zd_gcAe_Eie6kow@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Sep 2, 2015 at 12:49 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:

> On 09/01/2015 11:36 AM, Tomas Vondra wrote:
> >> We want multiple copies of shards created by the sharding system itself.
> >> Having a separate, and completely orthagonal, redundancy system to the
> >> sharding system is overly burdensome on the DBA and makes low-data-loss
> >> HA impossible.
> >
> > IMHO it'd be quite unfortunate if the design would make it impossible to
> > combine those two features (e.g. creating standbys for shards and
> > failing over to them).
> >
> > It's true that solving HA at the sharding level (by keeping multiple
> > copies of a each shard) may be simpler than combining sharding and
> > standbys, but I don't see why it makes low-data-loss HA impossible.
>
> Other way around, that is, having replication standbys as the only
> method of redundancy requires either high data loss or high latency for
> all writes.
>
> In the case of async rep, every time we fail over a node, the entire
> cluser would need to roll back to the last common known-good replay
> point, hence high data loss.
>
> In the case of sync rep, we are required to wait for at least double
> network lag time in order to do a single write ... making
> write-scalability quite difficult.
>
> Futher, if using replication the sharding system would have no way to
> (a) find out immediately if a copy was bad and (b) fail over quickly to
> a copy of the shard if the first requested copy was not responding.
> With async replication, we also can't use multiple copies of the same
> shard as a way to balance read workloads.
>
> If we write to multiple copies as a part of the sharding feature, then
> that can be parallelized, so that we are waiting only as long as the
> slowest write (or in failure cases, as long as the shard timeout).
> Further, we can check for shard-copy health and update shard
> availability data with each user request, so that the ability to see
> stale/bad data is minimized.
>

XC (and I guess XL, pgPool II as well) did this by firing same DML
statement to all the copies after resolving any volatile references (e.g.
now()) in DML, so that all the copies get the same values. That method
however needed some row identifier which can identify same row on all the
replicas. Primary key is used as row identifier usually, but not all use
cases which require shards to be replicated have primary key in their
sharded tables.

>
> There are obvious problems with multiplexing writes, which you can
> figure out if you knock pg_shard around a bit. But I really think that
> solving those problems is the only way to go.
>
> Mind you, I see a strong place for binary replication and BDR for
> multi-region redundancy; you really don't want that to be part of the
> sharding system if you're aiming for write scalability.
>
> --
> Josh Berkus
> PostgreSQL Experts Inc.
> http://pgexperts.com
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Re: Horizontal scalability/sharding at 2015-09-01 19:19:40 from Josh Berkus

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Amit Langote	2015-09-02 10:25:21	Re: Horizontal scalability/sharding
Previous Message	Nikolay Shaplov	2015-09-02 09:58:47	Re: pageinspect patch, for showing tuple data