Re: Horizontal scalability/sharding

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Mason S <masonlists(at)gmail(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Horizontal scalability/sharding
Date: 2015-09-01 04:00:41
Message-ID: CABOikdO=faPUP67-cTmDXdVwPFp-vvtiXBOPL=M1haaDrbySSg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 1, 2015 at 3:17 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

>
>
> It seems to me that sharding consists of (1) breaking your data set up
> into shards, (2) possibly replicating some of those shards onto
> multiple machines, and then (3) being able to access the remote data
> from local queries. As far as (1) is concerned, we need declarative
> partitioning, which is being worked on by Amit Langote. As far as (2)
> is concerned, I hope and expect BDR, or technology derived therefrom,
> to eventually fill that need. As far as (3) is concerned, why
> wouldn't we use the foreign data wrapper interface, and specifically
> postgres_fdw? That interface was designed for the explicit purpose of
> allowing access to remote data sources, and a lot of work has been put
> into it, so it would be highly surprising if we decided to throw that
> away and develop something completely new from the ground up.
>
> It's true that postgres_fdw doesn't do everything we need yet. The
> new join pushdown hooks aren't used by postgres_fdw yet, and the API
> itself has some bugs with EvalPlanQual handling. Aggregate pushdown
> is waiting on upper planner path-ification. DML pushdown doesn't
> exist yet, and the hooks that would enable pushdown of ORDER BY
> clauses to the remote side aren't being used by postgres_fdw. But all
> of these things have been worked on. Patches for many of them have
> already been posted. They have suffered from a certain amount of
> neglect by senior hackers, and perhaps also from a shortage of time on
> the part of the authors. But an awful lot of the work that is needed
> here has already been done, if only we could get it committed.
> Aggregate pushdown is a notable exception, but abandoning the foreign
> data wrapper approach in favor of something else won't fix that.
>
> Postgres-XC developed a purpose-built system for talking to other
> nodes instead of using the FDW interface, for the very good reason
> that the FDW interface did not yet exist at the time that Postgres-XC
> was created. But several people associated with the XC project have
> said, including one on this thread, that if it had existed, they
> probably would have used it. And it's hard to see why you wouldn't:
> with XC's approach, the remote data source is presumed to be
> PostgreSQL (or Postgres-XC/XL/X2/whatever); and you can only use the
> facility as part of a sharding solution. The FDW interface can talk
> to anything, and it can be used for stuff other than sharding, like
> making one remote table appear local because you just happen to want
> that for some reason. This makes the XC approach look rather brittle
> by comparison. I don't blame the XC folks for taking the shortest
> path between two points, but FDWs are better, and we ought to try to
> leverage that.
>
>
In my discussions on this topic with various folks including Robert, I've
conceded that if FDW was available when XC was first written, in all
likelihood we would have used and extended that interface. But that wasn't
the case and we did what we thought was the best solution at that time,
given the resources and the schedule. To be honest, when XC project was
started, I was quite skeptical about the whole thing given the goal was to
built something which can replace Oracle RAC with may be less than 1%
resources of what Oracle must have invested in building RAC. The lack of
resources at the start of the project keeps showing up in the quality
issues that users report from time to time. Having said that, I am quite
satisfied with what we have been able to build within the constraints.

But FDW is just one part of the story. There is this entire global
consistency problem that would require something like GTM to give out XIDs
and snapshots, atomicity which would require managing transactions across
multiple shards, join pushdowns when all data is not available locally,
something that XL is attempting to solve with datanode-datanode exchange of
information, other global states such as sequences, replicating some part
of the data to multiple shards for efficient operations, ability to
add/remove shards with least disruption, globally consistent
backups/restore. XC/XL has attempted to solve each of them to some extent.
I don't claim that they are completely solved and there are no corner cases
left, but we have made fairly good progress on each of them.

My worry is that if we start implementing them again from scratch, it will
take a few years before we get them in a usable state. What XC/XL lacked is
probably a Robert Haas or a Tom Lane who could look at the work and suggest
major edits. If that had happened, the quality of the product could have
been much better today. I don't mean to derate the developers who worked on
XC/XL, but there is no harm in accepting that if someone with a much better
understanding of the whole system was part of the team, that would have
positively impacted the project. Is that an angle worth exploring? Does it
make sense to commit some more resources to say XC or XL and try to improve
the quality of the product even further? To be honest, XL is in far far
better shape (haven't really tried XC in a while) and some more
QA/polishing can make it production ready much sooner.

Yet another possibility is rework the design such that only coordinator
needs to be a fork of PostgreSQL but the shards are all PostgreSQL
instances, queried using standard client APIs. That would reduce the code
that needs to go in the core to build the entire scalable system and also
shorten the timeline considerably.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2015-09-01 04:13:07 Unicode mapping scripts cleanup
Previous Message Peter Eisentraut 2015-09-01 03:57:25 perlcritic