Re: Horizontal scalability/sharding

From: Mason S <masonlists(at)gmail(dot)com>
To: obartunov(at)gmail(dot)com
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Horizontal scalability/sharding
Date: 2015-08-31 18:48:31
Message-ID: CA+rR5x2cq0Voro_C0bNma8a3iTdK+=Rf6OVoEjo89dnT0FS0aQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
>
> We also a bit disappointed by Huawei position about CSN patch, we hoped
> to use for our XTM.
>

Disappointed in what way? Moving to some sort of CSN approach seems to open
things up for different future ideas. In the short term, it would mean
replacing potentially large snapshots and longer visibility checks. In the
long term, perhaps CSN could help simplify the design of multi-master
replication schemes.

> FDW approach has been actively criticized by pg_shard people and that's
> also made me a bit suspicious. It looks like we are doomed to continue
> several development forks, so we decided to work on very important common
> project, XTM, which we hoped could be accepted by all parties and
> eventually committed to 9.6. Now I see we were right, unfortunately.
>
>
I think the original XC project probably would have taken the FDW approach
as a basis if it had existed, with focus on push-down optimizations.

I assume that future work around PG sharding probably would be more likely
to be accepted with the FDW approach. One could perhaps work on pushing
down joins, aggregates and order by, then look at any optimizations gained
if code is moved outside of FDW. It would make sense if some kind of
generic optimization for foreign tables for SQL-based sources could be
leveraged across all databases, rather than having to re-implement for each
FDW.

There are different approaches and related features that may need to be
improved.

Do we want multiple copies of shards, like the pg_shard approach? Or keep
things simpler and leave it up to the DBA to add standbys?

Do we want to leverage table inheritance? If so, we may want to spend time
improving performance for when the number of shards becomes large with what
currently exists. If using table inheritance, we could add the ability to
specify what node (er, foreign server) the subtable lives on. We could
create top level sharding expressions that allow these to be implicitly
created.

Should we allow arbitrary expressions for shards, not just range, list and
hash?

Maybe the most community-acceptable approach would look something like

- Use FDWs, and continue to optimize push-down operations, also for
non-PostgreSQL databases.

- Use table inheritance for defining the shards. Ideally allow for
specifying that some shards may be replicated to other foreign servers (and
itself) (for pushing down joins with lookup/static tables; at this point it
should be decent for star schema based data warehouses).

- XTM/GTM hooks. Preferably we move to CSN for snapshots in core PostgreSQL
though.

Longer term, efficient internode joins would require a lot more work.

The devil is in the details. There are things that have to be addressed,
for example, if using global XIDs via GTM, not every transaction is on
every node, so we need to make sure that new clog pages get added
properly. There is also the potential to require a lot more code to be
added, like for cursor handling and stored functions. Perhaps some
limitations when using shards to foreign servers are acceptable if it is
desired to minimize code changes. XC and XL code help.

Regards,

Mason

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2015-08-31 18:52:05 Anybody have icc for IA64?
Previous Message Andrew Dunstan 2015-08-31 18:45:20 Re: Is "WIN32" #defined in Cygwin builds?