I'm beginning to work on advanced additions to in-core replication for PostgreSQL.
There are a number of additional features for existing single-master
replication still to achieve, but the key topics to be addressed are
major leaps forward in functionality. I hope to add useful features in
9.3, though I realise that many things could take two or even more
release cycles to achieve. (The last set of features took 8 years, so
I'm hoping to do this a little faster.)
Some of my 2ndQuadrant colleagues will also be committing themselves
to the project, and we hope to work with the community in the normal way
to create new features. I mention this only to say that major skills
and resources will be devoted to this for the next release(s), not
that this is a private project.
Some people have talked about the need for "multi-master replication",
whereby 2+ databases communicate changes to one another. This topic
has been discussed in some depth in Computer Science academic papers,
most notably, "The Dangers of Replication and a Solution" by the late
Jim Gray. I've studied this further, to the point where I have a
mathematical model that lets me predict our likely success from
implementing it. Without meaning to worry you,
MM replication alone is not a solution for large data or the general
case. For the general case, single master replication will continue to
be the most viable option. For large and distributed data sets, some
form of partitioning/sharding is required simply because full
multi-master replication just isn't viable at both volume and scale.
So my take on this is that MM is desirable, but is not the only thing
we need - we also need partial/filtered replication to make large
systems practical, which is why I've been calling this the
"Bi-Directional Replication" project. I'm aware that paragraph alone
requires lots of explanation, which I hope to do both in writing and
in person at the forthcoming developer conference.
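The mathematical model mentioned above isn't shown in this post, but Gray's paper argues that for lazy group (multi-master) replication the reconciliation rate rises roughly with the cube of total update traffic: a ten-fold increase in nodes can mean a thousand-fold increase in conflicts. A toy sketch of that scaling argument (the formula shape and every constant here are illustrative assumptions, not taken from the post or the paper verbatim):

```python
# Illustrative sketch of the scaling argument in Gray et al.,
# "The Dangers of Replication and a Solution" (1996): for lazy
# group replication, reconciliations grow roughly with the cube
# of total update traffic. All constants are invented for
# illustration only.

def reconciliation_rate(nodes, tps_per_node, actions=10,
                        action_time=0.01, db_size=1_000_000):
    """Relative rate of conflicting (reconciled) transactions."""
    total_tps = nodes * tps_per_node
    # Cubic growth in total update actions, damped by database
    # size (fewer collisions when there are more rows to hit).
    return (total_tps * actions) ** 3 * action_time / db_size

# Doubling the node count multiplies conflicts by ~8, not 2:
r1 = reconciliation_rate(nodes=2, tps_per_node=100)
r2 = reconciliation_rate(nodes=4, tps_per_node=100)
print(r2 / r1)  # → 8.0
```

This super-linear growth is the core of the claim that full multi-master replication alone cannot serve the large-data case.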
My starting point for designs is to focus on a key aspect: massive
change to the code base is not viable and any in-core solution must
look at minimally invasive changes. And of course, if it is in-core we
must also add robust, clear code with reasonable performance that does
not impede non-replication usage.
The use cases we will address are not focused on any one project or
user. I've distilled these points so far from talking to a wide
variety of users, from major enterprises to startups.
1. GEOGRAPHICALLY DISTRIBUTED - Large users require both High
Availability, which necessitates multiple nodes, as well as Disaster
Recovery, which necessitates geographically distributed nodes. So I am
not focused on "clustering" in the sense of Hadoop or Oracle RAC,
since those technologies require additional technologies to provide
DR; my aim is to arrive at a coherent set of technologies that provide
all that we want. I'm aware that other projects *are* focused on
clustering, so there is even more reason not to reinvent the wheel.
2. COHERENT - With regard to the coherence, I note this thinking is
similar to the way that Oracle replication is evolving, where they
have multiple kinds of in-core replication and various purchased
technologies. We have a similar issue with regard to various external
projects. I very much hope that we can utilise the knowledge, code and
expertise of those other projects in the way we move forwards.
3. ONLINE UPGRADE - highly available distributed systems must have a
mechanism for online upgrade, otherwise they won't stay HA for long.
This challenge must be part of the solution, and incidentally should
be a useful goal in itself.
4. MULTI-MASTER - the ability to update data from a variety of locations
5. WRITE-SCALEABLE - the ability to partition data across nodes in a
way that allows the solution to scale beyond the write rate of a
single node.
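To sketch why partitioning helps write scaling: if each row maps deterministically to one owning node, every writer can route a change without coordination, and each node only absorbs its own shard's writes rather than the whole cluster's. A minimal hash-sharding illustration (the routing scheme and node count are hypothetical, not a proposed design):

```python
# Minimal illustration of hash partitioning: route each row key
# to exactly one owning node, so aggregate write throughput can
# grow with the node count instead of being capped by one master.

import hashlib

def owning_node(key: str, num_nodes: int) -> int:
    """Deterministically map a row key to a node id."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes

# Every writer agrees on row placement, with no coordination:
assert owning_node("customer:42", 4) == owning_node("customer:42", 4)

# Writes spread across all nodes rather than landing on one:
nodes = {owning_node(f"customer:{i}", 4) for i in range(1000)}
print(sorted(nodes))  # → [0, 1, 2, 3]
```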
Those are the basic requirements that I am trying to address. There
are a great many important details, but the core of this is probably
what I would call "logical replication", that is shipping changes to
other nodes in a way that does not tie us to the same physical
representation that recovery/streaming replication does now. Of
course, non-physical replication can take many forms.
The assumption of consistency across nodes is considered optional at
this point, and I hope to support both eagerly consistent and
eventually consistent approaches.
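To make "logical replication" concrete: rather than shipping the physical block-level representation that streaming replication uses, each change is described in terms of the table and values it touched, so a subscriber with a different physical layout (or a different major version) can still apply it. A hypothetical change-record sketch follows; the field names and format are inventions for illustration, not a proposed wire protocol:

```python
# Sketch of a logical change record: the change is described at
# schema level (table, operation, values), not as physical block
# contents, so any node with the same table definition can replay it.

import json
from dataclasses import dataclass, field

@dataclass
class LogicalChange:
    table: str
    op: str                      # 'insert' | 'update' | 'delete'
    keys: dict                   # primary-key columns
    new_values: dict = field(default_factory=dict)

    def encode(self) -> str:
        """Serialise for shipping to another node."""
        return json.dumps(self.__dict__)

def apply_change(table_rows: dict, change: LogicalChange):
    """Replay a change against a toy in-memory 'table'."""
    key = tuple(sorted(change.keys.items()))
    if change.op == "delete":
        table_rows.pop(key, None)
    else:  # insert or update
        table_rows[key] = change.new_values

rows = {}
apply_change(rows, LogicalChange("accounts", "insert",
                                 {"id": 1}, {"balance": 100}))
apply_change(rows, LogicalChange("accounts", "update",
                                 {"id": 1}, {"balance": 150}))
print(rows)  # → {(('id', 1),): {'balance': 150}}
```

Eager consistency would require the apply step to complete on all nodes before commit; eventual consistency lets records like these be applied asynchronously after commit.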
I'm aware that this is a broad topic and many people will want input
on this, and also that it will take much time. This post is more
about announcing the project than discussing specific details.
My strategy for doing this is to come up with some designs and
prototypes of a few things that might be the best way forwards. By
building prototypes we will more quickly be able to address the key
questions before us. So there is currently work on research-based
development to allow wider discussion based upon something more than
just whiteboards. I'll be the first to explain things that don't work.
I also very much agree that "one size fits all" is the wrong strategy.
So there will be implementation options and parameters, and possibly
even multiple replication techniques.
I will also be organising a small-medium sized "Future of In-Core
Replication" meeting in Ottawa on Wed 16 May, 6-10pm. To avoid this
becoming an unworkably large meeting, attendance will be limited, but
it is open to highly technical PostgreSQL users who share these
requirements, any attendee of the main developers' meeting who wishes
to attend, and other developers working on PostgreSQL
replication/related topics.
That will also allow me to order enough pizza for everyone. I'll send
out private invites to people whom I know and think may be interested
(no spam), but you are welcome to email me to get access. (This will
take me a day or two, so don't ping me back if you didn't get an
invite straight away.)
I'm going to do my best to include the right set of features for the
majority of people, all focused on submissions to PostgreSQL core, not
any external project.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services