Re: BDR Selective Replication

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc: swaxolez <willem(at)pcfish(dot)ca>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: BDR Selective Replication
Date: 2015-04-29 06:38:55
Message-ID: CAMsr+YEcyTL2nE=1uncnL9DzF1JMopgARW6bzDbtarp-prT-aw@mail.gmail.com
Lists: pgsql-general

On 29 April 2015 at 09:14, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:

> On 4/27/15 7:54 PM, Craig Ringer wrote:
>
>> If 'default replication set' is the idea of "here's what tables
>> *should* be getting replicated regardless of whether that's
>> happening or not", it'd be great if that was done so it could be
>> split out on its own at some point. It's a problem that affects all
>> replication systems.
>>
>>
>> It wasn't, but that's an interesting idea.
>>
>> You need a way to identify peer nodes in an abstract way before you can
>> really define sets of which nodes should get which tables. So I think
>> replication identifiers ( https://commitfest.postgresql.org/4/161/ ) are
>> a pre-requisite for that though, and one that's proving difficult to get
>> in.
>>
>
> Perhaps... different replication systems probably use different methods to
> identify, so presumably there'd need to be some way to map a generic
> identifier into an appropriate identifier for whatever replication system
> you're using.

Replication identifiers do just that: provide a way to map identifiers from
some external system into a local unique identifier for a peer node, along
with tracking of the replay position from the peer so replay can be
restarted at a consistent point. The replay position is an LSN, so they're
not going to work for any arbitrary system, though.
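
To make that concrete, here is a toy sketch of the bookkeeping involved. It is
not the API from the patch; the class and method names are made up for
illustration, and it assumes the peer's position can be expressed as a
PostgreSQL-style LSN in the usual "hi/lo" hex text form.

```python
from dataclasses import dataclass, field
from typing import Dict


def parse_lsn(text: str) -> int:
    """Convert an LSN in 'XXXXXXXX/YYYYYYYY' hex form to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


@dataclass
class ReplicationOrigins:
    """Hypothetical model: external node name -> local id + replay position."""
    _ids: Dict[str, int] = field(default_factory=dict)
    _replay_lsn: Dict[int, int] = field(default_factory=dict)

    def get_or_create(self, external_name: str) -> int:
        # Map the external system's identifier to a small local unique id.
        if external_name not in self._ids:
            local_id = len(self._ids) + 1
            self._ids[external_name] = local_id
            self._replay_lsn[local_id] = 0
        return self._ids[external_name]

    def advance(self, local_id: int, lsn: int) -> None:
        # Record replay progress; the position never moves backwards.
        if lsn > self._replay_lsn[local_id]:
            self._replay_lsn[local_id] = lsn

    def restart_point(self, local_id: int) -> int:
        # Where replay from this peer should resume after a restart.
        return self._replay_lsn[local_id]
```

The LSN dependency is visible in `advance`: a system whose positions are not
totally ordered integers would not fit this shape.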

>> How would you want to go about storing and tracking the information? A
>> new catalog? The other issue for in-core replication sets would probably
>> be making it foreign-key aware, so replication of a table transitively
>> requires replication of its references.
>>
>
> As you said, we'd need a way to identify replication nodes. We might also
> need/want a way to specify topology.

Topology? Why?

All a node needs to know is "send data from <these tables> to <these
peers>". It's just a set. If a replication system is doing something fancy
it'd be able to manage the replication sets on the nodes.
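
As a sketch of how little state that needs, assuming nothing beyond plain
table and peer names (all names below are hypothetical, not BDR's catalogs):
a mapping from peer to a set of tables is enough, and the foreign-key
awareness mentioned earlier is just a transitive closure over the reference
graph.

```python
from typing import Dict, Iterable, Set

# Hypothetical shape: peer name -> set of tables that peer should receive.
ReplicationSets = Dict[str, Set[str]]


def peers_for_table(sets: ReplicationSets, table: str) -> Set[str]:
    """Return the peers that should receive changes to the given table."""
    return {peer for peer, tables in sets.items() if table in tables}


def close_over_foreign_keys(
    tables: Iterable[str], references: Dict[str, Set[str]]
) -> Set[str]:
    """Expand a table set so every referenced table is included too.

    `references` maps a table to the tables its foreign keys point at.
    """
    result = set(tables)
    pending = list(result)
    while pending:
        table = pending.pop()
        for referenced in references.get(table, ()):
            if referenced not in result:
                result.add(referenced)
                pending.append(referenced)
    return result
```

For example, with `references = {"orders": {"customers"}}`, closing over
`{"orders"}` pulls in `customers` as well. Nothing here needs to know how the
peers are connected to each other.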

> I don't think topology would be too hard (presumably it's either a single
> 'parent' node, or a list of peers). What might be more interesting is
> dealing with different systems' methods of identifying nodes.
>

Yeah, topology is hard. Rings, mesh with dangling follower nodes, etc.

I don't think it's really the same thing as replication sets.

> You'd want a way to define different sets and associate them with nodes. A
> node could be a provider, subscriber, or both. I think some replication
> systems support 'pass through' as well, where the node passes data
> downstream but doesn't apply it itself. Or it could be multi-master and
> possibly a provider to read-only subscribers.
>

Yeah, you're talking about some kind of abstract modelling of a replication
topology. I'm not sure that's at all necessary to keep track of which
tables should be replicated to which nodes.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
