Re: Removing and readding bdr nodes

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Mathew Moon <mathew(dot)moon(at)vipaar(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Removing and readding bdr nodes
Date: 2015-05-04 01:39:23
Message-ID: CAMsr+YEcv7c4s2NMSfjacZR760X3-35nZa7EjJkuAvL3Dwuqkg@mail.gmail.com
Lists: pgsql-general

On 1 May 2015 at 12:40, Mathew Moon <mathew(dot)moon(at)vipaar(dot)com> wrote:

> Hi,
>
> I have a group of 5 bdr nodes and before we can move to production with
> them we must demonstrate that we can remove a node from the group and add
> that node back later. When I remove a node it stays in the bdr.bdr_nodes
> view with status 'k'. If I try to add that node back the node itself errors
> saying that it is already part of a bdr group.
>

That's intended, though the error message needs to be improved.

You can't remove a node then add it back later. When you remove a node,
the remaining nodes are still generating change streams, but they aren't
saving them up for the removed node anymore. So if you remove a node, make
some changes, and add the node back, the node will have a "gap" in its
history, putting it out of sync with all its peers. Changes on the re-added
node could replicate old data to new nodes, changes from new nodes might
not apply on the old re-added node, etc. Worse, if any table structures
have changed then the node can't possibly apply changes or send changes
that can be applied by other nodes.
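
To make the "gap" concrete, here is a minimal toy model in Python. It is
not BDR code: the Node and Group classes, the integer change ids, and the
readd() operation are all hypothetical, invented purely for illustration.
The only behaviour it models is that changes are delivered to current
members and nothing is saved up for a node once it has been removed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    applied: list = field(default_factory=list)  # change ids seen, in order


class Group:
    """Toy replication group: changes only reach current members."""

    def __init__(self, names):
        self.members = {n: Node(n) for n in names}
        self.next_change = 1

    def write(self):
        # Deliver one new change to every node that is currently a member.
        change = self.next_change
        self.next_change += 1
        for node in self.members.values():
            node.applied.append(change)
        return change

    def remove(self, name):
        # After removal, nothing is queued for this node any more.
        return self.members.pop(name)

    def readd(self, node):
        # Hypothetical "re-add as-is" operation, shown only to expose the gap.
        self.members[node.name] = node


def missing_changes(node, group):
    """Change ids the group has produced that this node never applied."""
    expected = set(range(1, group.next_change))
    return sorted(expected - set(node.applied))


group = Group(["a", "b", "c"])
group.write()                 # change 1 reaches a, b and c
removed = group.remove("c")
group.write()                 # change 2: c is gone, nothing is kept for it
group.write()                 # change 3
group.readd(removed)
group.write()                 # change 4 reaches c again
gap = missing_changes(removed, group)
print(removed.applied, gap)
```

In this model the re-added node ends up having applied changes 1 and 4
only; changes 2 and 3 are permanently missing, and nothing in the group
still holds them for that node.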

Once a node is removed you must add a new node to replace it; you can't
re-add a removed node.

There's room for improvement here, but the fundamental limitations mean
we're never going to support simply removing and re-adding nodes. We may be
able to provide a way to clean and resync a node later, but it'll be much
the same thing as dropdb; createdb; and rejoin.

Note that short of removing a node, you can (a) just shut it down for a
while or (b) pause replay on that node using bdr.bdr_apply_pause() and
bdr.bdr_apply_resume(). While a node is down, other nodes will function
mostly normally, but will be unable to purge WAL required for replaying to
the down/paused node so they'll eventually run out of space in pg_xlog.
They will also be unable to perform DDL, because that requires consensus.
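
The WAL retention effect can be sketched with a small Python model. This
is an illustration only, under simplifying assumptions: positions are
plain integers rather than real LSNs, and retained_wal() is a hypothetical
helper, not anything BDR or PostgreSQL provides. The idea it shows is that
a node must keep everything after the position confirmed by its slowest
peer.

```python
def retained_wal(current_pos, confirmed_by_peer):
    """WAL that must be kept: everything after the slowest peer's position."""
    oldest_needed = min(confirmed_by_peer.values())
    return current_pos - oldest_needed


# Peer "d" is shut down (or has apply paused) and stays at position 100.
confirmed = {"b": 100, "c": 100, "d": 100}
history = []
for current_pos in (1_000, 5_000, 20_000):
    confirmed["b"] = confirmed["c"] = current_pos  # live peers keep up
    history.append(retained_wal(current_pos, confirmed))

# Once "d" comes back and catches up, the backlog can be purged.
confirmed["d"] = 20_000
after_resume = retained_wal(20_000, confirmed)
print(history, after_resume)
```

The retained amount grows for as long as the one peer stays behind,
regardless of how quickly the others keep up, which is why a node should
not be left down or paused indefinitely.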

> If I totally remove the entire database from the node, deleting all of the
> data directory, and run initdb on the data directory again and try to add
> the node to the group I get errors saying that the other nodes expect this
> one to use its old sysid and connect to its old replication slot.
>

That doesn't make sense. Odd. Can you please show the step-by-step process
you used to get that effect, with exact commands run, exact text of error
messages, etc?

> I don't understand how the other nodes are identifying this one by its old
> information since I removed the entire data directory and started over.
>

Nor do I. When you remove the datadir you remove the only place the sysid
for that node is stored. Are you certain you ran the join query on the
newly created not-yet-joined node?

> I saw in another thread that support for removing nodes is not complete
> but surely there must be some way to do this even manually.
>

Node removal via SQL function calls is supported in 0.9.0. The remaining
work centers mainly around making it more robust under load and handling
unexpected node loss better.

> How would one go about removing ALL traces of an existing node from all of
> the others so it was like it never existed before?
>

Once it's confirmed removed, delete the bdr.bdr_nodes entry with status =
'k'. All replication slots (pg_catalog.pg_replication_slots) should already
be gone.

There should never be any reason to do this though. If you need to do it,
then something is already wrong. A database oid shouldn't get reused, so if
you dropdb and createdb you get a new node identity. The same is true if
you re-initdb. Since re-adding a removed node won't work, there's no reason
to ever remove the record of the node's existence and removal.

> Any help would be greatly appreciated. BDR is the perfect solution for our
> infrastructure's needs for backup and availability
>

You might want to consider BDR's single-master UDR mode too, or tools like
Londiste. Don't add multi-master unless you really need it. Significant
limitations are introduced around how and when you can do DDL, etc, when
doing multi-master BDR, per the manual.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
