Re: [BDR] Best practice to automatically abort a DDL operation when one node is down

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Sylvain MARECHAL <marechal(dot)sylvain2(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: [BDR] Best practice to automatically abort a DDL operation when one node is down
Date: 2016-01-18 00:52:43
Message-ID: CAMsr+YFGK4YS5gJE=9X9YU4L7AZ7YF1MXr9sd768uRVEEo5Ziw@mail.gmail.com
Lists: pgsql-general

On 13 January 2016 at 21:45, Sylvain MARECHAL <marechal(dot)sylvain2(at)gmail(dot)com>
wrote:

> The problem is that the (1) DDL request will wait indefinitely, meaning
> all transactions will continue to fail until the DDL operation is manually
> aborted (for example, doing CTRL C in psql to abort the "CREATE TABLE").
>

Correct, and by design.

I'd like to do a pre-check where we sync up with the peer nodes and see if
they're all alive before we take the DDL lock. This would reduce the impact
a bit and allow an early ERROR like "ERROR: cannot perform DDL when one or
more nodes is unreachable".

However... we have something pretty close already. You can just set a
statement_timeout in the session doing the DDL. It'll cancel the operation
if it takes too long.
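
As a rough sketch of what that looks like in psql (the 10s value and the
table name are placeholders I've made up for illustration, pick a timeout
that suits your workload):

```sql
-- Applies to every later statement in this session until it is reset.
SET statement_timeout = '10s';

-- If the global DDL lock can't be acquired in time, the statement is
-- cancelled with an ERROR instead of waiting indefinitely.
CREATE TABLE example_tbl (id integer PRIMARY KEY);

-- Return to the configured default so later statements in this
-- session are not cancelled by the same timeout.
RESET statement_timeout;
```

When the timeout fires, PostgreSQL reports "canceling statement due to
statement timeout", so the application can catch that and retry later.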

Note that a lock_timeout will NOT work because the BDR global DDL lock is
not recognised as a true lock by PostgreSQL.

> What is the best practice to make sure the DDL operation will fail,
> possibly after a timeout, if one of the nodes is down?

statement_timeout

> I could check the state of the node before issuing the DDL operation, but
> this solution is far from being perfect as the node may fail right after
> this.
>

Correct, but it's still useful to do.

I'd check that all nodes are connected in pg_stat_replication, then I'd
issue the DDL with a statement_timeout set.
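
A minimal sketch of the pre-check, assuming you know how many peers should
be connected (that expected number is something you supply, it is not
reported by this query):

```sql
-- One row per walsender on this node, i.e. per peer currently
-- receiving changes from it. Any other replication clients
-- (physical standbys, pg_basebackup, etc.) show up here too.
SELECT application_name, client_addr, state
FROM pg_stat_replication
WHERE state = 'streaming';
```

If fewer rows come back than you have peers, skip the DDL and try again
later. This only shows connections into the node you ran it on, and a peer
can still drop out right after the check, so the statement_timeout is what
actually protects you.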

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
