Re: BUG #12808: BDR lock adding Postgis extension in one node

From: "Ruth Melendo" <rmelendo(at)teltronic(dot)es>
To: "'Andres Freund'" <andres(at)2ndquadrant(dot)com>
Cc: <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #12808: BDR lock adding Postgis extension in one node
Date: 2015-03-02 09:29:15
Message-ID: 000c01d054cb$59d61140$0d8233c0$@teltronic.es
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Thanks for your explanation Andres. It worked for me.

Just 3 more questions:

1.- Should I get a newer version of BDR? Does mine have important bugs? I´m testing that for production env in a near future and the application in production in my company depends on these results.

2.- I don’t understand exactly what I´m doing with this sentence: COPY (SELECT * FROM pg_logical_slot_get_binary_changes('bdr_17911_6120567807158814813_1_16385__', NULL, 1, 'interactive', 'true')) TO '/dev/null';
I suppose I´m discarding the changes. That will make my changes to be discarded or are you counting with bdr.initial_copy activated to get them again?

3.- When I do this:
SET bdr.skip_ddl_replication = on;
Is it the same as commenting the bdr properties in postgres.conf?

As I´m testing for production env, I need to write a procedure to recover database when getting an error like this one. I have gotten similar errors more times when doing tests, in transactions DDL or DML, I guess my fault in any configuration property. When I get to that situation, should I follow the same instructions?

Instructions to recover database when gets corrupted after bdr error:

1- Skip replication of the problematic transaction.
SELECT slot_name, datoid, database, active FROM pg_replication_slots WHERE slot_name LIKE 'bdr_'||(SELECT oid FROM pg_database WHERE datname = current_database())||'_%';

COPY (SELECT * FROM pg_logical_slot_get_binary_changes('bdr_17911_6120567807158814813_1_16385__', NULL, 1, 'interactive', 'true')) TO '/dev/null';

2.- Reconnect to the database and execute

SET bdr.skip_ddl_replication = on;
DROP EXTENSION postgis;

Should it be something like that or should I delete the data directory in the problematic node and create the cluster again so that the bdr.init_copy bring the database to a right state?

Thanks again!

Ruth Patricia Melendo Ventura
Software Engineer
TELTRONIC, S.A.U.
T: +34 976 465656 Ext. 179
F: +34 976 465722
www.teltronic.es

-----Mensaje original-----
De: 'Andres Freund' [mailto:andres(at)2ndquadrant(dot)com]
Enviado el: sábado, 28 de febrero de 2015 10:36
Para: Ruth Melendo
CC: pgsql-bugs(at)postgresql(dot)org
Asunto: Re: [BUGS] BUG #12808: BDR lock adding Postgis extension in one node

Hi,

[some information has been acquired privately]

On 2015-02-27 15:30:20 +0100, Ruth Melendo wrote:
> The error got when adding a PostGIS extension from PGAdmin in node 1.
> Extension was created and the depending table too but in node 2 got
> the error when adding the extension because the depending table was
> not created.

> But, although the problem was originated for this, my main dude is how
> to fix the database then it´s locked. What do I have to do? Because
> now, I cannot do any DDL or DML against it.

The reason it's locked is that it's waiting for DDL to be replicated. And that fails due to the spatial_ref_sys error.

In the version you're using the easiest way to resolve this is probably to skip replication of the problematic transaction, and then fix up things afterwards by hand.

To do that connect to the database you performed the CREATE EXTENSION on. There get a list of the outgoing connections with a query like:

SELECT slot_name, datoid, database, active FROM pg_replication_slots WHERE slot_name LIKE 'bdr_'||(SELECT oid FROM pg_database WHERE datname = current_database())||'_%'; ┌─────────────────────────────────────────┬────────┬──────────┬────────┐
│ slot_name │ datoid │ database │ active │
├─────────────────────────────────────────┼────────┼──────────┼────────┤
│ bdr_17911_6120567807158814813_1_16385__ │ 17911 │ node_03 │ t │
│ bdr_17911_6120567807158814813_1_17153__ │ 17911 │ node_03 │ f │
└─────────────────────────────────────────┴────────┴──────────┴────────┘
(2 rows)

In my three node setup those two consume changes from the node_03 database.

To consume the problematic transaction, use something like:

COPY (SELECT * FROM pg_logical_slot_get_binary_changes('bdr_17911_6120567807158814813_1_16385__', NULL, 1, 'interactive', 'true')) TO '/dev/null';

for each of the remote nodes. After that you should be able to execute DDL again.

Then please reconnect and do
SET bdr.skip_ddl_replication = on;
DROP EXTENSION postgis;

should bring the nodes back in sync with regard to the postgis extension.That

We'll work on making it possible to replicate extensions that insert data into their own tables during creation.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Kevin Grittner 2015-03-02 15:23:36 Re: BUG #12810: database conection drop
Previous Message Michael Paquier 2015-03-02 04:44:03 Re: pg_upgrade failure on Windows Server