Re: Re: postmaster.pid still exists after pacemaker stopped postgresql - how to remove

From: Mistina Michal <Michal(dot)Mistina(at)virte(dot)sk>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Re: postmaster.pid still exists after pacemaker stopped postgresql - how to remove
Date: 2013-08-26 14:02:30
Message-ID: e4e43612d938407a851fbd4656502d8c@Electra.virte.intra
Lists: pgsql-general

Hi Masao.
Thank you for the suggestion. Indeed, that could have occurred, most probably
while I was testing a split-brain situation. In that test I turned off the
network card on one node, so DRBD was in the primary role on both nodes. After
the split-brain occurred I resynced DRBD: of the two primaries I kept one as
"primary" (winner) and demoted the other to "secondary" (victim). The data
should have been consistent from that moment on, but probably it was not.

I use DRBD only within a single technical center. Data is synced by streaming
replication to the secondary technical center, where there is another DRBD
instance.

It's like this:

TC1:
--- node1: DRBD (primary), pgsql
--- node2: DRBD (secondary), pgsql

TC2:
--- node1: DRBD (primary), pgsql
--- node2: DRBD (secondary), pgsql

Within one technical center, pgsql runs on only one node at a time. This is
handled by pacemaker/corosync.
From the outside it looks as if only one postgresql server is running in
each TC.
TC1 (master) ==== streaming replication =====> TC2 (slave)

If one node in a technical center fails, the fail-over to the secondary node
is really quick because of the fast network within the technical center.
Between TC1 and TC2 there is a WAN link. If something goes wrong and TC1
becomes unavailable, I can switch to TC2 manually or automatically.

Is there a more appropriate solution? Would you use something else?

Best regards,
Michal Mistina

On Mon, Aug 26, 2013 at 9:53 PM, Mistina Michal <Michal(dot)Mistina(at)virte(dot)sk>
wrote:
> Hi there.
>
> I didn't find out why this issue happened. Only backing up and reformatting
> the filesystem that held the corrupted postmaster.pid file helped me
> get rid of it. Hopefully the file won't appear in the future.

I have encountered a similar problem when I broke the filesystem by a double
mount. You may have hit the same problem.
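
As a rough illustration of how a merely stale postmaster.pid can be told apart
from one sitting on a damaged filesystem, here is a minimal Python sketch. It
relies on the first line of postmaster.pid holding the postmaster's PID and
assumes a POSIX system; the function name classify_pid_file and its return
labels are made up for this example and are not part of PostgreSQL or
pacemaker.

```python
import os


def classify_pid_file(path):
    """Classify a postmaster.pid-style file.

    Returns one of: 'missing', 'unreadable', 'garbled', 'stale', 'live'.
    These labels are hypothetical and exist only for this sketch.
    """
    try:
        with open(path, "r") as fh:
            first_line = fh.readline().strip()
    except FileNotFoundError:
        return "missing"
    except (OSError, UnicodeDecodeError):
        # I/O errors on a file that is listed in the directory usually
        # point at the filesystem rather than at PostgreSQL.
        return "unreadable"

    try:
        pid = int(first_line)
    except ValueError:
        return "garbled"
    if pid <= 0:
        return "garbled"

    try:
        os.kill(pid, 0)  # signal 0 only tests whether the PID exists
    except ProcessLookupError:
        return "stale"
    except PermissionError:
        return "live"  # the process exists but belongs to another user
    return "live"
```

A result of 'stale' means the file can normally just be removed once you are
sure no postgres process is using the data directory. 'unreadable' or 'garbled'
suggests checking the filesystem (and DRBD underneath it) before touching
anything else. Note that 'live' only says some process has that PID, not that
it is really a postmaster.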

> Master/Slave Set: ms_drbd_pg [drbd_pg]
>
> Masters: [ tstcaps01 ]
>
> Slaves: [ tstcaps02 ]

Why do you use DRBD with streaming replication? If you locate the database
cluster on DRBD, it's better to check the status of the DRBD filesystem.
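
One possible way to script such a status check is sketched below in Python. It
assumes a DRBD 8.x node, where /proc/drbd prints one line per device with
cs: (connection state), ro: (roles) and ds: (disk states) fields. The helper
names parse_proc_drbd and needs_attention are invented for this example, and
treating a dual-primary state as a problem is an assumption that fits a
single-primary setup like the one described in this thread.

```python
import re

# Matches device lines such as:
#  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
_DEVICE_LINE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:(?P<ro>\S+)\s+ds:(?P<ds>\S+)"
)


def parse_proc_drbd(text):
    """Parse DRBD 8.x /proc/drbd text into a list of per-device dicts."""
    devices = []
    for line in text.splitlines():
        m = _DEVICE_LINE.match(line)
        if m is None:
            continue  # version header, counters, unconfigured devices
        local_role, _, peer_role = m.group("ro").partition("/")
        local_disk, _, peer_disk = m.group("ds").partition("/")
        devices.append({
            "minor": int(m.group("minor")),
            "connection": m.group("cs"),
            "local_role": local_role,
            "peer_role": peer_role,
            "local_disk": local_disk,
            "peer_disk": peer_disk,
        })
    return devices


def needs_attention(dev):
    """True if the device is not connected, not fully up to date,
    or primary on both sides (assumed unwanted in this setup)."""
    both_primary = (dev["local_role"] == "Primary"
                    and dev["peer_role"] == "Primary")
    return (dev["connection"] != "Connected"
            or dev["local_disk"] != "UpToDate"
            or dev["peer_disk"] != "UpToDate"
            or both_primary)
```

On a node this could be fed with the contents of /proc/drbd, for example
parse_proc_drbd(open("/proc/drbd").read()). After a split-brain a device
typically shows up as StandAlone with an unknown peer, which needs_attention
would flag.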

Regards,

--
Fujii Masao
