Re: 9.3 to 9.5 upgrade problems

From: Andy Colson <andy(at)squeakycode(dot)net>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject: Re: 9.3 to 9.5 upgrade problems
Date: 2016-07-03 16:42:18
Message-ID: f0699290-2c89-cb4e-0318-84603b629e3e@squeakycode.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 07/03/2016 11:04 AM, Adrian Klaver wrote:
> On 07/03/2016 08:49 AM, Andy Colson wrote:
>> On 07/03/2016 10:35 AM, Adrian Klaver wrote:
>>> On 07/03/2016 08:06 AM, Andy Colson wrote:
>>>> Hi all,
>>>>
>>>> I have a master (web1) and two slaves (web2, webserv), one slave is
>>>> quite far from the master, the db is 112 Gig, so pg_basebackup is my
>>>> last resort.
>>>>
>>>> I followed the page here:
>>>> https://www.postgresql.org/docs/9.5/static/pgupgrade.html
>>>>
>>>> including the rsync stuff. I practiced it _twice_, once in PG 9.5 beta,
>>>> and again a week ago, on two VM's I created locally. Both practice
>>>> sessions worked perfect.
>>>>
>>>> I just ran it on the live databases. The master seems ok, its running
>>>> PG 9.5 now, I can login to it, and no errors in the log.
>>>>
>>>> Neither slave works. After I'd gotten done with the pgupgrade steps,
>>>> both slaves gave me this error:
>>>>
>>>> FATAL: database system identifier differs between the primary and
>>>> standby
>>>>
>>>> Sure enough pg_controldata show'd their database system id different
>>>> (all three web1, web2, webserv were different. no matches at all), so
>>>> I'm assuming the rsync didnt rsync right, or I missed a step and ran it
>>>> to early, or something ... I'm not quite sure.
>>>>
>>>> I needed to get the live website back up and running again, so I let the
>>>> master go, ran analyze, and when it was finished, used the steps here to
>>>> try and resync:
>>>>
>>>> https://wiki.postgresql.org/wiki/Binary_Replication_Tutorial
>>>>
>>>> on Master:
>>>> select pg_start_backup('clone',true);
>>>> rsync -av --exclude pg_xlog --exclude postgresql.conf /pub/pg95/*
>>>> web2:/pub/pg95/
>>>> select pg_stop_backup();
>>>> rsync -av /pub/pg95/pg_xlog web2:/pub/pg95/
>>>
>>> Not sure about above rsync, that seems to undo what you did previously.
>>>
>>> Also was the remote directory empty when you did this?
>>>
>>
>> Not sure what you mean by undo. pgupgrade.html page, step 10, has you
>> rsync the master to the slave, so the pg95 directory is hard linked to
>> the pg93, which save's a ton to bandwidth when your servers are cross
>> county.
>
> I understand I am just trying to figure out what mixing methods (pg-upgrade, pg_start_backup) is doing?
>
> In particular the section on pg_start_backup:
>
> https://www.postgresql.org/docs/9.5/static/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP
>
> starts with:
>
> "Ensure that WAL archiving is enabled and working."
>
> and from I gather that is not the case.
>

The slave log says it reached a consistent state, and is accepting connections, which tells me it should run ok, maybe not with the newest data, but at least not get:

FATAL: cache lookup failed for database 16401

>
> Also was the remote directory empty when you did this?

Now that I think about this more, I think you're on to something. I'm trying to get an _exact_ copy of the master db onto the slave. Checking rsync man, it matches only on size and modified time, and I didn't include deletes.

I'm going to re-try with this:

select pg_start_backup('clone',true);
rsync -av --delete --checksum --exclude pg_xlog --exclude postgresql.conf /pub/pg95/* web2:/pub/pg95/
select pg_stop_backup();
rsync -av --delete --checksum /pub/pg95/pg_xlog web2:/pub/pg95/

That should make sure the copies are exact.

-Andy

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2016-07-03 19:28:23 Re: deadlock between "WITH agg_tmp AS ({sel_stmt}), upd AS ({upd_stmt}) {ins_stmt}" and pure UPDATE statements
Previous Message James Keener 2016-07-03 16:33:17 Re: GRANTable Row Permissions