Quick Links

Re: Automatic restore corruption problem

From:	Guillaume Lelarge <guillaume(at)lelarge(dot)info>
To:	Matthieu Lejeune <matthieu(dot)lejeune(at)exxoss(dot)com>
Cc:	Keith <keith(at)keithf4(dot)com>, "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject:	Re: Automatic restore corruption problem
Date:	2015-07-16 18:11:50
Message-ID:	CAECtzeVB+28FXvNW=51wiqQVrNaNEu1iZ-LOnGyafJvnWxXrzw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-admin

2015-07-14 12:32 GMT+02:00 Matthieu Lejeune <matthieu(dot)lejeune(at)exxoss(dot)com>:

> Hi,
>
> The master server : 10.10.11.1 is p2prddnmdbm
> I got a real time slave with the recovery process : p2prddnmdbs
>
> I have a third copy with a H-24 data : p2prddnmdbc
> On this server I need to have the copy of the master at 23H00 every day.
> So I'm using the script send before.
> During the day the buisness modify the database on server p2prddnmdbc and
> that's why I use a rsync --delete every night to get a clean copy of the
> prod server.
>
> So what's the best way to do the restore ?
>
>
There were two issues on your script: it removed the backup_label file and
didn't create the recovery.conf file. Did you fix this? if yes, can you
send back your script? Thanks.

Thanks
>
> Matthieu
>
>
> Le 14/07/15 08:50, Guillaume Lelarge a écrit :
>
> 2015-07-14 8:09 GMT+02:00 Matthieu Lejeune <matthieu(dot)lejeune(at)exxoss(dot)com>:
>
>> Hi,
>>
>> I had no recovery.conf on this server because I launch my replication
>> every night I need a H-24 copy database.
>>
>>
> On your first email, you said: "I have a script for restoring a database
> every night to an other postgresql database". A restore is not replication,
> even it's a PITR restore.
>
> Still on your first email, you execute pg_start_backup() and
> pg_stop_backup() on p2prddnmdbm, and the rsync on 10.10.11.1. I suppose
> p2prddnmdbm and 10.10.11.1 are the same server?
>
> This is my recovery.conf
>> root(at)p2prddnmdbc:/var/lib/postgresql# cat recovery.conf
>> standby_mode = 'off'
>> primary_conninfo = 'host=10.10.11.1 port=5432 user=replicator
>> password=XXXXXX'
>> trigger_file = '/var/lib/postgresql/9.1/main/trigger'
>> restore_command = 'cp /mnt/p2prddnmdbm_pg_xlog/%f %p'
>> root(at)p2prddnmdbc:/var/lib/postgresql#
>>
>>
> You don't need primary_conninfo if you only want to restore your
> database. Though that isn't your issue right now.
>
>
>> But with or without a recovery.conf file I can't start the database
>> service :
>> root(at)p2prddnmdbc:/var/lib/postgresql# /etc/init.d/postgresql start
>> [....] Starting PostgreSQL 9.3 database server: main[....] The PostgreSQL
>> server failed to start. Please check the log output: 2015-07-14 08:02:59
>> CEST LOG: database system was interrupted; last known up at 2015-07-13
>> 23:33:46 CEST 2015-07-14 08:02:59 CEST LOG: invalid checkpoint record
>> 2015-07-14 08:02:59 CEST FATAL: could not locate required checkpoint record
>> 2015-07-14 08:02:59 CEST HINT: If you are not restoring from a backup, try
>> removing the file "/var/lib/postgresql/9.3/main/backup_label". 2015-07-14
>> 08:02:59 CEST LOG: startup process (PID 24617) exited with exit
>> co[FAIL2015-07-14 08:02:59 CEST LOG: aborting startup due to startup
>> process failure ... failed!
>> failed!
>>
>>
> In your previous email, you had the startup process waiting for a file.
> These logs can't be the good ones.
>
> What you should probably do is tell us exactly what you want to do, and
> state what you do right now, and what logs you get, and what processes are
> on the server. That would help us to help you.
>
> Thanks
>> Matthieu
>>
>> Le 12/07/15 13:46, Guillaume Lelarge a écrit :
>>
>> Hi,
>>
>> 2015-07-12 10:18 GMT+02:00 Matthieu Lejeune <
>> <matthieu(dot)lejeune(at)exxoss(dot)com>matthieu(dot)lejeune(at)exxoss(dot)com>:
>>
>>> Hi thank for your reply
>>>
>>> My target is to give a database for buisness testing query and they are
>>> modify the database during the buisness day.
>>>
>>> Now I got this error if I keep the file backup_label :
>>>
>>> root(at)p2prddnmdbc:/var/lib/postgresql/9.3/main# /etc/init.d/postgresql
>>> start
>>> [....] Starting PostgreSQL 9.3 database server: main[....] The
>>> PostgreSQL server failed to start. Please check the log output: 2015-07-12
>>> 10:12:45 CEST LOG: database system was shut down at 2015-07-12 10:07:10
>>> CEST 2015-07-12 10:12:45 CEST LOG: invalid checkpoint record 2015-07-12
>>> 10:12:45 CEST FATAL: could not locate required checkpoint record 2015-07-12
>>> 10:12:45 CEST HINT: If you are not restoring from a backup, try removing
>>> the file "/var/lib/postgresql/9.3/main/backup_label". 2015-07-12 10:12:45
>>> CEST LOG: startup process (PID 28492) exited with exit code 1 2015-07-12
>>> 10:12:45 CEST LOG: abo[FAIL startup due to startup process failure ...
>>> failed!
>>> failed!
>>>
>>>
>>> If I put the recovery.conf the database is waiting for the wal to
>>> relaunch the replication.
>>>
>>> postgres 27817 0.7 0.9 631212 39892 ? S 09:55 0:00
>>> /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main
>>> postgres 27818 0.0 0.0 631472 2076 ? Ss 09:55 0:00 \_
>>> postgres: startup process waiting for 0000000100000178000000B9
>>> root(at)p2prddnmdbc:/var/lib/postgresql/9.3/main# su - postgres
>>> postgres(at)p2prddnmdbc:~$ psql energycomm
>>> psql: FATAL: the database system is starting up
>>>
>>> Have you got an idea to stop the replication process and start the
>>> database ?
>>>
>>>
>> What did you put in the recovery.conf file? (hint: standby_mode must be
>> off)
>>
>>
>>
>>> Kind regards
>>> Matthieu
>>>
>>> Le 10/07/15 16:46, Keith a écrit :
>>>
>>> A recent, relevant post
>>>
>>>
>>> http://tbeitr.blogspot.com/2015/07/deleting-backuplabel-on-restore-will.html
>>>
>>> On Fri, Jul 10, 2015 at 10:07 AM, Guillaume Lelarge <
>>> <guillaume(at)lelarge(dot)info>guillaume(at)lelarge(dot)info> wrote:
>>>
>>>> Hi,
>>>>
>>>> Le 10 juil. 2015 3:02 PM, "Matthieu Lejeune" <
>>>> <matthieu(dot)lejeune(at)exxoss(dot)com>matthieu(dot)lejeune(at)exxoss(dot)com> a écrit :
>>>> >
>>>> > Hi all,
>>>> >
>>>> > I have a script for restoring a database every night to an other
>>>> postgresql database
>>>> >
>>>> > root(at)p2prddnmdbc:~# cat /var/admin/script/restoredb.sh
>>>> > #/bin/bash
>>>> > /etc/init.d/postgresql stop
>>>> > mv /var/log/postgresql/postgresql-9.3-main.log
>>>> /var/log/postgresql/postgresql-9.3-main.log.old
>>>> > cd /var/lib/postgresql/9.3/main
>>>> > psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT
>>>> pg_start_backup('sync');"
>>>> > rsync -av --delete <root(at)10(dot)10(dot)11(dot)1:/var/lib/postgresql/9.3/main/*>
>>>> root(at)10(dot)10(dot)11(dot)1:/var/lib/postgresql/9.3/main/*
>>>> /var/lib/postgresql/9.3/main/
>>>> > rm backup_label
>>>> > chown -R postgres:postgres *
>>>> > psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT
>>>> pg_stop_backup();"
>>>> > /etc/init.d/postgresql start
>>>> > chmod 777 /var/log/postgresql/postgresql-9.3-main.log
>>>> > psql -U postgres -c "ALTER USER xxxx WITH PASSWORD 'XXXX';"
>>>> > psql -U postgres xxxx -c "CREATE EXTENSION dblink;"
>>>> > root(at)p2prddnmdbc:~#
>>>> >
>>>> >
>>>> > But during the day when the user are using the new database we got
>>>> error like this :
>>>> >
>>>> > 2015-06-25 16:20:58 CEST ERROR: could not read block 257985 in file
>>>> "base/16386/14064061.1": read only 0 of 8192 bytes
>>>> > 2015-06-22 15:21:11 CEST ERROR: could not read block 256801 in file
>>>> "base/16386/14064061.1": read only 0 of 8192 bytes
>>>> >
>>>> > I have check the : filesystem on the vm, on the HW SAN,...
>>>> >
>>>> > Any idea to fix this problem?
>>>>
>>>> Sure. Don't remove the backup_label file, and add the recovery.conf
>>>> file.
>>>>
>>>> --
>>>> Guillaume
>>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Guillaume.
>> http://blog.guillaume.lelarge.info
>> http://www.dalibo.com
>>
>>
>>
>
>
> --
> Guillaume.
> http://blog.guillaume.lelarge.info
> http://www.dalibo.com
>
>
>

--
Guillaume.
http://blog.guillaume.lelarge.info
http://www.dalibo.com

In response to

Re: Automatic restore corruption problem at 2015-07-14 10:32:53 from Matthieu Lejeune

Responses

Re: Automatic restore corruption problem at 2015-07-16 18:39:34 from Kevin Grittner

Browse pgsql-admin by date

	From	Date	Subject
Next Message	Tom Lane	2015-07-16 18:14:52	Re: segfault when creating index on huge table
Previous Message	Natalie Wenz	2015-07-16 18:07:02	postgres_fdw user mapping and role inheritance