Re: Automatic restore corruption problem

From: Matthieu Lejeune <matthieu(dot)lejeune(at)exxoss(dot)com>
To: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
Cc: Keith <keith(at)keithf4(dot)com>, "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Automatic restore corruption problem
Date: 2015-07-14 10:32:53
Message-ID: 55A4E555.7090805@exxoss.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Hi,

The master server : 10.10.11.1 is p2prddnmdbm
I got a real time slave with the recovery process : p2prddnmdbs

I have a third copy with a H-24 data : p2prddnmdbc
On this server I need to have the copy of the master at 23H00 every day.
So I'm using the script send before.
During the day the buisness modify the database on server p2prddnmdbc
and that's why I use a rsync --delete every night to get a clean copy of
the prod server.

So what's the best way to do the restore ?

Thanks

Matthieu

Le 14/07/15 08:50, Guillaume Lelarge a écrit :
> 2015-07-14 8:09 GMT+02:00 Matthieu Lejeune
> <matthieu(dot)lejeune(at)exxoss(dot)com <mailto:matthieu(dot)lejeune(at)exxoss(dot)com>>:
>
> Hi,
>
> I had no recovery.conf on this server because I launch my
> replication every night I need a H-24 copy database.
>
>
> On your first email, you said: "I have a script for restoring a
> database every night to an other postgresql database". A restore is
> not replication, even it's a PITR restore.
>
> Still on your first email, you execute pg_start_backup() and
> pg_stop_backup() on p2prddnmdbm, and the rsync on 10.10.11.1. I
> suppose p2prddnmdbm and 10.10.11.1 are the same server?
>
> This is my recovery.conf
> root(at)p2prddnmdbc:/var/lib/postgresql# cat recovery.conf
> standby_mode = 'off'
> primary_conninfo = 'host=10.10.11.1 port=5432 user=replicator
> password=XXXXXX'
> trigger_file = '/var/lib/postgresql/9.1/main/trigger'
> restore_command = 'cp /mnt/p2prddnmdbm_pg_xlog/%f %p'
> root(at)p2prddnmdbc:/var/lib/postgresql#
>
>
> You don't need primary_conninfo if you only want to restore your
> database. Though that isn't your issue right now.
>
> But with or without a recovery.conf file I can't start the
> database service :
> root(at)p2prddnmdbc:/var/lib/postgresql# /etc/init.d/postgresql start
> [....] Starting PostgreSQL 9.3 database server: main[....] The
> PostgreSQL server failed to start. Please check the log output:
> 2015-07-14 08:02:59 CEST LOG: database system was interrupted;
> last known up at 2015-07-13 23:33:46 CEST 2015-07-14 08:02:59 CEST
> LOG: invalid checkpoint record 2015-07-14 08:02:59 CEST FATAL:
> could not locate required checkpoint record 2015-07-14 08:02:59
> CEST HINT: If you are not restoring from a backup, try removing
> the file "/var/lib/postgresql/9.3/main/backup_label". 2015-07-14
> 08:02:59 CEST LOG: startup process (PID 24617) exited with exit
> co[FAIL2015-07-14 08:02:59 CEST LOG: aborting startup due to
> startup process failure ... failed!
> failed!
>
>
> In your previous email, you had the startup process waiting for a
> file. These logs can't be the good ones.
>
> What you should probably do is tell us exactly what you want to do,
> and state what you do right now, and what logs you get, and what
> processes are on the server. That would help us to help you.
>
> Thanks
> Matthieu
>
> Le 12/07/15 13:46, Guillaume Lelarge a écrit :
>> Hi,
>>
>> 2015-07-12 10:18 GMT+02:00 Matthieu Lejeune
>> <matthieu(dot)lejeune(at)exxoss(dot)com <mailto:matthieu(dot)lejeune(at)exxoss(dot)com>>:
>>
>> Hi thank for your reply
>>
>> My target is to give a database for buisness testing query
>> and they are modify the database during the buisness day.
>>
>> Now I got this error if I keep the file backup_label :
>>
>> root(at)p2prddnmdbc:/var/lib/postgresql/9.3/main#
>> <mailto:root(at)p2prddnmdbc:/var/lib/postgresql/9.3/main#>
>> /etc/init.d/postgresql start
>> [....] Starting PostgreSQL 9.3 database server: main[....]
>> The PostgreSQL server failed to start. Please check the log
>> output: 2015-07-12 10:12:45 CEST LOG: database system was
>> shut down at 2015-07-12 10:07:10 CEST 2015-07-12 10:12:45
>> CEST LOG: invalid checkpoint record 2015-07-12 10:12:45 CEST
>> FATAL: could not locate required checkpoint record 2015-07-12
>> 10:12:45 CEST HINT: If you are not restoring from a backup,
>> try removing the file
>> "/var/lib/postgresql/9.3/main/backup_label". 2015-07-12
>> 10:12:45 CEST LOG: startup process (PID 28492) exited with
>> exit code 1 2015-07-12 10:12:45 CEST LOG: abo[FAIL startup
>> due to startup process failure ... failed!
>> failed!
>>
>>
>> If I put the recovery.conf the database is waiting for the
>> wal to relaunch the replication.
>>
>> postgres 27817 0.7 0.9 631212 39892 ? S 09:55
>> 0:00 /usr/lib/postgresql/9.3/bin/postgres -D
>> /var/lib/postgresql/9.3/main
>> postgres 27818 0.0 0.0 631472 2076 ? Ss 09:55
>> 0:00 \_ postgres: startup process waiting for
>> 0000000100000178000000B9
>> root(at)p2prddnmdbc:/var/lib/postgresql/9.3/main#
>> <mailto:root(at)p2prddnmdbc:/var/lib/postgresql/9.3/main#> su -
>> postgres
>> postgres(at)p2prddnmdbc:~$ psql energycomm
>> psql: FATAL: the database system is starting up
>>
>> Have you got an idea to stop the replication process and
>> start the database ?
>>
>>
>> What did you put in the recovery.conf file? (hint: standby_mode
>> must be off)
>>
>>
>> Kind regards
>> Matthieu
>>
>> Le 10/07/15 16:46, Keith a écrit :
>>> A recent, relevant post
>>>
>>> http://tbeitr.blogspot.com/2015/07/deleting-backuplabel-on-restore-will.html
>>>
>>> On Fri, Jul 10, 2015 at 10:07 AM, Guillaume Lelarge
>>> <guillaume(at)lelarge(dot)info <mailto:guillaume(at)lelarge(dot)info>> wrote:
>>>
>>> Hi,
>>>
>>> Le 10 juil. 2015 3:02 PM, "Matthieu Lejeune"
>>> <matthieu(dot)lejeune(at)exxoss(dot)com
>>> <mailto:matthieu(dot)lejeune(at)exxoss(dot)com>> a écrit :
>>> >
>>> > Hi all,
>>> >
>>> > I have a script for restoring a database every night
>>> to an other postgresql database
>>> >
>>> > root(at)p2prddnmdbc:~# cat /var/admin/script/restoredb.sh
>>> > #/bin/bash
>>> > /etc/init.d/postgresql stop
>>> > mv /var/log/postgresql/postgresql-9.3-main.log
>>> /var/log/postgresql/postgresql-9.3-main.log.old
>>> > cd /var/lib/postgresql/9.3/main
>>> > psql --host=p2prddnmdbm --username=replicator postgres
>>> -c "SELECT pg_start_backup('sync');"
>>> > rsync -av --delete
>>> root(at)10(dot)10(dot)11(dot)1:/var/lib/postgresql/9.3/main/*
>>> <mailto:root(at)10(dot)10(dot)11(dot)1:/var/lib/postgresql/9.3/main/*>
>>> /var/lib/postgresql/9.3/main/
>>> > rm backup_label
>>> > chown -R postgres:postgres *
>>> > psql --host=p2prddnmdbm --username=replicator postgres
>>> -c "SELECT pg_stop_backup();"
>>> > /etc/init.d/postgresql start
>>> > chmod 777 /var/log/postgresql/postgresql-9.3-main.log
>>> > psql -U postgres -c "ALTER USER xxxx WITH PASSWORD
>>> 'XXXX';"
>>> > psql -U postgres xxxx -c "CREATE EXTENSION dblink;"
>>> > root(at)p2prddnmdbc:~#
>>> >
>>> >
>>> > But during the day when the user are using the new
>>> database we got error like this :
>>> >
>>> > 2015-06-25 16:20:58 CEST ERROR: could not read block
>>> 257985 in file "base/16386/14064061.1": read only 0 of
>>> 8192 bytes
>>> > 2015-06-22 15:21:11 CEST ERROR: could not read block
>>> 256801 in file "base/16386/14064061.1": read only 0 of
>>> 8192 bytes
>>> >
>>> > I have check the : filesystem on the vm, on the HW SAN,...
>>> >
>>> > Any idea to fix this problem?
>>>
>>> Sure. Don't remove the backup_label file, and add the
>>> recovery.conf file.
>>>
>>> --
>>> Guillaume
>>>
>>>
>>
>>
>>
>>
>>
>> --
>> Guillaume.
>> http://blog.guillaume.lelarge.info
>> http://www.dalibo.com
>
>
>
>
> --
> Guillaume.
> http://blog.guillaume.lelarge.info
> http://www.dalibo.com

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Andy Colson 2015-07-14 13:59:16 Re: could not create shared memory segment: Invalid argument
Previous Message Guillaume Lelarge 2015-07-14 06:50:58 Re: Automatic restore corruption problem