Re: Automatic restore corruption problem

From: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
To: Matthieu Lejeune <matthieu(dot)lejeune(at)exxoss(dot)com>
Cc: Keith <keith(at)keithf4(dot)com>, "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Automatic restore corruption problem
Date: 2015-07-14 06:50:58
Message-ID: CAECtzeVjzu3Kx5MccTpkKa7sGQmvTSCD3X54Cgc1jgsvqyeFPA@mail.gmail.com
Lists: pgsql-admin

2015-07-14 8:09 GMT+02:00 Matthieu Lejeune <matthieu(dot)lejeune(at)exxoss(dot)com>:

> Hi,
>
> I had no recovery.conf on this server because I relaunch my replication
> every night: I need a copy of the database as it was 24 hours earlier (H-24).
>
>
In your first email, you said: "I have a script for restoring a database
every night to another postgresql database". A restore is not replication,
even if it's a PITR restore.

Also in your first email, you run pg_start_backup() and pg_stop_backup()
on p2prddnmdbm, but the rsync copies from 10.10.11.1. I suppose
p2prddnmdbm and 10.10.11.1 are the same server?
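For illustration only, the nightly script from the first email could be reordered along the lines suggested in this thread: start and stop the backup on the server that is actually copied, stop it only after the copy, keep backup_label, and add a recovery.conf. This is a sketch under stated assumptions, not a tested procedure. It assumes PRIMARY (here 10.10.11.1) is that single server, that it archives WAL where restore_command can read it, and that a restore-only recovery.conf already exists at a hypothetical path (RECOVERY_TEMPLATE). The `run` helper, the DRY_RUN variable and the `nightly_restore` function are hypothetical names added so the sequence can be previewed without touching any server.

```shell
#!/bin/bash
# Hedged sketch of a nightly restore from an exclusive base backup
# (PostgreSQL 9.3). Assumptions, not taken from the thread: PRIMARY is the
# one server that is both backed up (pg_start_backup/pg_stop_backup) and
# copied (rsync); it archives WAL where restore_command can read it; and
# RECOVERY_TEMPLATE is a hypothetical, prepared restore-only recovery.conf.
PRIMARY="${PRIMARY:-10.10.11.1}"
PGDATA="${PGDATA:-/var/lib/postgresql/9.3/main}"
RECOVERY_TEMPLATE="${RECOVERY_TEMPLATE:-/var/admin/script/recovery.conf}"

# Run a command, or only print it when DRY_RUN is set (hypothetical helper).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# The body runs in a subshell so that "set -e" aborts the whole sequence
# on the first failing step without changing the caller's shell options.
nightly_restore() (
  set -eu
  run /etc/init.d/postgresql stop
  # Start the backup on the same server that is copied below.
  run psql --host="$PRIMARY" --username=replicator postgres \
    -c "SELECT pg_start_backup('sync');"
  # Copy the data directory; backup_label comes along and must be kept.
  # postmaster.pid belongs to the running primary, so it is not copied.
  run rsync -av --delete --exclude=postmaster.pid \
    "root@$PRIMARY:$PGDATA/" "$PGDATA/"
  # Stop the backup only after the copy has finished.
  run psql --host="$PRIMARY" --username=replicator postgres \
    -c "SELECT pg_stop_backup();"
  # No "rm backup_label": add a restore-only recovery.conf instead.
  run cp "$RECOVERY_TEMPLATE" "$PGDATA/recovery.conf"
  run chown -R postgres:postgres "$PGDATA"
  run /etc/init.d/postgresql start
)
```

To preview the steps, source the file and call `DRY_RUN=1 nightly_restore`; without DRY_RUN the commands are executed. Calling pg_stop_backup() after the copy matters: with archiving enabled it waits until the WAL needed by the backup has been archived, so the restore_command can find it during recovery.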

> This is my recovery.conf:
> root(at)p2prddnmdbc:/var/lib/postgresql# cat recovery.conf
> standby_mode = 'off'
> primary_conninfo = 'host=10.10.11.1 port=5432 user=replicator password=XXXXXX'
> trigger_file = '/var/lib/postgresql/9.1/main/trigger'
> restore_command = 'cp /mnt/p2prddnmdbm_pg_xlog/%f %p'
> root(at)p2prddnmdbc:/var/lib/postgresql#
>
>
You don't need primary_conninfo if you only want to restore your database.
But that isn't your issue right now.
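For a restore only (no standby), recovery.conf can be reduced to the archive-related settings. The function below is a minimal sketch that writes such a file; `write_recovery_conf` is a hypothetical helper, and its default archive directory is the one from the restore_command quoted above. With standby_mode off, the server replays the WAL it can fetch and, once no further WAL is found, ends recovery and accepts read-write connections.

```shell
#!/bin/bash
# Write a restore-only recovery.conf (PostgreSQL 9.x syntax) into a data
# directory. Hypothetical helper: no primary_conninfo and no trigger_file,
# since this server never streams from the primary or waits as a standby.
write_recovery_conf() {
  local datadir="$1"
  # Default archive path taken from the restore_command quoted in this thread.
  local archive_dir="${2:-/mnt/p2prddnmdbm_pg_xlog}"
  # Unquoted EOF: $archive_dir is expanded here, while %f and %p stay
  # literal for PostgreSQL to substitute (WAL file name, destination path).
  cat > "$datadir/recovery.conf" <<EOF
standby_mode = 'off'
restore_command = 'cp $archive_dir/%f %p'
EOF
}
```

Usage: `write_recovery_conf /var/lib/postgresql/9.3/main`, then make the file owned by postgres. recovery.conf is read from the data directory itself, so the first argument should be the cluster directory rather than /var/lib/postgresql.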

> But with or without a recovery.conf file, I can't start the database
> service:
> root(at)p2prddnmdbc:/var/lib/postgresql# /etc/init.d/postgresql start
> [....] Starting PostgreSQL 9.3 database server: main
> [....] The PostgreSQL server failed to start. Please check the log output:
> 2015-07-14 08:02:59 CEST LOG: database system was interrupted; last known up at 2015-07-13 23:33:46 CEST
> 2015-07-14 08:02:59 CEST LOG: invalid checkpoint record
> 2015-07-14 08:02:59 CEST FATAL: could not locate required checkpoint record
> 2015-07-14 08:02:59 CEST HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.3/main/backup_label".
> 2015-07-14 08:02:59 CEST LOG: startup process (PID 24617) exited with exit code 1
> 2015-07-14 08:02:59 CEST LOG: aborting startup due to startup process failure ... failed!
> failed!
>
>
In your previous email, the startup process was waiting for a file.
These logs can't be the right ones.

What you should probably do is tell us exactly what you want to achieve,
what you are doing right now, which logs you get, and which processes are
running on the server. That would help us help you.

Thanks
> Matthieu
>
> On 12/07/15 13:46, Guillaume Lelarge wrote:
>
> Hi,
>
> 2015-07-12 10:18 GMT+02:00 Matthieu Lejeune <matthieu(dot)lejeune(at)exxoss(dot)com>:
>
>> Hi, thanks for your reply.
>>
>> My goal is to provide a database for business testing queries, and the
>> users modify the database during the business day.
>>
>> Now I get this error if I keep the backup_label file:
>>
>> root(at)p2prddnmdbc:/var/lib/postgresql/9.3/main# /etc/init.d/postgresql start
>> [....] Starting PostgreSQL 9.3 database server: main
>> [....] The PostgreSQL server failed to start. Please check the log output:
>> 2015-07-12 10:12:45 CEST LOG: database system was shut down at 2015-07-12 10:07:10 CEST
>> 2015-07-12 10:12:45 CEST LOG: invalid checkpoint record
>> 2015-07-12 10:12:45 CEST FATAL: could not locate required checkpoint record
>> 2015-07-12 10:12:45 CEST HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.3/main/backup_label".
>> 2015-07-12 10:12:45 CEST LOG: startup process (PID 28492) exited with exit code 1
>> 2015-07-12 10:12:45 CEST LOG: aborting startup due to startup process failure ... failed!
>> failed!
>>
>>
>> If I add the recovery.conf, the database waits for WAL to resume the
>> replication:
>>
>> postgres 27817 0.7 0.9 631212 39892 ? S 09:55 0:00 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main
>> postgres 27818 0.0 0.0 631472 2076 ? Ss 09:55 0:00  \_ postgres: startup process waiting for 0000000100000178000000B9
>> root(at)p2prddnmdbc:/var/lib/postgresql/9.3/main# su - postgres
>> postgres(at)p2prddnmdbc:~$ psql energycomm
>> psql: FATAL: the database system is starting up
>>
>> Do you have any idea how to stop the replication process and start the
>> database?
>>
>>
> What did you put in the recovery.conf file? (hint: standby_mode must be
> off)
>
>
>
>> Kind regards
>> Matthieu
>>
>> On 10/07/15 16:46, Keith wrote:
>>
>> A recent, relevant post
>>
>>
>> http://tbeitr.blogspot.com/2015/07/deleting-backuplabel-on-restore-will.html
>>
>> On Fri, Jul 10, 2015 at 10:07 AM, Guillaume Lelarge <guillaume(at)lelarge(dot)info> wrote:
>>
>>> Hi,
>>>
>>> On 10 Jul 2015 3:02 PM, "Matthieu Lejeune" <matthieu(dot)lejeune(at)exxoss(dot)com> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > I have a script for restoring a database every night to another
>>> > postgresql database:
>>> >
>>> > root(at)p2prddnmdbc:~# cat /var/admin/script/restoredb.sh
>>> > #/bin/bash
>>> > /etc/init.d/postgresql stop
>>> > mv /var/log/postgresql/postgresql-9.3-main.log /var/log/postgresql/postgresql-9.3-main.log.old
>>> > cd /var/lib/postgresql/9.3/main
>>> > psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_start_backup('sync');"
>>> > rsync -av --delete root(at)10(dot)10(dot)11(dot)1:/var/lib/postgresql/9.3/main/* /var/lib/postgresql/9.3/main/
>>> > rm backup_label
>>> > chown -R postgres:postgres *
>>> > psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_stop_backup();"
>>> > /etc/init.d/postgresql start
>>> > chmod 777 /var/log/postgresql/postgresql-9.3-main.log
>>> > psql -U postgres -c "ALTER USER xxxx WITH PASSWORD 'XXXX';"
>>> > psql -U postgres xxxx -c "CREATE EXTENSION dblink;"
>>> > root(at)p2prddnmdbc:~#
>>> >
>>> >
>>> > But during the day, when users are working on the new database, we get
>>> > errors like this:
>>> >
>>> > 2015-06-25 16:20:58 CEST ERROR: could not read block 257985 in file "base/16386/14064061.1": read only 0 of 8192 bytes
>>> > 2015-06-22 15:21:11 CEST ERROR: could not read block 256801 in file "base/16386/14064061.1": read only 0 of 8192 bytes
>>> >
>>> > I have checked the filesystem on the VM, the hardware SAN, ...
>>> >
>>> > Any idea how to fix this problem?
>>>
>>> Sure. Don't remove the backup_label file, and add the recovery.conf file.
>>>
>>> --
>>> Guillaume
>>>
>>
>>
>>
>>
>
>
> --
> Guillaume.
> http://blog.guillaume.lelarge.info
> http://www.dalibo.com
>
>
>

--
Guillaume.
http://blog.guillaume.lelarge.info
http://www.dalibo.com
