Re: Postgresql-9.0.1 Recovery

From: Venkat Balaji <venkat(dot)balaji(at)verse(dot)in>
To: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql-9.0.1 Recovery
Date: 2011-08-31 06:40:36
Message-ID: CAFrxt0gtRdiFdwPSCtfjSWYTRzkuZB63LZT02E3gw2QE8Hqc7w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thanks Craig !

Below is what i did -

1. pg_start_backup()
2. rsync the data dir
3. pg_stop_backup()

I believe the backup is valid because, i was able to bring up the cluster
without any issues (ofcourse with data loss).

+ve signs-

I am able to bring up the cluster with the Online backup, but, only with the
loss of data.

-ve signs and things to be strongly foreseen while backup testing -

- pg_clog files were not synced. I suspect they were being written at the
time of backup. I might have tried to sync the data dir when pg_clog files
were half filled.
- Though the WAL Archives are there, Postgres is not trying to recover
beyond the timestamp at which pg_clog was missing.
- Even if i replace the missing pg_clog files (which i did), Postgres is
asking for the corresponding wal archive files

Yes. What i learnt is that we need to ensure that all the pg_clog files must
be fully copied as on the backup time. We cannot afford to miss any of them.

Thanks
Venkat
On Wed, Aug 31, 2011 at 5:46 AM, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> wrote:

> On 30/08/2011 6:59 PM, Venkat Balaji wrote:
>
>> Hello Everyone,
>>
>> I have a situation here -
>>
>> I am trying to restore the production online backup and recover the same.
>>
>> - I had initially rsynced (excluded pg_log) the data directory and the
>> tarred and zipped the same
>>
>
> Did you do that after pg_start_backup() or on a stopped database server?
>
> If you did it on a running database server without first running
> pg_start_backup(), your backup is invalid.
>
> Personally I like to take my base backups from an LVM snapshot of the
> datadir just to be extra safe. That isn't necessary, though, and a regular
> rsync or tar or whatever of a datadir after pg_start_backup() is fine.
>
> Remember to run pg_stop_backup() afterwards.
>
>
> - I got an error "unable to read <filename> from pg_clog location"
>> (file size is around 160K)
>>
>
> ... from PostgreSQL, when you tried to start it?
>
> What emitted that error message?
>
>
> What i understood is that, rsync some how missed out on syncing the
>> files in "pg_clog" so, i had manually coped the missing pg_clog file
>> from production and tried recovery.
>>
>
> That won't work. You need a consistent snapshot of all the files in the
> data dir. You cannot just mix and match copies taken at different times.
>
> For efficiency reasons PostgreSQL will recycle used clog files. You can't
> just copy a file over and hope that because it has the same name, it still
> contains the data you want.
>
> Your backup *failed* at the point where you got an incomplete copy of the
> data directory.
>
>
> Do i need to get that particular wal archive which is before online
>> backup time ?
>>
>
> No, you need to get the missing clog files. If you cannot do that, try
> using pg_resetxlog, but be aware that that may lose transactions and can
> potentially cause corruption of tables and indexes.
>
>
> By this experience what i understand is that Postgresql stores committed
>> and uncommited transactions in pg_xlog / wal archive files and
>> information (not the transaction data) about transaction commit status
>> is stored in pg_clog. Am I correct ?
>>
>
> That sounds right to me, but I don't know as much about how Pg stores
> things as I should.
>
>
> I am in the process of designing a disaster recovery planner for our
>> productions systems.
>>
>
> Congratulations!
>
> Be extremely glad this didn't happen in a real recovery scenario. This is a
> marvellous example of why you should always test your backups - you actually
> did, and found a problem that would've been a critical issue if the backup
> were actually needed.
>
> --
> Craig Ringer
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Sim Zacks 2011-08-31 07:33:07 row is too big
Previous Message Scott Marlowe 2011-08-31 06:12:33 Re: Getting Table Names in a Particular Database