Re: Postgresql + corrupted disk = data loss. (Need help for database recover)

From: Oleg Serov <serovov(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql + corrupted disk = data loss. (Need help for database recover)
Date: 2011-12-03 15:31:05
Message-ID: CAMkKa8DMHb+vuy4k4v7_mDtbVoeqjw_V1hU8iQgG8U10ENujNg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I think, the main problem is that, postgres reads wrong xlog-s file.

I lunched strace to postgres process, and then i grep the log:
# cat /tmp/strace-log | fgrep xlog
5546 stat("pg_xlog", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
5546 stat("pg_xlog/archive_status", {st_mode=S_IFDIR|0700, st_size=4096,
...}) = 0
5546 open("pg_xlog/00000001.history", O_RDONLY) = -1 ENOENT (No such file
or directory)
5546 open("pg_xlog/000000010000000F00000052", O_RDONLY) = 4
5546 open("pg_xlog/000000010000000F00000052", O_RDONLY) = 4

There is only one file.

But i have a lot of files (i parsed filenames for more readable format):
0000000100000010000000D2 tli: 1 log: 16 seg: 210
0000000100000010000000EE tli: 1 log: 16 seg: 238
0000000100000010000000D3 tli: 1 log: 16 seg: 211
0000000100000010000000E2 tli: 1 log: 16 seg: 226
0000000100000010000000D5 tli: 1 log: 16 seg: 213
0000000100000010000000E8 tli: 1 log: 16 seg: 232
0000000100000010000000F7 tli: 1 log: 16 seg: 247
0000000100000010000000DF tli: 1 log: 16 seg: 223
0000000100000010000000DC tli: 1 log: 16 seg: 220
0000000100000010000000E7 tli: 1 log: 16 seg: 231
0000000100000010000000EA tli: 1 log: 16 seg: 234
0000000100000010000000D1 tli: 1 log: 16 seg: 209
0000000100000010000000DD tli: 1 log: 16 seg: 221
0000000100000010000000F5 tli: 1 log: 16 seg: 245
0000000100000010000000E0 tli: 1 log: 16 seg: 224
0000000100000010000000EB tli: 1 log: 16 seg: 235
0000000100000010000000D0 tli: 1 log: 16 seg: 208
0000000100000010000000F4 tli: 1 log: 16 seg: 244
0000000100000010000000F6 tli: 1 log: 16 seg: 246
0000000100000010000000D7 tli: 1 log: 16 seg: 215
0000000100000010000000DB tli: 1 log: 16 seg: 219
0000000100000010000000E4 tli: 1 log: 16 seg: 228
0000000100000010000000DE tli: 1 log: 16 seg: 222
0000000100000010000000E9 tli: 1 log: 16 seg: 233
0000000100000010000000D4 tli: 1 log: 16 seg: 212
0000000100000010000000D9 tli: 1 log: 16 seg: 217
0000000100000010000000F3 tli: 1 log: 16 seg: 243
0000000100000010000000E5 tli: 1 log: 16 seg: 229
0000000100000010000000DA tli: 1 log: 16 seg: 218
0000000100000010000000EC tli: 1 log: 16 seg: 236
0000000100000010000000D6 tli: 1 log: 16 seg: 214
0000000100000010000000EF tli: 1 log: 16 seg: 239
0000000100000010000000E6 tli: 1 log: 16 seg: 230
0000000100000010000000E1 tli: 1 log: 16 seg: 225
0000000100000010000000F0 tli: 1 log: 16 seg: 240
0000000100000010000000D8 tli: 1 log: 16 seg: 216
0000000100000010000000CF tli: 1 log: 16 seg: 207
0000000100000010000000ED tli: 1 log: 16 seg: 237
0000000100000010000000E3 tli: 1 log: 16 seg: 227
0000000100000010000000F1 tli: 1 log: 16 seg: 241
0000000100000010000000F2 tli: 1 log: 16 seg: 242
0000000100000010000000F8 tli: 1 log: 16 seg: 248

So, main problem i think, is that pg_control file is corrupted (i guess).

So xlogreset-n sais:
bash-3.2$ pg_resetxlog -n /var/lib/pgsql/data
could not change directory to "/root"
pg_control values:

*First log file ID after reset: 16*
First log file segment after reset: 249
pg_control version number: 843
Catalog version number: 200904091
Database system identifier: 5592178670599662815
Latest checkpoint's TimeLineID: 1
Latest checkpoint's NextXID: 0/7760685
Latest checkpoint's NextOID: 2556003
Latest checkpoint's NextMultiXactId: 3925
Latest checkpoint's NextMultiOffset: 7901
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value

*And main question, how to force postgres to read this files?*

3 декабря 2011 г. 18:10 пользователь Tomas Vondra <tv(at)fuzzy(dot)cz> написал:

> On 2.12.2011 09:16, Oleg Serov wrote:
> > Hello!
> >
> > i've don't try to do reindex. There was enough space.
>
> Not sure whether you tried to reindex or not. And what do you mean by
> 'there was enough space'? For example with ext2 (and ext3/ext4) it was
> rather simple to exhaust inodes long before the device was actually
> full. What filesystem are you using, anyway?
>
> This seems like a I/O issue, you should check the hardware and the
> settings (e.g. what caches are enabled etc.). Post more details, if
> possible. Have you checked S.M.A.R.T. info from the drives?
>
> > And i have a full data-directory backup, when i've stop server, before
> > start.
>
> Good. Have you moved it to a different machine? Otherwise you don't have
> a backup, just a copy.
>
> Tomas
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

--
С уважением

Олег

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Christoph Zwerschke 2011-12-03 17:02:42 Re: Shared memory usage in PostgreSQL 9.1
Previous Message Torsten Zühlsdorff 2011-12-03 14:35:50 Re: How to convert HEX to ASCII?