BUG #6725: hot-standby slave repeatable crashed after restart (GIN error)

From: maxim(dot)boguk(at)gmail(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #6725: hot-standby slave repeatable crashed after restart (GIN error)
Date: 2012-07-09 23:54:54
Message-ID: E1SoNn4-00086s-Ie@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 6725
Logged by: Maxim Boguk
Email address: maxim(dot)boguk(at)gmail(dot)com
PostgreSQL version: 9.0.7
Operating system: Debian Linux
Description:

Hi,

On one of my production databases hot-standby server start repeatable crash
after restart.

There is log:

2012-07-10 03:30:20 MSK 12034 postgres(at)postgres from [local] [vxid:2/0
txid:0] [CHECKPOINT] LOG: duration: 100.262 ms statement: checkpoint;
2012-07-10 03:30:24 MSK 12034 postgres(at)postgres from [local] [vxid:2/0
txid:0] [CHECKPOINT] LOG: duration: 100.207 ms statement: checkpoint;
2012-07-10 03:30:41 MSK 24618 @ from [vxid: txid:0] [] LOG: received fast
shutdown request
2012-07-10 03:30:41 MSK 24618 @ from [vxid: txid:0] [] LOG: aborting any
active transactions
2012-07-10 03:30:41 MSK 25312 @ from [vxid: txid:0] [] FATAL: terminating
walreceiver process due to administrator command
2012-07-10 03:30:41 MSK 24646 @ from [vxid: txid:0] [] LOG: shutting down
2012-07-10 03:30:41 MSK 24646 @ from [vxid: txid:0] [] LOG: database
system is shut down

2012-07-10 03:30:45 MSK 12202 @ from [vxid: txid:0] [] LOG: database
system was shut down in recovery at 2012-07-10 03:30:41 MSK
2012-07-10 03:30:45 MSK 12202 @ from [vxid: txid:0] [] LOG: entering
standby mode
2012-07-10 03:30:45 MSK 12202 @ from [vxid:1/0 txid:0] [] LOG: redo starts
at 127C/51EF6798
2012-07-10 03:31:35 MSK 12202 @ from [vxid:1/0 txid:0] [] PANIC: GIN
metapage disappeared
2012-07-10 03:31:35 MSK 12202 @ from [vxid:1/0 txid:0] [] CONTEXT: xlog
redo Update metapage, node: 1663/62450/27663577 blkno: 2164
2012-07-10 03:31:35 MSK 12189 @ from [vxid: txid:0] [] LOG: startup
process (PID 12202) was terminated by signal 6: Aborted
2012-07-10 03:31:35 MSK 12189 @ from [vxid: txid:0] [] LOG: terminating
any other active server processes

2012-07-10 03:33:37 MSK 13071 @ from [vxid: txid:0] [] LOG: database
system was interrupted while in recovery at log time 2012-07-10 02:17:21
MSK
2012-07-10 03:33:37 MSK 13071 @ from [vxid: txid:0] [] HINT: If this has
occurred more than once some data might be corrupted and you might need to
choose an earlier recovery target.
2012-07-10 03:33:37 MSK 13071 @ from [vxid: txid:0] [] LOG: entering
standby mode
2012-07-10 03:33:37 MSK 13071 @ from [vxid:1/0 txid:0] [] LOG: redo starts
at 127C/51EF6798
2012-07-10 03:34:15 MSK 13071 @ from [vxid:1/0 txid:0] [] PANIC: GIN
metapage disappeared
2012-07-10 03:34:15 MSK 13071 @ from [vxid:1/0 txid:0] [] CONTEXT: xlog
redo Update metapage, node: 1663/62450/27663577 blkno: 2164
2012-07-10 03:34:15 MSK 13065 @ from [vxid: txid:0] [] LOG: startup
process (PID 13071) was terminated by signal 6: Aborted
2012-07-10 03:34:15 MSK 13065 @ from [vxid: txid:0] [] LOG: terminating
any other active server processes

and so on.

I almost sure there are no disk corruption or such errors.

Now some additional information from the master server:
1)1663/62450/27663577 - corresponding to the pg_default tablespace and
wsfilm database. However, no entries with relfilenode=27663577 were found in
pg_class.
In the same time:
ls -la ./9.0/main/base/62450/27663577
-rw------- 1 postgres postgres 0 Июл 10 02:29
./9.0/main/base/62450/27663577

In wsfilm db only one GIN index exists:
wsfilm=# SELECT * from pg_indexes where indexdef like '%gin%';
-[ RECORD 1 ]---------------------------------------------------
schemaname | public
tablename | movie
indexname | movie_fti_idx
tablespace |
indexdef | CREATE INDEX movie_fti_idx ON movie USING gin (fti)

But it has different relfilenode:
wsfilm=# SELECT relfilenode from pg_class where relname='movie_fti_idx';
-[ RECORD 1 ]---------
relfilenode | 27677714

2)Just few minutes before that index was rebuild on the master database:

2012-07-10 02:29:50 MSK 1480 postgres(at)wsfilm from [local] [vxid:33/3324338
txid:0] [idle] LOG: statement: CREATE INDEX CONCURRENTLY
pgcompactor_tmp1436 ON movie USING gin (fti);
2012-07-10 02:29:52 MSK 1480 postgres(at)wsfilm from [local] [vxid:33/3324343
txid:0] [idle] LOG: statement: BEGIN; DROP INDEX public.movie_fti_idx;
ALTER INDEX public.pgcompactor_tmp1436 RENAME TO movie_fti_idx; END;

My theory that the wal replay trying replay GIN index information on already
dropped index.

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2012-07-10 01:39:11 Re: BUG #6725: hot-standby slave repeatable crashed after restart (GIN error)
Previous Message chine.bleu 2012-07-09 17:19:40 BUG #6724: EXCEPTION syntax not recgonised