From: | Gordon Shannon <gordo169(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Tracking down log segment corruption |
Date: | 2010-05-02 17:55:26 |
Message-ID: | s2qb2dd93301005021055w93b80fb6q9ec27b826a550f7d@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Sun, May 2, 2010 at 11:02 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>
> Hmm ... AFAICS the only way to get that message when the incoming TID's
> offsetNumber is only 2 is for the index page to be completely empty
> (not zeroes, else PageAddItem's sanity check would have triggered,
> but valid and empty). What that smells like is a software bug, like
> failing to emit a WAL record in a case where it was necessary. Can you
> identify which index this was? (Look for relfilenode 48778276 in the
> database with OID 22362.) If so, can you give us any hints about
> unusual things that might have been done with that index?
>
Interesting. There is no pg_class entry for 22362. There is, however, an
entry for that filenode. It's an index I created Sat AM, about 6AM.
select oid, * from pg_class where relfilenode=48778276;
-[ RECORD 1 ]--+---------------------------
oid | 48777488
relname | cts_20100501_topic_date_nk
relnamespace | 2200
reltype | 0
relowner | 16412
relam | 403
relfilenode | 48778276
reltablespace | 48777166
relpages | 2476
reltuples | 58879
reltoastrelid | 0
reltoastidxid | 0
relhasindex | f
relisshared | f
relistemp | f
relkind | i
relnatts | 2
relchecks | 0
relhasoids | f
relhaspkey | f
relhasrules | f
relhastriggers | f
relhassubclass | f
relfrozenxid | 0
relacl | null
reloptions | null
Possibly relevant facts:
- The WSB server went active on Friday around 3:30PM
- On Friday evening, I added about 11 tablespaces. I noted the new files on
the WSB, no problems.
- On Sat morning, I created a partitioned table cts_20100501 (inherits from
another table) and 4 indexes.
- This morning, I was doing some table maintenance on the master and
discovered I had created this table and its indexes in the wrong tablespace.
I wanted the table in ts29, but had it in ts30. Vice versa for the
indexes. So I moved them. This is from my command history:
alter index cts_20100501_natural_uk set tablespace ts30;
alter index cts_20100501_pkey set tablespace ts30;
alter index cts_20100501_topic_date_nk set tablespace ts30;
alter index cts_20100501_updated_nk set tablespace ts30;
alter table cts_20100501 set tablespace ts29;
These commands worked fine on the master, yet this seems suspiciously
relevant.
> > Any suggestions?
>
> As far as recovering goes, there's probably not much you can do except
> resync the standby from scratch. But it would be nice to get to the
> bottom of the problem, so that we can fix the bug. Have you got an
> archive of this xlog segment and the ones before it, and would you be
> willing to let a developer look at them?
>
>
Before I received your reply, I had already started the re-sync, and
unfortunately already deleted the wal logs in question.
If it happens again, I will certainly keep them, and would be happy to share
them.
Regards,
Gordon
From | Date | Subject | |
---|---|---|---|
Next Message | Andy | 2010-05-02 17:57:33 | Re: PostgreSQL vs. Microsoft SQL server |
Previous Message | Tom Lane | 2010-05-02 17:02:03 | Re: Tracking down log segment corruption |