Quick Links

Re: Tracking down log segment corruption

From:	Gordon Shannon <gordo169(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Tracking down log segment corruption
Date:	2010-05-02 17:55:26
Message-ID:	s2qb2dd93301005021055w93b80fb6q9ec27b826a550f7d@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On Sun, May 2, 2010 at 11:02 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

>
>
> Hmm ... AFAICS the only way to get that message when the incoming TID's
> offsetNumber is only 2 is for the index page to be completely empty
> (not zeroes, else PageAddItem's sanity check would have triggered,
> but valid and empty). What that smells like is a software bug, like
> failing to emit a WAL record in a case where it was necessary. Can you
> identify which index this was? (Look for relfilenode 48778276 in the
> database with OID 22362.) If so, can you give us any hints about
> unusual things that might have been done with that index?
>

Interesting. There is no pg_class entry for 22362. There is, however, an
entry for that filenode. It's an index I created Sat AM, about 6AM.

Possibly relevant facts:

- The WSB server went active on Friday around 3:30PM
- On Friday evening, I added about 11 tablespaces. I noted the new files on
the WSB, no problems.
- On Sat morning, I created a partitioned table cts_20100501 (inherits from
another table) and 4 indexes.
- This morning, I was doing some table maintenance on the master and
discovered I had created this table and its indexes in the wrong tablespace.
I wanted the table in ts29, but had it in ts30. Vice versa for the
indexes. So I moved them. This is from my command history:

alter index cts_20100501_natural_uk set tablespace ts30;
alter index cts_20100501_pkey set tablespace ts30;
alter index cts_20100501_topic_date_nk set tablespace ts30;
alter index cts_20100501_updated_nk set tablespace ts30;
alter table cts_20100501 set tablespace ts29;

These commands worked fine on the master, yet this seems suspiciously
relevant.

> > Any suggestions?
>
> As far as recovering goes, there's probably not much you can do except
> resync the standby from scratch. But it would be nice to get to the
> bottom of the problem, so that we can fix the bug. Have you got an
> archive of this xlog segment and the ones before it, and would you be
> willing to let a developer look at them?
>
>
Before I received your reply, I had already started the re-sync, and
unfortunately already deleted the wal logs in question.
If it happens again, I will certainly keep them, and would be happy to share
them.

Regards,
Gordon

In response to

Re: Tracking down log segment corruption at 2010-05-02 17:02:03 from Tom Lane

Responses

Re: Tracking down log segment corruption at 2010-05-02 18:10:01 from Tom Lane

Browse pgsql-general by date

	From	Date	Subject
Next Message	Andy	2010-05-02 17:57:33	Re: PostgreSQL vs. Microsoft SQL server
Previous Message	Tom Lane	2010-05-02 17:02:03	Re: Tracking down log segment corruption