Re: ERROR: could not open relation base/2757655/6930168: No such file or directory -- during warm standby setup

From: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
To: bricklen <bricklen(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: ERROR: could not open relation base/2757655/6930168: No such file or directory -- during warm standby setup
Date: 2010-12-31 19:27:37
Message-ID: AANLkTimS5Bn9a4n_dNB+Y4EqkHNfg-x=kNa9V=YhGrEO@mail.gmail.com
Lists: pgsql-general

On Fri, Dec 31, 2010 at 1:13 PM, bricklen <bricklen(at)gmail(dot)com> wrote:
> On Wed, Dec 29, 2010 at 1:53 PM, bricklen <bricklen(at)gmail(dot)com> wrote:
>> On Wed, Dec 29, 2010 at 12:11 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>
>>> The difference in ctid, and the values of xmin and relfrozenxid,
>>> seems to confirm my suspicion that this wasn't just random cosmic rays.
>>> You did something on the source DB that rewrote the table with a new
>>> relfilenode (possibly CLUSTER or some form of ALTER TABLE; plain VACUUM
>>> or ANALYZE wouldn't do it).  And for some reason the standby hasn't
>>> picked up that change in the pg_class row.  I suspect the explanation
>>> is that your technique for setting up the standby is flawed.  You can't
>>> just rsync and have a valid snapshot of the DB --- you need to be sure
>>> that enough WAL gets replayed to fix any inconsistencies arising from
>>> the time-extended nature of the rsync operation.  But you didn't say
>>> exactly how you did that.
>>>
>>
>> Definitely no CLUSTER commands were issued, and there should have been
>> no ALTER commands issued (yesterday was a holiday, no one was here).
>> Would a TRUNCATE have the same effect though? I grep'd through our
>> application, and it appears that at least 3 tables get truncated, one
>> of them several times per hour. The often-truncated table wasn't one
>> of the bad ones, but the others are the ones I've already identified
>> as non-existent.
>
> Update: Set up the warm standby again and encountered the same issue,
> with two of the three previously-identified tables -- the ones that
> can get truncated throughout the day. We're going to try again
> overnight when those tables are not truncated and see if that gives us
> a correctly-working standby.
>
> From what I could find from posts to these lists, TRUNCATE commands do
> reset the relfilenode, and that could account for the issue we are
> experiencing. What I find odd is that we have one other table that is
> truncated every 15 minutes (aggregate table) but that one was fine in
> both attempts at the warm standby.

What O/S, kernel, and filesystem are you using?
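
For reference, the path in the error can be turned into catalog lookups. This is a minimal sketch, assuming the default tablespace layout base/<database OID>/<relfilenode>; the helper name catalog_queries is hypothetical and nothing here comes from the setup described in the thread:

```python
import re


def catalog_queries(path):
    """Split 'base/<database OID>/<relfilenode>' and build lookup queries.

    Assumes the relation lives in the default tablespace, as the
    'base/' prefix in the error message suggests.
    """
    m = re.fullmatch(r"base/(\d+)/(\d+)", path)
    if m is None:
        raise ValueError(f"unexpected relation path: {path!r}")
    db_oid, relfilenode = int(m.group(1)), int(m.group(2))
    return (
        # Which database the file belongs to.
        f"SELECT datname FROM pg_database WHERE oid = {db_oid};",
        # Which relation (if any) currently points at that file.
        f"SELECT relname, relfilenode FROM pg_class WHERE relfilenode = {relfilenode};",
    )


for query in catalog_queries("base/2757655/6930168"):
    print(query)
```

Run the second query while connected to the database named by the first. An empty result on the source DB would be consistent with the table having been given a new relfilenode since the copy was taken.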

--
Jon
