Re: Postgres error: could not open relation base/xxxxx/yyyyy

From: Pablo Delgado Díaz-Pache <delgadop(at)gmail(dot)com>
To: Achilleas Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Postgres error: could not open relation base/xxxxx/yyyyy
Date: 2010-11-15 12:42:32
Message-ID: AANLkTimfL5ZYd_iB5Rvd52UwQqvUgSjOAiB-a+hTOD1c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Hi Achilleas,

Thanks for your quick response.
We do have backups and a couple of spare servers running in parallel so
we're safe in that sense. Thanks for your advice anyway.

To run *memtest86 *we have to go to the datacenter and that will take us a
few days. Is there anything we can do remotely?

Could it be anything different than memory HW problem?
Btw, we are not using that server at the moment :)

Thanks

On Mon, Nov 15, 2010 at 12:03 PM, Achilleas Mantzios <
achill(at)matrix(dot)gatewaynet(dot)com> wrote:

> Stop using the system immediately, since many things inserted to the DB
> might simply be garbage.
> Inspect your memory with memtest86.
>
> I would even suggest moving to a new HW if available, and start working
> into two parallel directions:
>
> a) try to bring your DB into a sane state
> b) try to fix your HW server, which apparently has problems.
>
> Files are not "disappearing" like that, in normal situations.
> You cannot trust your current HW to run postgresql. Its a guarantee method
> to lose your data, if you
> go on by this start/stop/restore/pray circle.
>
> IMPORTANT: Handle your old (pre-memory upgrade) backups like expensive
> chinese vases.
> Start from those good and trustworthy backups to do any database "replay"
> (or similar) work which
> would lead to a viable situation.
>
> Στις Monday 15 November 2010 11:55:38 ο/η Pablo Delgado Díaz-Pache έγραψε:
> > Hi all,
> >
> > We've been using postgres for 9 years without a problem until now! Two
> > problems in a very short time!
> > The first one is described in
> >
> http://postgresql.1045698.n5.nabble.com/Autovacuum-seems-to-block-database-WARNING-worker-took-too-long-to-start-td3264261.html
> > This is another one (not related I think) ...
> >
> > Postgres server is usually working fine. All of a sudden we start getting
> > these errors ...
> >
> > * **2010-11-09 11:49:15.320
> > CET|2|database1|10.19.0.51(18895)|20929|SELECT|4cd926fd.51c1|2010-11-09
> > 11:48:29 CET|10/417796|1390150|postgres| LOG: duration: 1518.422 ms
> > execute <unnamed>: SELECT id_token_fk,xxxxxxxxx ORDER BY avadate*
> > * **2010-11-09 11:52:25.364
> > CET|1|database1|10.19.0.51(23286)|21566|PARSE|4cd927cf.543e|2010-11-09
> > 11:51:59 CET|7/430041|0|postgres| ERROR: could not open relation
> > base/273198960/273198979: No such file or directory*
> > * **2010-11-09 11:52:25.364
> > CET|2|database1|10.19.0.51(23286)|21566|PARSE|4cd927cf.543e|2010-11-09
> > 11:51:59 CET|7/430041|0|postgres| STATEMENT: SELECT
> id_token_fkxxxxxxxxxxx
> > ORDER BY avadate*
> > * **2010-11-09 11:52:29.981
> > CET|3|database1|10.19.0.51(23286)|21566|PARSE|4cd927cf.543e|2010-11-09
> > 11:51:59 CET|7/430049|0|postgres| ERROR: could not open relation
> > base/273198960/273199235: No such file or directory*
> > * **2010-11-09 11:52:30.988
> > CET|6|database1|10.19.0.51(23286)|21566|PARSE|4cd927cf.543e|2010-11-09
> > 11:51:59 CET|7/430050|0|postgres| STATEMENT: SELECT max(avadate)
> xxxxxxxx
> > 32036)*
> > * **2010-11-09 11:53:36.346
> > CET|16|database2|10.19.0.42(44916)|22107|SELECT|4cd9280e.565b|2010-11-09
> > 11:53:02 CET|94/516004|0|postgres| STATEMENT: SELECT * FROM "photos"
> > xxxxxxxxxxxxxx LIMIT 1*
> > * **2010-11-09 11:53:37.956
> > CET|17|database2|10.19.0.42(44916)|22107|SELECT|4cd9280e.565b|2010-11-09
> > 11:53:02 CET|94/516025|0|postgres| ERROR: could not open relation
> > base/271253899/271254075: No such file or directory*
> > * **................*
> > * **................*
> > * **2010-11-09 11:53:55.560 CET|111|||26090||4cc6e970.65ea|2010-10-26
> > 16:45:04 CEST||0|| ERROR: could not open relation
> base/273198960/273199235:
> > No such file or directory*
> > * **2010-11-09 11:53:55.560 CET|112|||26090||4cc6e970.65ea|2010-10-26
> > 16:45:04 CEST||0|| CONTEXT: writing block 8866 of relation
> > base/273198960/273199235*
> > * **2010-11-09 11:53:55.560 CET|113|||26090||4cc6e970.65ea|2010-10-26
> > 16:45:04 CEST||0|| WARNING: could not write block 8866 of
> > base/273198960/273199235*
> > * **2010-11-09 11:53:55.560 CET|114|||26090||4cc6e970.65ea|2010-10-26
> > 16:45:04 CEST||0|| DETAIL: Multiple failures --- write error might be
> > permanent.*
> > * **2010-11-09 11:53:56.590 CET|115|||26090||4cc6e970.65ea|2010-10-26
> > 16:45:04 CEST||0|| ERROR: could not open relation
> base/273198960/273199235:
> > No such file or directory*
> > * **2010-11-09 11:53:56.590 CET|116|||26090||4cc6e970.65ea|2010-10-26
> > 16:45:04 CEST||0|| CONTEXT: writing block 8866 of relation
> > base/273198960/273199235*
> >
> > See that there are 2 different databases involved. (database1 and
> > database2).
> >
> > Looking for distinct errors (among the many we have in the log) I find
> there
> > are only 4 involved ...
> >
> > * **base/271253899/271254075*
> > * **base/273198960/273198979*
> > * **base/273198960/273199235*
> > * **base/273198960/273199253*
> >
> > and those files are not in the postgres base directory.
> >
> > To fix it we have no option but to restart postgres (which restarts fine
> > with a /etc/init.d/postgresql stop & start)
> > However, once we restarted postgres some data was corrupted. Tables that
> > used to have 4,5 million rows had only 60 rows. As a consequence we had
> to
> > restore from file system backup.
> > Once we did that, it worked fine for a few days until it happened again.
> > We're worried it can happen again!
> >
> > Could this error be a hardware problem?
> > We recently increased the memory from 8GB to 28GB, although it was
> working
> > fine for more than 3 weeks.
> > We also recently upgraded from postgres 8.3.6 to 8.4.5, althought it also
> > worked fine for a few months.
> > Upgrading to postgres 9 is easy for us. however, not sure that would
> help.
> >
> > Some info of our server:
> >
> > OS: Centos 5.5
> > Kernel: 2.6.18-194.1.el5
> > Postgres version: 8.4.5 (installation out-of-the-box using yum)
> > Server memory: 28GB
> >
> >
> > Any help would be appreciated
> >
> > Pablo
> >
>
>
>
> --
> Achilleas Mantzios
>
> --
> Sent via pgsql-admin mailing list (pgsql-admin(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin
>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Gerhard Hintermayer 2010-11-15 12:54:18 Re: Find all running postgres DB servers on a network
Previous Message Achilleas Mantzios 2010-11-15 11:03:39 Re: Postgres error: could not open relation base/xxxxx/yyyyy