Re: Postgres error: could not open relation base/xxxxx/yyyyy

From: Achilleas Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: Postgres error: could not open relation base/xxxxx/yyyyy
Date: 2010-11-15 11:03:39
Message-ID: 201011151303.39898.achill@matrix.gatewaynet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Stop using the system immediately, since many things inserted to the DB might simply be garbage.
Inspect your memory with memtest86.

I would even suggest moving to a new HW if available, and start working into two parallel directions:

a) try to bring your DB into a sane state
b) try to fix your HW server, which apparently has problems.

Files are not "disappearing" like that, in normal situations.
You cannot trust your current HW to run postgresql. Its a guarantee method to lose your data, if you
go on by this start/stop/restore/pray circle.

IMPORTANT: Handle your old (pre-memory upgrade) backups like expensive chinese vases.
Start from those good and trustworthy backups to do any database "replay" (or similar) work which
would lead to a viable situation.

Στις Monday 15 November 2010 11:55:38 ο/η Pablo Delgado Díaz-Pache έγραψε:
> Hi all,
>
> We've been using postgres for 9 years without a problem until now! Two
> problems in a very short time!
> The first one is described in
> http://postgresql.1045698.n5.nabble.com/Autovacuum-seems-to-block-database-WARNING-worker-took-too-long-to-start-td3264261.html
> This is another one (not related I think) ...
>
> Postgres server is usually working fine. All of a sudden we start getting
> these errors ...
>
> * **2010-11-09 11:49:15.320
> CET|2|database1|10.19.0.51(18895)|20929|SELECT|4cd926fd.51c1|2010-11-09
> 11:48:29 CET|10/417796|1390150|postgres| LOG: duration: 1518.422 ms
> execute <unnamed>: SELECT id_token_fk,xxxxxxxxx ORDER BY avadate*
> * **2010-11-09 11:52:25.364
> CET|1|database1|10.19.0.51(23286)|21566|PARSE|4cd927cf.543e|2010-11-09
> 11:51:59 CET|7/430041|0|postgres| ERROR: could not open relation
> base/273198960/273198979: No such file or directory*
> * **2010-11-09 11:52:25.364
> CET|2|database1|10.19.0.51(23286)|21566|PARSE|4cd927cf.543e|2010-11-09
> 11:51:59 CET|7/430041|0|postgres| STATEMENT: SELECT id_token_fkxxxxxxxxxxx
> ORDER BY avadate*
> * **2010-11-09 11:52:29.981
> CET|3|database1|10.19.0.51(23286)|21566|PARSE|4cd927cf.543e|2010-11-09
> 11:51:59 CET|7/430049|0|postgres| ERROR: could not open relation
> base/273198960/273199235: No such file or directory*
> * **2010-11-09 11:52:30.988
> CET|6|database1|10.19.0.51(23286)|21566|PARSE|4cd927cf.543e|2010-11-09
> 11:51:59 CET|7/430050|0|postgres| STATEMENT: SELECT max(avadate) xxxxxxxx
> 32036)*
> * **2010-11-09 11:53:36.346
> CET|16|database2|10.19.0.42(44916)|22107|SELECT|4cd9280e.565b|2010-11-09
> 11:53:02 CET|94/516004|0|postgres| STATEMENT: SELECT * FROM "photos"
> xxxxxxxxxxxxxx LIMIT 1*
> * **2010-11-09 11:53:37.956
> CET|17|database2|10.19.0.42(44916)|22107|SELECT|4cd9280e.565b|2010-11-09
> 11:53:02 CET|94/516025|0|postgres| ERROR: could not open relation
> base/271253899/271254075: No such file or directory*
> * **................*
> * **................*
> * **2010-11-09 11:53:55.560 CET|111|||26090||4cc6e970.65ea|2010-10-26
> 16:45:04 CEST||0|| ERROR: could not open relation base/273198960/273199235:
> No such file or directory*
> * **2010-11-09 11:53:55.560 CET|112|||26090||4cc6e970.65ea|2010-10-26
> 16:45:04 CEST||0|| CONTEXT: writing block 8866 of relation
> base/273198960/273199235*
> * **2010-11-09 11:53:55.560 CET|113|||26090||4cc6e970.65ea|2010-10-26
> 16:45:04 CEST||0|| WARNING: could not write block 8866 of
> base/273198960/273199235*
> * **2010-11-09 11:53:55.560 CET|114|||26090||4cc6e970.65ea|2010-10-26
> 16:45:04 CEST||0|| DETAIL: Multiple failures --- write error might be
> permanent.*
> * **2010-11-09 11:53:56.590 CET|115|||26090||4cc6e970.65ea|2010-10-26
> 16:45:04 CEST||0|| ERROR: could not open relation base/273198960/273199235:
> No such file or directory*
> * **2010-11-09 11:53:56.590 CET|116|||26090||4cc6e970.65ea|2010-10-26
> 16:45:04 CEST||0|| CONTEXT: writing block 8866 of relation
> base/273198960/273199235*
>
> See that there are 2 different databases involved. (database1 and
> database2).
>
> Looking for distinct errors (among the many we have in the log) I find there
> are only 4 involved ...
>
> * **base/271253899/271254075*
> * **base/273198960/273198979*
> * **base/273198960/273199235*
> * **base/273198960/273199253*
>
> and those files are not in the postgres base directory.
>
> To fix it we have no option but to restart postgres (which restarts fine
> with a /etc/init.d/postgresql stop & start)
> However, once we restarted postgres some data was corrupted. Tables that
> used to have 4,5 million rows had only 60 rows. As a consequence we had to
> restore from file system backup.
> Once we did that, it worked fine for a few days until it happened again.
> We're worried it can happen again!
>
> Could this error be a hardware problem?
> We recently increased the memory from 8GB to 28GB, although it was working
> fine for more than 3 weeks.
> We also recently upgraded from postgres 8.3.6 to 8.4.5, althought it also
> worked fine for a few months.
> Upgrading to postgres 9 is easy for us. however, not sure that would help.
>
> Some info of our server:
>
> OS: Centos 5.5
> Kernel: 2.6.18-194.1.el5
> Postgres version: 8.4.5 (installation out-of-the-box using yum)
> Server memory: 28GB
>
>
> Any help would be appreciated
>
> Pablo
>

--
Achilleas Mantzios

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Pablo Delgado Díaz-Pache 2010-11-15 12:42:32 Re: Postgres error: could not open relation base/xxxxx/yyyyy
Previous Message Pablo Delgado Díaz-Pache 2010-11-15 09:55:38 Postgres error: could not open relation base/xxxxx/yyyyy