Postgres error: could not open relation base/xxxxx/yyyyy

From: Pablo Delgado Díaz-Pache <delgadop(at)gmail(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: Postgres error: could not open relation base/xxxxx/yyyyy
Date: 2010-11-15 09:55:38
Message-ID: AANLkTi=L+hbNH_Zvx8FS7hVcm6DgwLKPbSsz1Dp9cbva@mail.gmail.com
Lists: pgsql-admin

Hi all,

We've been using postgres for 9 years without a problem until now! Two
problems in a very short time!
The first one is described in
http://postgresql.1045698.n5.nabble.com/Autovacuum-seems-to-block-database-WARNING-worker-took-too-long-to-start-td3264261.html
This is another one (not related I think) ...

The Postgres server usually works fine. All of a sudden we started getting
these errors ...

2010-11-09 11:49:15.320
CET|2|database1|10.19.0.51(18895)|20929|SELECT|4cd926fd.51c1|2010-11-09
11:48:29 CET|10/417796|1390150|postgres| LOG: duration: 1518.422 ms
execute <unnamed>: SELECT id_token_fk,xxxxxxxxx ORDER BY avadate
2010-11-09 11:52:25.364
CET|1|database1|10.19.0.51(23286)|21566|PARSE|4cd927cf.543e|2010-11-09
11:51:59 CET|7/430041|0|postgres| ERROR: could not open relation
base/273198960/273198979: No such file or directory
2010-11-09 11:52:25.364
CET|2|database1|10.19.0.51(23286)|21566|PARSE|4cd927cf.543e|2010-11-09
11:51:59 CET|7/430041|0|postgres| STATEMENT: SELECT id_token_fkxxxxxxxxxxx
ORDER BY avadate
2010-11-09 11:52:29.981
CET|3|database1|10.19.0.51(23286)|21566|PARSE|4cd927cf.543e|2010-11-09
11:51:59 CET|7/430049|0|postgres| ERROR: could not open relation
base/273198960/273199235: No such file or directory
2010-11-09 11:52:30.988
CET|6|database1|10.19.0.51(23286)|21566|PARSE|4cd927cf.543e|2010-11-09
11:51:59 CET|7/430050|0|postgres| STATEMENT: SELECT max(avadate) xxxxxxxx
32036)
2010-11-09 11:53:36.346
CET|16|database2|10.19.0.42(44916)|22107|SELECT|4cd9280e.565b|2010-11-09
11:53:02 CET|94/516004|0|postgres| STATEMENT: SELECT * FROM "photos"
xxxxxxxxxxxxxx LIMIT 1
2010-11-09 11:53:37.956
CET|17|database2|10.19.0.42(44916)|22107|SELECT|4cd9280e.565b|2010-11-09
11:53:02 CET|94/516025|0|postgres| ERROR: could not open relation
base/271253899/271254075: No such file or directory
................
................
2010-11-09 11:53:55.560 CET|111|||26090||4cc6e970.65ea|2010-10-26
16:45:04 CEST||0|| ERROR: could not open relation base/273198960/273199235:
No such file or directory
2010-11-09 11:53:55.560 CET|112|||26090||4cc6e970.65ea|2010-10-26
16:45:04 CEST||0|| CONTEXT: writing block 8866 of relation
base/273198960/273199235
2010-11-09 11:53:55.560 CET|113|||26090||4cc6e970.65ea|2010-10-26
16:45:04 CEST||0|| WARNING: could not write block 8866 of
base/273198960/273199235
2010-11-09 11:53:55.560 CET|114|||26090||4cc6e970.65ea|2010-10-26
16:45:04 CEST||0|| DETAIL: Multiple failures --- write error might be
permanent.
2010-11-09 11:53:56.590 CET|115|||26090||4cc6e970.65ea|2010-10-26
16:45:04 CEST||0|| ERROR: could not open relation base/273198960/273199235:
No such file or directory
2010-11-09 11:53:56.590 CET|116|||26090||4cc6e970.65ea|2010-10-26
16:45:04 CEST||0|| CONTEXT: writing block 8866 of relation
base/273198960/273199235

Note that two different databases are involved (database1 and
database2).
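
To see at a glance which databases the errors come from, the log entries can be split on the `|` separator. The following is only a rough sketch: the field order in `FIELDS` is inferred from the sample lines above and is an assumption (the actual log_line_prefix setting is not given in this post), and the function names are made up for illustration. It expects each entry to be unwrapped onto a single line first.

```python
from collections import Counter

# ASSUMPTION: field order guessed from the sample log lines; the real
# log_line_prefix is not shown in the post.
FIELDS = ["timestamp", "line_no", "database", "remote", "pid", "command_tag",
          "session_id", "session_start", "vxid", "xid", "user", "message"]

def parse_entry(entry):
    """Split one unwrapped log entry into a dict of named fields."""
    # Limit the split so a '|' inside the message text is left alone.
    parts = entry.split("|", len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        raise ValueError("unexpected number of fields: %d" % len(parts))
    return dict(zip(FIELDS, (p.strip() for p in parts)))

def errors_by_database(entries):
    """Count ERROR entries per database; an empty database field gets its own bucket."""
    counts = Counter()
    for entry in entries:
        record = parse_entry(entry)
        if record["message"].startswith("ERROR:"):
            counts[record["database"] or "(no database)"] += 1
    return counts

# Three entries copied from the log above, each joined onto one line.
entries = [
    "2010-11-09 11:52:25.364 CET|1|database1|10.19.0.51(23286)|21566|PARSE|4cd927cf.543e|2010-11-09 11:51:59 CET|7/430041|0|postgres| ERROR: could not open relation base/273198960/273198979: No such file or directory",
    "2010-11-09 11:53:37.956 CET|17|database2|10.19.0.42(44916)|22107|SELECT|4cd9280e.565b|2010-11-09 11:53:02 CET|94/516025|0|postgres| ERROR: could not open relation base/271253899/271254075: No such file or directory",
    "2010-11-09 11:53:55.560 CET|111|||26090||4cc6e970.65ea|2010-10-26 16:45:04 CEST||0|| ERROR: could not open relation base/273198960/273199235: No such file or directory",
]

for name, count in sorted(errors_by_database(entries).items()):
    print(name, count)
```

Entries with an empty database field (process 26090 in the log above) are counted separately, since they are not tied to a client session.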

Looking for distinct errors (among the many we have in the log), I find that
only 4 relation files are involved ...

base/271253899/271254075
base/273198960/273198979
base/273198960/273199235
base/273198960/273199253

and those files are not in the postgres base directory.
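
The two checks just described (collecting the distinct relation paths from the log and confirming they are absent on disk) could be scripted roughly as follows. This is a minimal sketch, not the procedure actually used here: the function names are illustrative, and the data directory passed to `missing_on_disk` is a placeholder that has to be replaced with the real one.

```python
import re
from pathlib import Path

# Matches the path in "could not open relation base/<database oid>/<file>".
# \s+ also matches a newline, so entries wrapped across lines still match.
RELATION_ERROR = re.compile(r"could not open relation\s+(base/\d+/\d+)")

def distinct_missing_relations(log_text):
    """Return the sorted, de-duplicated relation paths named in the errors."""
    return sorted(set(RELATION_ERROR.findall(log_text)))

def missing_on_disk(paths, data_dir):
    """Return the paths for which no file exists under data_dir."""
    return [p for p in paths if not (Path(data_dir) / p).exists()]

# Excerpt of the messages quoted above, including the line wrapping.
sample_log = """
11:51:59 CET|7/430041|0|postgres| ERROR: could not open relation
base/273198960/273198979: No such file or directory
11:51:59 CET|7/430049|0|postgres| ERROR: could not open relation
base/273198960/273199235: No such file or directory
11:53:02 CET|94/516025|0|postgres| ERROR: could not open relation
base/271253899/271254075: No such file or directory
16:45:04 CEST||0|| ERROR: could not open relation base/273198960/273199235:
No such file or directory
"""

paths = distinct_missing_relations(sample_log)
# "/path/to/pgdata" is a placeholder, not the data directory of this server.
for path in missing_on_disk(paths, "/path/to/pgdata"):
    print("missing:", path)
```

Run against the full log file instead of the excerpt, this would list every relation file the server failed to open.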

To fix it we have no option but to restart postgres (which restarts fine
with a /etc/init.d/postgresql stop and start).
However, once we restarted postgres some data was corrupted: tables that
used to have 4.5 million rows had only 60 rows. As a consequence we had to
restore from a file system backup.
Once we did that, it worked fine for a few days until it happened again.
We're worried it can happen again!

Could this error be a hardware problem?
We recently increased the memory from 8GB to 28GB, although it had been
working fine for more than 3 weeks after that.
We also recently upgraded from postgres 8.3.6 to 8.4.5, although it also
worked fine for a few months.
Upgrading to postgres 9 is easy for us; however, we're not sure that would help.

Some info of our server:

OS: Centos 5.5
Kernel: 2.6.18-194.1.el5
Postgres version: 8.4.5 (installation out-of-the-box using yum)
Server memory: 28GB

Any help would be appreciated.

Pablo
