Unfamous 'could not read block ... in file "...": read only 0 of 8192 bytes' again

From: Maxim Boguk <maxim.boguk@gmail.com>
To: pgsql-general@postgresql.org
Subject: Unfamous 'could not read block ... in file "...": read only 0 of 8192 bytes' again
Date: 2012-02-21 00:10:09
Message-ID: CAK-MWwT_5N1jwSH8F84ZD2iPrPCuuBoA0Tu-Ujw8+SXMg1OELQ@mail.gmail.com
Lists: pgsql-general pgsql-hackers

Two days ago, one of the servers I support produced the following error:

ERROR: could not read block 135 in file "base/16404/118881486": read only 0 of 8192 bytes

The server version is 9.0.6. No db or server crashes, and no db recovery, have happened on that server since setup.
The server is the db backend for a very large web project under quite heavy load (10k+ requests per second to the db).
No other strange errors have been found in the logs since the server was put into production half a year ago.
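For anyone hitting a similar error: the file name in the message can be mapped back to a relation with catalog queries like the following (a sketch; here 16404 is the database OID from the path and 118881486 is the relfilenode from the error message):

```sql
-- The first path component after base/ is the database OID:
SELECT datname FROM pg_database WHERE oid = 16404;

-- Connected to that database, the file name is the relfilenode:
SELECT relname, relkind FROM pg_class WHERE relfilenode = 118881486;
```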

Now, here is what happened (from the DB log):

2012-02-17 22:35:58 MSK 14333 [vxid:340/18229999 txid:2341883282] [DELETE] LOG: duration: 5.669 ms execute <unnamed>: delete from agency_statistics

2012-02-17 22:35:58 MSK 14333 [vxid:340/18230000 txid:0] [BIND] LOG: duration: 2.787 ms bind <unnamed>: insert into agency_statistics (employer_id, area_id, area_path, professional_area_id, vacancies_number) select e.employer_id, vb.area_id, a.path, s.professional_area_id, count(distinct v.vacancy_id) from vacancy v inner join employer e on v.employer_id = e.employer_id inner join vacancy_body vb on v.vacancy_body_id = vb.vacancy_body_id inner join vacancy_body_specialization vs on vb.vacancy_body_id = vs.vacancy_body_id inner join specialization s on s.specialization_id = vs.specialization_id inner join area a on vb.area_id = a.area_id where v.archived_status in (0) and v.disabled = false and e.category = $1 and e.state = $2 and e.manager_id > 0 group by e.employer_id, vb.area_id, a.path, s.professional_area_id

2012-02-17 22:35:59 MSK 14333 [vxid:340/18230000 txid:2341883336] [INSERT] LOG: duration: 726.700 ms execute <unnamed>: insert into agency_statistics (employer_id, area_id, area_path, professional_area_id, vacancies_number) select e.employer_id, vb.area_id, a.path, s.professional_area_id, count(distinct v.vacancy_id) from vacancy v inner join employer e on v.employer_id = e.employer_id inner join vacancy_body vb on v.vacancy_body_id = vb.vacancy_body_id inner join vacancy_body_specialization vs on vb.vacancy_body_id = vs.vacancy_body_id inner join specialization s on s.specialization_id = vs.specialization_id inner join area a on vb.area_id = a.area_id where v.archived_status in (0) and v.disabled = false and e.category = $1 and e.state = $2 and e.manager_id > 0 group by e.employer_id, vb.area_id, a.path, s.professional_area_id

2012-02-17 22:35:59 MSK 14333 [vxid:340/18230070 txid:0] [BIND] LOG: duration: 30.195 ms bind S_35: insert into agency_statistics (employer_id, area_id, area_path, professional_area_id, vacancies_number, rank) values ($1, $2, $3, $4, $5, 0)

2012-02-17 22:35:59 MSK 14333 [vxid:340/18230082 txid:2341883427] [INSERT] ERROR: could not read block 135 in file "base/16404/118881486": read only 0 of 8192 bytes

2012-02-17 22:35:59 MSK 14333 [vxid:340/18230082 txid:2341883427] [INSERT] STATEMENT: insert into agency_statistics (employer_id, area_id, area_path, professional_area_id, vacancies_number, rank) values ($1, $2, $3, $4, $5, 0)

No concurrent writes happened to that table outside of that connection.
Maybe autovacuum fired on that table between the delete and the first or second insert (unfortunately, it wasn't logged).
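Even without autovacuum logging enabled, the statistics collector keeps the timestamp of the last (auto)vacuum per table, so the timing question can sometimes be answered after the fact. A sketch of the check (run before dropping or renaming resets nothing, since the stats follow the table by OID):

```sql
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'agency_statistics';
```

Going forward, setting log_autovacuum_min_duration = 0 in postgresql.conf makes every autovacuum run appear in the log.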

I recreated the table from scratch and kept the damaged table under another name (via alter table agency_statistics rename to agency_statistics_old).
So I have the files to dig into.

What I see in file system:
hh=# SELECT relfilenode from pg_class where relname='agency_statistics_old';
relfilenode
-------------
118881486
postgres@db10:~/tmp$ ls -la /var/lib/postgresql/9.0/main/base/16404/118881486
-rw------- 1 postgres postgres 0 2012-02-20 12:04 /var/lib/postgresql/9.0/main/base/16404/118881486

So the table file is zero bytes long (it seems autovacuum truncated the table to 0 bytes).
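The same sizes can be confirmed from inside the database, without touching the filesystem, using functions available in 9.0 (a sketch):

```sql
SELECT pg_relation_filepath('agency_statistics_old') AS heap_path,
       pg_relation_size('agency_statistics_old')     AS heap_bytes,
       pg_relation_size('agency_statistics_pkey')    AS index_bytes;
```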

But the index shows:
hh=# SELECT relfilenode from pg_class where relname='agency_statistics_pkey';
relfilenode
-------------
118881489
postgres@db10:~/tmp$ ls -la /var/lib/postgresql/9.0/main/base/16404/118881489
-rw------- 1 postgres postgres 4956160 2012-02-20 12:07 /var/lib/postgresql/9.0/main/base/16404/118881489

hh=# SELECT * from pgstatindex('agency_statistics_pkey');
-[ RECORD 1 ]------+--------
version | 2
tree_level | 2
index_size | 4947968
root_block_no | 295
internal_pages | 1
leaf_pages | 601
empty_pages | 0
deleted_pages | 1
avg_leaf_density | 0.45
leaf_fragmentation | 13.14
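Since the index survived, its entries still carry ctids pointing into the now-truncated heap. The pageinspect contrib module can dump them, which may help establish how far past the end of the file the index was pointing (a sketch; block 295 is the root block reported by pgstatindex above, and any other block number within the index could be substituted):

```sql
-- requires the pageinspect contrib module to be installed
SELECT type, live_items, dead_items, btpo_next, btpo_prev
FROM bt_page_stats('agency_statistics_pkey', 295);

SELECT itemoffset, ctid, itemlen
FROM bt_page_items('agency_statistics_pkey', 295)
LIMIT 10;
```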

I am out of ideas now.
Any suggestions on where and what I should look at next?

Kind Regards,
Maksym

--
Maxim Boguk
Senior Postgresql DBA.

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
"If they can send one man to the moon... why can't they send them all?"

МойКруг: http://mboguk.moikrug.ru/
"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."
