Re: could not read block XXXXX in file "base/YYYYY/ZZZZZZ": read only 160 of 8192 bytes

From: Антон Степаненко <zlobnynigga(at)yandex(dot)ru>
To: Kevin Grittner <kevin(dot)grittner(at)wicourts(dot)gov>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: could not read block XXXXX in file "base/YYYYY/ZZZZZZ": read only 160 of 8192 bytes
Date: 2011-06-17 13:51:00
Message-ID: 438511308318660@web144.yandex.ru
Lists: pgsql-bugs

17.06.2011, 00:28, "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>:
> Антон Степаненко <zlobnynigga(at)yandex(dot)ru> wrote:
>
>>  [4-1] 2011-06-16 17:40:27 UTC LOG:  startup process (PID 15292)
>>  was terminated by signal 7: Bus error
>>  Signal 7 means  hardware problems. But all 10 replicas crashed
>>  within 10 minutes, say from 13:35 to 13:45.
>>  One important thing - all replicas and master are running on
>>  openvz
>
> Were the PostgreSQL clusters sharing any hardware?
>
>>  there is no way to reject virtualization (it is a long story =))
>>
>>  Please, I do not want to discuss my decision to set buffers to
>>  12Gb and postgresql optimization at all. I just want to understand
>>  why I'm getting such errors.
>
> On the face of it, the most likely cause would seem to be hardware
> or the virtual environment.  Without knowing more about the exact
> messages on the replicas and how they compared to each other and the
> master it's hard to know whether any of the replica failures were
> from passing corrupted data from the master to the replicas, versus
> having a common hardware/vm flaw.
>
> -Kevin

I noticed that the crash takes place when shared buffers are almost full: SELECT SUM(size) FROM adm.buffercache() returns 11670 about one minute before the crash. Furthermore, last night I set shared buffers to 11Gb, and it is working: no crash, and all buffers are in use (11120).
I still do not believe that this is a hardware problem. Each replica and the master run on a dedicated server; no hardware is shared. Only PostgreSQL runs on each server, with no other software apart from crond, zabbix and atop.
Actually, openvz is used only for portability (to easily add new replicas or migrate one of them to a new server).
The messages on the replicas are all the same: "could not read block", then "signal 7". I copy-pasted the error log as is; that is all I know.
The master did not crash, I think because it processes fewer SELECT queries, so its buffers do not reach the limit.
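
To put the two readings side by side, here is a minimal sketch of the arithmetic. It assumes that the values returned by adm.buffercache() are in megabytes and that "Gb" means 1024 MB, neither of which is stated above. The helper name buffer_usage is made up for illustration only.

```python
# Rough fill-ratio check for the two shared-buffer readings quoted above.
# ASSUMPTION: adm.buffercache() reports sizes in megabytes (not stated in the thread).
# ASSUMPTION: "12Gb" / "11Gb" mean 12 * 1024 MB and 11 * 1024 MB.

BLOCK_SIZE = 8192  # bytes per block, as in "read only 160 of 8192 bytes"


def buffer_usage(used_mb, shared_buffers_gb):
    """Return total size, block count and fill ratio for one configuration."""
    total_mb = shared_buffers_gb * 1024
    total_blocks = total_mb * 1024 * 1024 // BLOCK_SIZE
    return {
        "total_mb": total_mb,
        "total_blocks": total_blocks,
        "used_fraction": used_mb / total_mb,
    }


# Reading taken about one minute before the crash, with 12Gb of shared buffers.
crash_case = buffer_usage(11670, 12)
# Reading taken after lowering shared buffers to 11Gb, with no crash.
stable_case = buffer_usage(11120, 11)

for label, case in (("12Gb", crash_case), ("11Gb", stable_case)):
    print(label, case["total_blocks"], "blocks,",
          round(case["used_fraction"] * 100, 1), "% in use")
```

This only compares how full the buffer pool was in each case; it says nothing about what caused the bus error.
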
