Re: could not read block XXXXX in file "base/YYYYY/ZZZZZZ": read only 160 of 8192 bytes

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: ***** ********** <zlobnynigga(at)yandex(dot)ru>
Cc: <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: could not read block XXXXX in file "base/YYYYY/ZZZZZZ": read only 160 of 8192 bytes
Date: 2011-06-17 15:56:52
Message-ID: 4DFB32F4020000250003E7E2@gw.wicourts.gov
Lists: pgsql-bugs

***** **********<zlobnynigga(at)yandex(dot)ru> wrote:
> 17.06.2011, 00:28, "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>:
>> ***** **********<zlobnynigga(at)yandex(dot)ru> wrote:
>>
>>> [4-1] 2011-06-16 17:40:27 UTC LOG: startup process (PID 15292)
>>> was terminated by signal 7: Bus error
>>> Signal 7 means hardware problems. But all 10 replicas crashed
>>> within 10 minutes, say from 13:35 to 13:45.
>>> One important thing - all replicas and master are running on
>>> openvz

>> On the face of it, the most likely cause would seem to be
>> hardware or the virtual environment.

> I noticed that the crash takes place when shared buffers are almost
> full, i.e. SELECT SUM(size) FROM adm.buffercache() returns 11670
> at about one minute before the crash. Furthermore, last night I set
> buffers to 11GB, and it is working, no crash, all buffers are used
> (11120).

Well then, in a pinch you could always fall back to using what
works.

> I still do not believe that this is hardware problem.

How would an application cause a bus error?

> Each replica and master runs on dedicated server, no hardware is
> shared.

OK. If they had been on the same blade chassis or something I would
have suspected hardware.

> There is only postgresql on each server, no other software (just
> crond, zabbix, atop). Actually openvz is used only for portability
> (easily add new replicas or migrate one of them to a new server).

Still, it emulates hardware, so you have to consider it a suspect
for any hardware problem -- at least if you want to solve that
problem.

> Master did not crash

Ah, that wasn't clear from the earlier post. I'm not sure how
significant it is, but it's good to know.

> I think because it processes fewer SELECT queries, therefore its
> buffers do not reach the limit.

In your shoes I would now be trying to construct a test program to
exercise progressively larger allocations of shared memory, and test
them both under openvz and without it. Well, first I would probably
try loading the master with queries to drive it to use the full
shared_buffers space, *then* move on to the test program.
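
A minimal sketch of what such a test program might look like, under
assumptions that are not from this thread: it uses an anonymous shared
mmap region as a stand-in for the System V shared memory segment that
PostgreSQL allocates, and the function names, start size, and limit are
all hypothetical. It maps progressively larger regions and touches every
page, since a mapping that looks fine at allocation time can still fault
on first access. If the environment has the kind of problem discussed
here, the process would most likely be killed by the signal rather than
raise a Python exception, so the last size printed is the useful clue.

```python
import mmap

PAGE = mmap.PAGESIZE


def touch_shared(size_bytes):
    """Map an anonymous shared region, then write and read back one byte per page.

    Returns the number of pages touched. The size is rounded down to a
    whole number of pages, with a minimum of one page.
    """
    size_bytes = max(PAGE, (size_bytes // PAGE) * PAGE)
    # fileno=-1 requests an anonymous mapping (shared by default on Unix).
    with mmap.mmap(-1, size_bytes) as region:
        for offset in range(0, size_bytes, PAGE):
            region[offset] = 0xA5  # first write forces the page to be backed
        for offset in range(0, size_bytes, PAGE):
            if region[offset] != 0xA5:
                raise RuntimeError("readback mismatch at offset %d" % offset)
    return size_bytes // PAGE


def sweep(start_mb, limit_mb, factor=2):
    """Exercise allocations from start_mb up to limit_mb, growing by factor."""
    results = []
    size_mb = start_mb
    while size_mb <= limit_mb:
        pages = touch_shared(size_mb * 1024 * 1024)
        results.append((size_mb, pages))
        size_mb *= factor
    return results


if __name__ == "__main__":
    # Hypothetical range; raise the limit toward the shared_buffers size
    # that fails, and run once inside the container and once on the host.
    for size_mb, pages in sweep(64, 1024):
        print("ok: %d MB (%d pages)" % (size_mb, pages), flush=True)
```

Running the same script inside the openvz container and on the bare
host, with the limit pushed past the size that crashes, would show
whether the failure follows the virtual environment. A closer
reproduction would call shmget()/shmat() from C.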

The relevant question here is why others can successfully use large
shared_buffers settings while you can't. Something is different in
your environment. What?

-Kevin
