Re: could not read block XXXXX in file "base/YYYYY/ZZZZZZ": read only 160 of 8192 bytes

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: could not read block XXXXX in file "base/YYYYY/ZZZZZZ": read only 160 of 8192 bytes
Date: 2011-06-18 05:58:19
Message-ID: 4DFC3E7B.9060300@2ndQuadrant.com
Lists: pgsql-bugs

On 06/17/2011 04:47 PM, Антон Степаненко wrote:
> Memory for shared buffers cannot be oversubscribed - because if kernel
> did not provide enough shared memory postgres will not start.

The block is allocated at once. But the amount of it that various
client backends end up touching varies as they run, slowly increasing
over time as they access more buffers. After running for a while, the
individual processes will look like this:

 PID USER    PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
2645 gsmith  20  0 12.3g 5.1g 5.1g D   45 32.8 16:59.19 postgres: gsmith pgbench [local] SELECT

At that point their virtual memory size is slightly larger than shared_buffers.

I tested this out on a Debian system here: I set shared_buffers to 12GB
and beat on the server until every buffer had been used by clients
(which is shown by how they've mapped the whole memory set in the
output above). It worked fine.
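
As a rough illustration of reading that top line (not part of the test
itself), here is a minimal Python sketch that converts the VIRT, RES and
SHR columns to bytes and compares them against shared_buffers. The
function names are made up for this example, the column positions assume
top's default layout as shown above, and bare numbers are assumed to be
in KiB.

```python
# Hypothetical helpers for reading a top(1) row; not a PostgreSQL API.
# Suffixes are treated as binary units, as top displays them.
UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_top_size(text):
    """Convert a top-style size such as '12.3g' to bytes.

    A bare number is assumed to be in KiB (top's default unit).
    """
    text = text.strip().lower()
    if text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(text) * 1024


def backend_memory(top_row, shared_buffers_bytes):
    """Summarize one backend's memory relative to shared_buffers.

    Assumes the default column order: PID USER PR NI VIRT RES SHR ...
    """
    fields = top_row.split()
    virt, res, shr = (parse_top_size(f) for f in fields[4:7])
    return {
        # How far the mapped address space exceeds shared_buffers.
        "virt_overhead": virt - shared_buffers_bytes,
        # Share of this backend's resident memory that is shared pages.
        "shared_fraction_of_res": shr / res,
    }


if __name__ == "__main__":
    row = ("2645 gsmith 20 0 12.3g 5.1g 5.1g D 45 32.8 16:59.19 "
           "postgres: gsmith pgbench [local] SELECT")
    print(backend_memory(row, 12 * 1024**3))
```

A small positive virt_overhead matches the "slightly larger than
shared_buffers" observation, and a shared fraction near 1 means almost
all of the backend's resident memory is pages from the shared segment.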

I suspect you're running into some sort of OpenVZ shared memory handling
bug. The way it handles this is one of the more complicated parts of
the design, and therefore likely to have odd failure cases. There are
notes at http://wiki.openvz.org/Postgresql_and_shared_memory about
container-specific things to tune here, so maybe there's just a setting
to tweak you've missed so far. I'm guessing you already went through
that though.

A quick look around shows there are far more regularly reported bugs
like this in OpenVZ than there are in PostgreSQL, and Ubuntu is not
known for bug-free release practices either. You're probably chasing
after the wrong thing trying to find a database problem here. You're
likely to end up in the same situation as the last one of these I remember:

http://archives.postgresql.org/pgsql-general/2009-10/msg00125.php
http://lists.debian.org/debian-kernel/2010/03/msg00401.html

...waiting for the OpenVZ problem that's the real cause to get fixed and
make its way to your distribution.

In your situation, I'd just use the smaller setting to avoid the known
problem, and try to focus my energy on finding a platform that isn't as
risky to deploy on instead. Even if that's not your main deployment
platform, just having something on real hardware to compare against would be
extremely valuable for isolating the problem here.

(And that's without even considering that setting shared_buffers so high
on Linux is more likely to slow the server than speed it up, which you
said you didn't want to discuss. Just pointing it out so no one else
gets the wrong idea from your configuration.)

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
