Re: Next steps in debugging database storage problems?

From: Jacob Bunk Nielsen <jacob(at)bunk(dot)cc>
To: Jacob Bunk Nielsen <jacob(at)bunk(dot)cc>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Next steps in debugging database storage problems?
Date: 2014-08-15 07:23:23
Message-ID: spamdrop+87sikyb4ic.fsf@atom.bunk.cc
Lists: pgsql-general

Hi

On the 1st of July 2014 Jacob Bunk Nielsen <jacob(at)bunk(dot)cc> wrote:

> We have a PostgreSQL 9.3.4 running in an LXC container on Debian
> Wheezy on a Linux 3.10.43 kernel on a Dell R620 server. Data are
> stored on a XFS file system. We are seeing problems such as:
>
> unexpected data beyond EOF in block 2 of relation base/805208133/1238511128
>
> and
>
> could not read block 5 in file "base/805208348/1259338118": read only
> 0 of 8192 bytes
>
> This seems to occur every few days after the server has been up for
> 30-40 days. If we reboot the server it'll be another 30-40 days before
> we see any problems again.
>
> The server has been running fine on a Dell R710 for a long time, and was
> upgraded to a Dell R620 last year, when the problems started. We have
> tried switching to a different Dell R620, but that did not make a
> difference. We've seen this with kernels 3.2, 3.4 and 3.10.

This time it took 45 days before this happened:

LOG: unexpected EOF on standby connection
ERROR: unexpected data beyond EOF in block 140 of relation base/805208885/805209852
HINT: This has been seen to occur with buggy kernels; consider updating your system.
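
When comparing incidents it can help to pull the identifiers out of these messages. The following is only a minimal sketch, not anything PostgreSQL ships: parse_storage_error is a hypothetical helper name, the regular expression covers just the two message shapes quoted in this thread, and the block size is assumed to be 8192 bytes (which matches the "read only 0 of 8192 bytes" message above).

```python
import re

# Assumption: 8192-byte blocks, as suggested by "read only 0 of 8192 bytes".
BLOCK_SIZE = 8192

# Covers only the two message shapes quoted in this thread:
#   ... in block N of relation base/<db oid>/<filenode>
#   could not read block N in file "base/<db oid>/<filenode>": ...
PATTERN = re.compile(
    r'block (?P<block>\d+) (?:of relation|in file) '
    r'"?base/(?P<db>\d+)/(?P<filenode>\d+)"?'
)


def parse_storage_error(line):
    """Hypothetical helper: extract database OID, relfilenode and block
    number from a log line, plus the byte offset of that block in the file.
    Returns None if the line does not match either message shape."""
    m = PATTERN.search(line)
    if m is None:
        return None
    block = int(m.group("block"))
    return {
        "database_oid": int(m.group("db")),
        "relfilenode": int(m.group("filenode")),
        "block": block,
        "byte_offset": block * BLOCK_SIZE,
    }


info = parse_storage_error(
    "ERROR: unexpected data beyond EOF in block 140 of relation "
    "base/805208885/805209852"
)
print(info)
```

The two numbers in the path correspond to pg_database.oid and pg_class.relfilenode, so the affected table can be looked up by connecting to that database and querying pg_class for the matching relfilenode.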

It always happens with small tables with lots of inserts and deletes.
From previous experience we know that it's now going to happen again in
a few days, so we'll probably try to schedule a reboot to give us
another 30-40 days.

Is anyone else seeing problems with PostgreSQL on XFS filesystems?

Any hints on how to debug what goes wrong here would still be greatly
appreciated.

> We have multiple other PostgreSQL servers running in a similar setup
> without causing any problems, but this server is probably the busiest of
> our PostgreSQL servers.

This is still the case.

Best regards

Jacob
