Re: Various intermittent bugs/instability - how to debug?

From: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
To: "Frederik Ramm" <frederik(dot)ramm(at)geofabrik(dot)de>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Various intermittent bugs/instability - how to debug?
Date: 2008-09-09 16:03:58
Message-ID: dcc563d10809090903xe835ebk890b9a9b306e7571@mail.gmail.com
Lists: pgsql-general

On Tue, Sep 9, 2008 at 8:50 AM, Frederik Ramm
<frederik(dot)ramm(at)geofabrik(dot)de> wrote:
> Dear PostgreSQL community,
>
> I hope you can help me with a problem I'm having - I'm stuck and don't
> know how to debug this further.
>
> I have a rather large nightly process that imports a lot of data from the
> OpenStreetMap project into a PostGIS database, then proceeds doing all sorts
> of things - creating spatial indexes, computing bounding boxes, doing
> simplification of geometries, that kind of stuff. The whole job usually
> takes about five hours.
>
> I'm running this on a Quad-Core Linux (Ubuntu, PostgreSQL 8.3) machine with
> 8 GB RAM.
>
> Every other night, the process aborts with some strange error message, and
> never at the same position:
>
> ERROR: invalid page header in block 166406 of relation "node_tags"
>
> ERROR: could not open segment 2 of relation 1663/24253056/24253895 (target
> block 1421295656): No such file or directory
>
> ERROR: Unknown geometry type: 10
>
> When I continue the process after the failure, it will usually work.
>
> I know you all think "hardware problem" now. Of course this was my first
> guess as well. I ran a memory test for a night, no results; I downgraded to
> "failsafe defaults" for all BIOS timings, again no change. Ran "cpuburn" and
> all sorts of other things to grill the hardware - nothing.

You are definitely suffering from db corruption, and given the number
and variety of errors, it seems unlikely that pgsql has a load of bugs
that only you are seeing. OTOH, if the bug is hidden deep in postgis
or something, then who knows...

I'd definitely run something like bonnie++ for a few days and see if
it gets HD errors or not.

And definitely run memtest86 for a day or so and make sure you're not
getting any errors there.
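
As a complement to bonnie++ (which exercises the disk but does not compare
what it reads back against what it wrote), here is a rough sketch of a
write-then-verify loop. The function name, file name, and sizes are made up
for illustration and are not part of any existing tool. Note that re-reads of
a small file are likely served from the OS page cache, so to actually hit the
disk the file would need to be larger than RAM (more than 8 GB on the machine
described); a mismatch on a cached file would point at memory instead.

```python
import hashlib
import os
import tempfile


def write_and_verify(path, size_mb=64, passes=3, chunk=1 << 20):
    """Write size_mb MiB of random data to path, then re-read it `passes`
    times and return how many passes produced a different checksum."""
    digest = hashlib.sha256()
    with open(path, "wb") as f:
        for _ in range(size_mb):
            block = os.urandom(chunk)
            digest.update(block)      # checksum of what we intended to write
            f.write(block)
        f.flush()
        os.fsync(f.fileno())          # ask the OS to push the data to disk
    expected = digest.hexdigest()

    mismatches = 0
    for _ in range(passes):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk), b""):
                h.update(block)
        if h.hexdigest() != expected:
            mismatches += 1           # data read back differs from data written
    return mismatches


if __name__ == "__main__":
    # Hypothetical small run; use a much larger size_mb for a real soak test.
    with tempfile.TemporaryDirectory() as d:
        bad = write_and_verify(os.path.join(d, "probe.bin"), size_mb=8)
        print("mismatching passes:", bad)
```

Any nonzero count would be a strong hint that the storage or memory path is
corrupting data, independent of PostgreSQL. A clean run does not prove the
hardware is fine, especially for intermittent faults, so it would have to run
for a long time under load to mean much.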
