Re: Various intermittent bugs/instability - how to debug?

From: Mark Cave-Ayland <mark(dot)cave-ayland(at)siriusit(dot)co(dot)uk>
To: Frederik Ramm <frederik(dot)ramm(at)geofabrik(dot)de>, pgsql-general(at)postgresql(dot)org
Subject: Re: Various intermittent bugs/instability - how to debug?
Date: 2008-09-10 08:43:10
Message-ID: 48C7889E.9010905@siriusit.co.uk
Lists: pgsql-general

Frederik Ramm wrote:

> Dear PostgreSQL community,
>
> I hope you can help me with a problem I'm having - I'm stuck and
> don't know how to debug this further.
>
> I have a rather large nightly process that imports a lot of data from
> the OpenStreetMap project into a PostGIS database, then proceeds doing
> all sorts of things - creating spatial indexes, computing bounding
> boxes, doing simplification of geometries, that kind of stuff. The whole
> job usually takes about five hours.
>
> I'm running this on a Quad-Core Linux (Ubuntu, PostgreSQL 8.3) machine
> with 8 GB RAM.
>
> Every other night, the process aborts with some strange error message,
> and never at the same position:
>
> ERROR: invalid page header in block 166406 of relation "node_tags"
>
> ERROR: could not open segment 2 of relation 1663/24253056/24253895
> (target block 1421295656): No such file or directory
>
> ERROR: Unknown geometry type: 10
>
> When I continue the process after the failure, it will usually work.
>
> I know you all think "hardware problem" now. Of course this was my first
> guess as well. I ran a memory test for a night, no results; I downgraded
> to "failsafe defaults" for all BIOS timings, again no change. Ran
> "cpuburn" and all sorts of other things to grill the hardware - nothing.
>
> Then I bought an entirely new machine; similar setup, but using a
> Gigabyte instead of Asus mainboard, different chipset, slightly faster
> Quad-Core processor, and again 8 GB RAM and Ubuntu "Hardy" with
> PostgreSQL 8.3 and matching PostGIS.
>
> Believe it or not, this machine shows the *same* problems. It is not
> 100% reproducible, sometimes the job works fully, but every other day it
> just breaks down with one of the funny messages like above. No memtest
> errors here either.
>
> Both machines are "consumer" quality, i.e. normal Intel processors and
> not the "server" (Xeon) stock.
>
> I am at a loss - how can I proceed? This looks like a hardware problem
> alright, but such similar problems on two such different machines? Is there
> something wrong with Intel's Quad-Core CPUs?
>
> What could I do to have a better chance of reproducing the error and
> ultimately identifying the component responsible? Is there some kind of
> "PostgreSQL load test", something like "cpuburn" for PostgreSQL?
>
> Have there been other reports of intermittent problems like mine, and
> does anybody have any blind guesses...?
>
> Thanks
> Frederik

Hi Frederik,

We did find a memory clobber in the PostGIS ANALYZE routine a while
back, but the fix hasn't yet made it into a release.

If you are building from source, could you please try applying the patch
here: http://code.google.com/p/postgis/issues/detail?id=43 and report
back whether it helps or not?
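
To get a better chance of reproducing the failure before and after
patching, one option is to hammer ANALYZE on the affected tables in a
loop. The following is only a minimal sketch: the helper name
build_analyze_stress_sql is hypothetical, and it is an assumption (not
something confirmed in this thread) that repeated ANALYZE runs will
actually trigger the clobber. It only generates a SQL script; it does
not connect to the database itself.

```python
import re

# Hypothetical helper, not part of PostgreSQL or PostGIS.
# Accept only plain identifiers so the generated script stays safe to run.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_analyze_stress_sql(tables, rounds=100):
    """Return a SQL script that runs ANALYZE on each table `rounds` times.

    Table names are double-quoted, so they must match the stored name exactly.
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    for name in tables:
        if not _IDENT.match(name):
            raise ValueError(f"not a plain SQL identifier: {name!r}")

    lines = []
    for i in range(1, rounds + 1):
        # A comment per round makes it easier to see how far the run got.
        lines.append(f"-- round {i} of {rounds}")
        for name in tables:
            lines.append(f'ANALYZE "{name}";')
    return "\n".join(lines) + "\n"
```

For example, print(build_analyze_stress_sql(["node_tags"], rounds=3))
emits three ANALYZE statements that can be saved to a file and fed to
psql. If the errors show up noticeably sooner under this loop and stop
after the patch is applied, that would point at the ANALYZE routine
rather than the hardware.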

ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063
