pgstattuple triggered checkpoint failure and database outage?

From: Stuart Bishop <stuart(at)stuartbishop(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: pgstattuple triggered checkpoint failure and database outage?
Date: 2009-03-30 06:42:05
Message-ID: 6bc73d4c0903292342u3c18acfu25c21baeafe140be@mail.gmail.com
Lists: pgsql-general pgsql-hackers

I just had a brief outage on a production server running 8.3.6, which
I suspect was triggered by me running a table bloat report making lots
of pgstattuple calls.

The first sign of trouble was that the script I'd just kicked off died:

could not open segment 1 of relation 1663/16409/11088101 (target block
131292): No such file or directory
CONTEXT: writing block 131292 of relation 1663/16409/11088101

More alerts came in; it looked like everything was failing with similar errors.

In the logs, the first indication of the problem is:

<@:6160> 2009-03-30 06:49:27 BST LOG: checkpoint starting: time
[...]
<@:6160> 2009-03-30 06:49:58 BST ERROR: could not open segment 1 of
relation 1663/16409/11088101 (target block 131072): No such file or
directory
<@:6160> 2009-03-30 06:49:58 BST CONTEXT: writing block 131072 of
relation 1663/16409/11088101
<@:6160> 2009-03-30 06:49:59 BST LOG: checkpoint starting: time
<@:6160> 2009-03-30 06:49:59 BST ERROR: could not open segment 1 of
relation 1663/16409/11088101 (target block 134984): No such file or
directory
<@:6160> 2009-03-30 06:49:59 BST CONTEXT: writing block 134984 of
relation 1663/16409/11088101
<@:6160> 2009-03-30 06:50:00 BST LOG: checkpoint starting: time
<@:6160> 2009-03-30 06:50:01 BST ERROR: could not open segment 1 of
relation 1663/16409/11088101 (target block 135061): No such file or
directory
<@:6160> 2009-03-30 06:50:01 BST CONTEXT: writing block 135061 of
relation 1663/16409/11088101
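For anyone decoding these errors: the relation path is tablespace OID / database OID / relfilenode (1663 is the OID of pg_default). On a default build (8 KB blocks, 1 GB segment files) each segment file holds 131072 blocks, so block 131072 is the very first block of segment 1, i.e. the file base/16409/11088101.1, and every block in the errors above falls in that same file. A minimal sketch of the arithmetic, assuming default BLCKSZ and segment size and handling only the pg_default tablespace; `locate_block` is a hypothetical helper, not part of PostgreSQL:

```python
from typing import Tuple

# Assumption: default build settings. BLCKSZ is 8 KB and segment files are
# 1 GB, so RELSEG_SIZE works out to 131072 blocks per segment. Check the
# server's block_size setting if your build differs.
BLCKSZ = 8192
RELSEG_SIZE = (1024 ** 3) // BLCKSZ  # 131072 blocks per 1 GB segment

PG_DEFAULT_OID = 1663  # OID of the pg_default tablespace


def locate_block(spc_oid: int, db_oid: int, relfilenode: int,
                 block: int) -> Tuple[int, int, str]:
    """Hypothetical helper: map a block number to (segment number,
    block offset within that segment, file path relative to $PGDATA)."""
    if spc_oid != PG_DEFAULT_OID:
        # Other tablespaces live under pg_tblspc/ with a layout that varies
        # between major versions; not handled in this sketch.
        raise ValueError("only the pg_default tablespace is handled here")
    segno, offset = divmod(block, RELSEG_SIZE)
    base = f"base/{db_oid}/{relfilenode}"
    # Segment 0 has no suffix; later segments are named <relfilenode>.1, .2, ...
    path = base if segno == 0 else f"{base}.{segno}"
    return segno, offset, path


if __name__ == "__main__":
    for blk in (131072, 131292, 134984, 135061):
        print(blk, locate_block(1663, 16409, 11088101, blk))
```

For example, `locate_block(1663, 16409, 11088101, 131072)` returns `(1, 0, 'base/16409/11088101.1')`, which matches "could not open segment 1" in the first checkpoint error.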

Doing an immediate shutdown and restart seems to have brought
everything back online. I don't think there is any corruption (not
that I can tell easily...), and I'm not worried if I lost a
transaction or three.

Does anyone have an idea what happened here? I suspect pgstattuple, as
it was the only unusual activity at the time; as far as I'm aware we
have had no hardware alerts, and the box has been running smoothly for
quite some time.

--
Stuart Bishop <stuart(at)stuartbishop(dot)net>
http://www.stuartbishop.net/
