Race-condition with failed block-write?

From: Arjen van der Meijden <acm(at)tweakers(dot)net>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Race-condition with failed block-write?
Date: 2005-09-09 08:52:47
Message-ID: 43214D5F.9080502@tweakers.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi list,

I noticed a development machine was responding terribly slow, and found
out postgresql had consumed all available memory:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 7465 0.0 0.0 16636 928 ? S Sep05 0:00
/usr/bin/postmaster -D /var/lib/postgresql/data
postgres 7472 0.0 81.0 3144408 832032 ? D Sep05 0:56 \_
postgres: writer process
postgres 7473 0.0 0.0 7552 916 ? S Sep05 0:00 \_
postgres: stats buffer process
postgres 7474 0.0 0.0 6848 900 ? S Sep05 0:00 \_
postgres: stats collector process

It had taken over 800MB of memory. The machine is a P4 3.0 Ghz (with HT
enabled), 1GB of memory a few SATA disks and runs with a recent Gentoo +
2.6.12.5 kernel.
The postgresql version that failed was 8.0.3, it may or may not be worth
knowing that it runs a 8.1devel on another port and from another directory.

The 8.0.3-database was not actively in use, since our current
development work was using the 8.1devel.

In the postgresql.log a write-failure messages was repeated enough to
make the log file 50MB larger:

This was the first one:
[ - 2005-09-07 13:03:47 CEST @] ERROR: xlog flush request 29/67713428
is not satisfied --- flushed only to 29/2E73E488
[ - 2005-09-07 13:03:47 CEST @] CONTEXT: writing block 21 of relation
1663/2013826/9975789
[ - 2005-09-07 13:03:48 CEST @] ERROR: xlog flush request 29/67713428
is not satisfied --- flushed only to 29/2E73E488
[ - 2005-09-07 13:03:48 CEST @] CONTEXT: writing block 21 of relation
1663/2013826/9975789
[ - 2005-09-07 13:03:48 CEST @] WARNING: could not write block 21 of
1663/2013826/9975789
[ - 2005-09-07 13:03:48 CEST @] DETAIL: Multiple failures --- write
error may be permanent.

And those 4 lines were repeated up untill the memory was empty, as it
seems from the log:

TopMemoryContext: -1095880208 total in 264213 blocks; 537938888 free
(924739 chunks); -1633819096 used
MdSmgr: 8192 total in 1 blocks; 8024 free (0 chunks); 168 used
Pending Ops Table: 8192 total in 1 blocks; 6112 free (0 chunks); 2080 used
DynaHash: 8192 total in 1 blocks; 7488 free (0 chunks); 704 used
smgr relation table: 8192 total in 1 blocks; 4048 free (0 chunks); 4144 used
LockTable (locallock hash): 8192 total in 1 blocks; 6112 free (0
chunks); 2080 used
ErrorContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
[ - 2005-09-09 02:42:22 CEST @] ERROR: out of memory
[ - 2005-09-09 02:42:22 CEST @] DETAIL: Failed on request of size 16000.

After issueing an immediate-shutdown, it starts and runs just fine:

[ - 2005-09-09 10:19:51 CEST @] LOG: received fast shutdown request
[ - 2005-09-09 10:23:01 CEST @] LOG: received immediate shutdown request
[ - 2005-09-09 10:27:20 CEST @] LOG: database system was interrupted at
2005-09-05 11:24:08 CEST
[ - 2005-09-09 10:27:20 CEST @] LOG: checkpoint record is at 29/2E73E44C
[ - 2005-09-09 10:27:20 CEST @] LOG: redo record is at 29/2E73E44C;
undo record is at 0/0; shutdown TRUE
[ - 2005-09-09 10:27:20 CEST @] LOG: next transaction ID: 166361; next
OID: 55768251
[ - 2005-09-09 10:27:20 CEST @] LOG: database system was not properly
shut down; automatic recovery in progress
[ - 2005-09-09 10:27:20 CEST @] LOG: record with zero length at 29/2E73E488
[ - 2005-09-09 10:27:20 CEST @] LOG: redo is not required
[ - 2005-09-09 10:27:20 CEST @] LOG: database system is ready

At 2005-09-05 the machine was rebooted and the only query since then
(which may have triggered the issue) was:
[2005-09-07 13:03:35 CEST - 2005-09-07 13:03:37 CEST acm(at)acm] LOG:
statement: SELECT c.oid,
n.nspname,
c.relname
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE pg_catalog.pg_table_is_visible(c.oid)
AND c.relname ~ '^pwprijs$'
ORDER BY 2, 3;

I shut down the postgresql-server after the restart again to prevent
further changes to the data-files of it.

Best regards,

Arjen van der Meijden

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message anris 2005-09-09 12:00:31 BUG #1871: operations with data types
Previous Message Sijin MS 2005-09-09 07:54:05 BUG #1870: Insertion problem