Skip site navigation (1) Skip section navigation (2)

Race-condition with failed block-write?

From: Arjen van der Meijden <acm(at)tweakers(dot)net>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Race-condition with failed block-write?
Date: 2005-09-09 08:52:47
Message-ID: 43214D5F.9080502@tweakers.net (view raw or flat)
Thread:
Lists: pgsql-bugs
Hi list,

I noticed a development machine was responding terribly slow, and found 
out postgresql had consumed all available memory:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres  7465  0.0  0.0  16636   928 ?        S    Sep05   0:00 
/usr/bin/postmaster -D /var/lib/postgresql/data
postgres  7472  0.0 81.0 3144408 832032 ?      D    Sep05   0:56  \_ 
postgres: writer process
postgres  7473  0.0  0.0   7552   916 ?        S    Sep05   0:00  \_ 
postgres: stats buffer process
postgres  7474  0.0  0.0   6848   900 ?        S    Sep05   0:00      \_ 
postgres: stats collector process

It had taken over 800MB of memory. The machine is a P4 3.0 Ghz (with HT 
enabled), 1GB of memory a few SATA disks and runs with a recent Gentoo + 
2.6.12.5 kernel.
The postgresql version that failed was 8.0.3, it may or may not be worth 
knowing that it runs a 8.1devel on another port and from another directory.

The 8.0.3-database was not actively in use, since our current 
development work was using the 8.1devel.

In the postgresql.log a write-failure messages was repeated enough to 
make the log file 50MB larger:

This was the first one:
[ - 2005-09-07 13:03:47 CEST @] ERROR:  xlog flush request 29/67713428 
is not satisfied --- flushed only to 29/2E73E488
[ - 2005-09-07 13:03:47 CEST @] CONTEXT:  writing block 21 of relation 
1663/2013826/9975789
[ - 2005-09-07 13:03:48 CEST @] ERROR:  xlog flush request 29/67713428 
is not satisfied --- flushed only to 29/2E73E488
[ - 2005-09-07 13:03:48 CEST @] CONTEXT:  writing block 21 of relation 
1663/2013826/9975789
[ - 2005-09-07 13:03:48 CEST @] WARNING:  could not write block 21 of 
1663/2013826/9975789
[ - 2005-09-07 13:03:48 CEST @] DETAIL:  Multiple failures --- write 
error may be permanent.


And those 4 lines were repeated up untill the memory was empty, as it 
seems from the log:


TopMemoryContext: -1095880208 total in 264213 blocks; 537938888 free 
(924739 chunks); -1633819096 used
MdSmgr: 8192 total in 1 blocks; 8024 free (0 chunks); 168 used
Pending Ops Table: 8192 total in 1 blocks; 6112 free (0 chunks); 2080 used
DynaHash: 8192 total in 1 blocks; 7488 free (0 chunks); 704 used
smgr relation table: 8192 total in 1 blocks; 4048 free (0 chunks); 4144 used
LockTable (locallock hash): 8192 total in 1 blocks; 6112 free (0 
chunks); 2080 used
ErrorContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
[ - 2005-09-09 02:42:22 CEST @] ERROR:  out of memory
[ - 2005-09-09 02:42:22 CEST @] DETAIL:  Failed on request of size 16000.


After issueing an immediate-shutdown, it starts and runs just fine:

[ - 2005-09-09 10:19:51 CEST @] LOG:  received fast shutdown request
[ - 2005-09-09 10:23:01 CEST @] LOG:  received immediate shutdown request
[ - 2005-09-09 10:27:20 CEST @] LOG:  database system was interrupted at 
2005-09-05 11:24:08 CEST
[ - 2005-09-09 10:27:20 CEST @] LOG:  checkpoint record is at 29/2E73E44C
[ - 2005-09-09 10:27:20 CEST @] LOG:  redo record is at 29/2E73E44C; 
undo record is at 0/0; shutdown TRUE
[ - 2005-09-09 10:27:20 CEST @] LOG:  next transaction ID: 166361; next 
OID: 55768251
[ - 2005-09-09 10:27:20 CEST @] LOG:  database system was not properly 
shut down; automatic recovery in progress
[ - 2005-09-09 10:27:20 CEST @] LOG:  record with zero length at 29/2E73E488
[ - 2005-09-09 10:27:20 CEST @] LOG:  redo is not required
[ - 2005-09-09 10:27:20 CEST @] LOG:  database system is ready


At 2005-09-05 the machine was rebooted and the only query since then 
(which may have triggered the issue) was:
[2005-09-07 13:03:35 CEST - 2005-09-07 13:03:37 CEST acm(at)acm] LOG: 
statement: SELECT c.oid,
           n.nspname,
           c.relname
         FROM pg_catalog.pg_class c
              LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
         WHERE pg_catalog.pg_table_is_visible(c.oid)
               AND c.relname ~ '^pwprijs$'
         ORDER BY 2, 3;

I shut down the postgresql-server after the restart again to prevent 
further changes to the data-files of it.

Best regards,

Arjen van der Meijden

Responses

pgsql-bugs by date

Next:From: anrisDate: 2005-09-09 12:00:31
Subject: BUG #1871: operations with data types
Previous:From: Sijin MSDate: 2005-09-09 07:54:05
Subject: BUG #1870: Insertion problem

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group