
Re: Race-condition with failed block-write?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Arjen van der Meijden <acm(at)tweakers(dot)net>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Race-condition with failed block-write?
Date: 2005-09-11 00:18:26
Message-ID: 21541.1126397906@sss.pgh.pa.us
Lists: pgsql-bugs
Arjen van der Meijden <acm(at)tweakers(dot)net> writes:
> In the postgresql.log a write-failure message was repeated enough to 
> make the log file 50MB larger:

> [ - 2005-09-07 13:03:47 CEST @] ERROR:  xlog flush request 29/67713428 
> is not satisfied --- flushed only to 29/2E73E488
> [ - 2005-09-07 13:03:47 CEST @] CONTEXT:  writing block 21 of relation 
> 1663/2013826/9975789
> ...

> TopMemoryContext: -1095880208 total in 264213 blocks; 537938888 free 
> (924739 chunks); -1633819096 used
> MdSmgr: 8192 total in 1 blocks; 8024 free (0 chunks); 168 used
> Pending Ops Table: 8192 total in 1 blocks; 6112 free (0 chunks); 2080 used
> DynaHash: 8192 total in 1 blocks; 7488 free (0 chunks); 704 used
> smgr relation table: 8192 total in 1 blocks; 4048 free (0 chunks); 4144 used
> LockTable (locallock hash): 8192 total in 1 blocks; 6112 free (0 
> chunks); 2080 used
> ErrorContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
> [ - 2005-09-09 02:42:22 CEST @] ERROR:  out of memory
> [ - 2005-09-09 02:42:22 CEST @] DETAIL:  Failed on request of size 16000.

The pending-ops table only exists in the bgwriter, so it's evidently the
bgwriter that was out of memory.  I have an old note to myself:

: Doesn't bgwriter risk leaking memory if ereport out of a checkpoint?
: Seems we should have it run checkpoints in a short-term context.
: Don't the other daemons have similar issues?

It looks to me like you have a case of this actually happening: it kept
failing to execute a checkpoint and leaking some more memory each time.
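
To illustrate the leak pattern and the "short-term context" remedy, here is a
minimal sketch in plain C. It is not PostgreSQL code: Arena, arena_alloc,
arena_reset, failing_checkpoint and run_failed_checkpoints are hypothetical
names, setjmp/longjmp stands in for an error escaping the checkpoint, and
WORK_BYTES is an arbitrary size. The only point is that memory allocated
before the error jump stays allocated unless the recovery path resets the
context it came from.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

#define WORK_BYTES 1024   /* arbitrary working-memory size per attempt */

/* Hypothetical stand-in for a memory context: a linked list of
 * allocations that can all be released in one call. */
typedef union Chunk {
    union Chunk *next;
    max_align_t  align;   /* keeps the payload suitably aligned */
} Chunk;

typedef struct Arena {
    Chunk  *head;
    size_t  live_bytes;   /* payload bytes currently allocated */
} Arena;

static void *arena_alloc(Arena *a, size_t n)
{
    Chunk *c = malloc(sizeof(Chunk) + n);
    if (c == NULL)
        return NULL;
    c->next = a->head;
    a->head = c;
    a->live_bytes += n;
    return c + 1;         /* payload starts right after the header */
}

static void arena_reset(Arena *a)
{
    while (a->head != NULL) {
        Chunk *next = a->head->next;
        free(a->head);
        a->head = next;
    }
    a->live_bytes = 0;
}

static jmp_buf error_recovery;   /* stand-in for the daemon's error handler */

/* Allocates working memory, then fails before it can release it. */
static void failing_checkpoint(Arena *a)
{
    (void) arena_alloc(a, WORK_BYTES);
    longjmp(error_recovery, 1);  /* simulates an error thrown mid-checkpoint */
}

/* Runs `attempts` failing checkpoints and returns the number of bytes
 * still allocated afterwards. With reset_on_error == 0 every failure
 * leaks; with reset_on_error != 0 the recovery path frees the arena. */
size_t run_failed_checkpoints(int attempts, int reset_on_error)
{
    static Arena ctx;            /* static: must survive longjmp intact */
    size_t leaked;
    volatile int i;

    assert(attempts >= 0);
    ctx.head = NULL;
    ctx.live_bytes = 0;

    for (i = 0; i < attempts; i++) {
        if (setjmp(error_recovery) == 0)
            failing_checkpoint(&ctx);
        else if (reset_on_error)
            arena_reset(&ctx);   /* the "short-term context" cleanup */
    }

    leaked = ctx.live_bytes;
    arena_reset(&ctx);           /* tidy up before returning */
    return leaked;
}
```

Calling run_failed_checkpoints(n, 0) models a daemon that retries forever
without cleanup, so the leaked total grows with every failure, while
run_failed_checkpoints(n, 1) stays at zero no matter how many attempts fail.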

I'll move the priority of fixing that up a bit ...

The other question is why the failure was occurring in the first place
--- corrupt LSN value, apparently, but how'd it get that way?
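
For reference, the two positions in the quoted error are printed as a pair of
hexadecimal numbers, high part then low part. The sketch below parses that
textual form and compares the two values from the log. Lsn, parse_lsn and
lsn_compare are hypothetical helper names written for this illustration, not
PostgreSQL's own types or functions.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical representation of a log position printed as "HI/LO" in hex. */
typedef struct Lsn {
    uint32_t hi;
    uint32_t lo;
} Lsn;

/* Parses text such as "29/67713428". Returns 1 on success, 0 on failure.
 * The trailing %c is only there to reject extra characters after the value. */
int parse_lsn(const char *text, Lsn *out)
{
    unsigned int hi, lo;
    char trailing;

    if (sscanf(text, "%X/%X%c", &hi, &lo, &trailing) != 2)
        return 0;
    out->hi = (uint32_t) hi;
    out->lo = (uint32_t) lo;
    return 1;
}

/* Returns a negative value, zero, or a positive value when a is before,
 * equal to, or after b. The high part is compared first, then the low part. */
int lsn_compare(Lsn a, Lsn b)
{
    if (a.hi != b.hi)
        return a.hi < b.hi ? -1 : 1;
    if (a.lo != b.lo)
        return a.lo < b.lo ? -1 : 1;
    return 0;
}
```

Applied to the log above, the requested position 29/67713428 compares as later
than the flushed position 29/2E73E488: the high parts match and the low part of
the request is larger. That is the condition the error message reports.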

			regards, tom lane
