Skip site navigation (1) Skip section navigation (2)

Re: Race-condition with failed block-write?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Arjen van der Meijden <acm(at)tweakers(dot)net>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Race-condition with failed block-write?
Date: 2005-09-12 22:26:07
Message-ID: 6160.1126563967@sss.pgh.pa.us (view raw or flat)
Thread:
Lists: pgsql-bugs
Arjen van der Meijden <acm(at)tweakers(dot)net> writes:
> It had taken over 800MB of memory. The machine is a P4 3.0 Ghz (with HT 
> enabled), 1GB of memory a few SATA disks and runs with a recent Gentoo + 
> 2.6.12.5 kernel.
> The postgresql version that failed was 8.0.3, it may or may not be worth 
> knowing that it runs a 8.1devel on another port and from another directory.

> In the postgresql.log a write-failure messages was repeated enough to 
> make the log file 50MB larger:

I was able to replicate the memory leak in 8.0 by artificially poking a
bad value into the LSN field of a dirty page.  CVS tip did not seem to
leak anything under the same conditions --- probably the difference is
that BufferSync() uses palloc'd temporary space in 8.0 but not in HEAD.
Nonetheless, we ought to have a positive defense against such leaks in
future, so I've applied the attached patch to both branches.

There is still the question of how the LSN value got corrupted, but
we've probably missed any chance of finding that out now.

			regards, tom lane


Index: bgwriter.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/bgwriter.c,v
retrieving revision 1.19
diff -c -r1.19 bgwriter.c
*** bgwriter.c	20 Aug 2005 23:26:17 -0000	1.19
--- bgwriter.c	12 Sep 2005 21:42:48 -0000
***************
*** 160,165 ****
--- 160,166 ----
  BackgroundWriterMain(void)
  {
  	sigjmp_buf	local_sigjmp_buf;
+ 	MemoryContext bgwriter_context;
  
  	Assert(BgWriterShmem != NULL);
  	BgWriterShmem->bgwriter_pid = MyProcPid;
***************
*** 208,213 ****
--- 209,227 ----
  	last_checkpoint_time = time(NULL);
  
  	/*
+ 	 * Create a memory context that we will do all our work in.  We do this
+ 	 * so that we can reset the context during error recovery and thereby
+ 	 * avoid possible memory leaks.  Formerly this code just ran in
+ 	 * TopMemoryContext, but resetting that would be a really bad idea.
+ 	 */
+ 	bgwriter_context = AllocSetContextCreate(TopMemoryContext,
+ 											 "Background Writer",
+ 											 ALLOCSET_DEFAULT_MINSIZE,
+ 											 ALLOCSET_DEFAULT_INITSIZE,
+ 											 ALLOCSET_DEFAULT_MAXSIZE);
+ 	MemoryContextSwitchTo(bgwriter_context);
+ 
+ 	/*
  	 * If an exception is encountered, processing resumes here.
  	 *
  	 * See notes in postgres.c about the design of this coding.
***************
*** 247,255 ****
  		 * Now return to normal top-level context and clear ErrorContext
  		 * for next time.
  		 */
! 		MemoryContextSwitchTo(TopMemoryContext);
  		FlushErrorState();
  
  		/* Now we can allow interrupts again */
  		RESUME_INTERRUPTS();
  
--- 261,272 ----
  		 * Now return to normal top-level context and clear ErrorContext
  		 * for next time.
  		 */
! 		MemoryContextSwitchTo(bgwriter_context);
  		FlushErrorState();
  
+ 		/* Flush any leaked data in the top-level context */
+ 		MemoryContextResetAndDeleteChildren(bgwriter_context);
+ 
  		/* Now we can allow interrupts again */
  		RESUME_INTERRUPTS();
  

In response to

Responses

pgsql-bugs by date

Next:From: AlexeiDate: 2005-09-12 23:44:10
Subject: BUG #1878: Different execution plans for the same query.
Previous:From: Oliver JowettDate: 2005-09-12 22:06:25
Subject: Re: Strange behavoir of batches

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group