Skip site navigation (1) Skip section navigation (2)

Re: [HACKERS] New wal_sync_method for Darwin?

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Cc: Chris Campbell <chris(at)bignerdranch(dot)com>
Subject: Re: [HACKERS] New wal_sync_method for Darwin?
Date: 2005-05-20 14:37:07
Message-ID: 200505201437.j4KEb7822676@candle.pha.pa.us (view raw or flat)
Thread:
Lists: pgsql-patches
Applied.

---------------------------------------------------------------------------

Bruce Momjian wrote:
> 
> Attached is an updated version of your Darwin Fsync-writethrough patch. 
> I did much more restructuring because we do fsync in more places than
> just WAL (like at checkpoints), and we should follow the WAL
> write-through in those places too if that method is used for fsync of
> WAL, rather than the normal fsync without the force-writethrough.
> 
> I restructured the code so now pg_fsync does writethrough or not based
> on the wal setting.  My feeling is you need WAL writethrough, you need
> it for other fsyncs as well.  Write-through was added in an 8.0.X
> subrelease, and because it was a subrelease, the patch did a minimal
> amount of code change.  This change is the restructuring that makes
> sense for 8.1.
> 
> ---------------------------------------------------------------------------
> 
> Chris Campbell wrote:
> > Thanks, Bruce! Here's the patch (would you rather patches be attached 
> > files, or inline?).
> > 
> > Some explanation. In the current code, there is no "fsync" on Windows, 
> > only "fsync_writethrough". But, internally, it behaves identically to 
> > "fsync". The only special case was for the GUC strings.
> > 
> > The patch changes it thusly: since "fsync" and "fsync_writethrough" 
> > cannot be identical on Darwin, I added a new constant for 
> > SYNC_METHOD_FSYNC_WRITE_THROUGH (3). A GUC value of 
> > "fsync_writethrough" sets fsync_method to this new constant.
> > 
> > On Windows, the fsync_writethrough case falls through to the fsync case 
> > (which simply runs fsync). On Darwin, the fsync_writethrough case 
> > executes fcntl(openLogFile, F_FULLFSYNC, 0), which does everything that 
> > fsync does, and then flushes the disk's write cache. The code does not 
> > need to call fsync directly, just fcntl.
> > 
> > The performance of a script that executes 11,000 INSERT statements 
> > (outside a transaction) is pretty slow, since it requires flushing the 
> > disk write cache after EVERY query (transaction commit). I waited about 
> > 15 minutes before I just killed it. But if you do all the inserts 
> > inside a transaction, then I only pay the cache flush price once at 
> > transaction commit, and it's quite usable.
> > 
> > I asked around on IRC, and the only place where flushing the disk cache 
> > is important is when syncing a WAL file. So I didn't change the 
> > behavior of fsync() globally for the code...just for WAL syncs.
> > 
> > I added a new #define called HAVE_FSYNC_WRITE_THROUGH, which is defined 
> > only by the win32.h and darwin.h port headers. This controls the 
> > availability of the "fsync_writethrough" GUC setting. Thanks to the 
> > port headers, no configure test is needed.
> > 
> > Thanks! Let me know what you think of the patch, and tell me what else 
> > I can do.
> > 
> > - Chris
> > 
> > 
> 
> -- 
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman(at)candle(dot)pha(dot)pa(dot)us               |  (610) 359-1001
>   +  If your life is a hard drive,     |  13 Roberts Road
>   +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

> Index: doc/src/sgml/runtime.sgml
> ===================================================================
> RCS file: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v
> retrieving revision 1.319
> diff -c -c -r1.319 runtime.sgml
> *** doc/src/sgml/runtime.sgml	15 May 2005 00:26:18 -0000	1.319
> --- doc/src/sgml/runtime.sgml	17 May 2005 22:03:49 -0000
> ***************
> *** 1595,1601 ****
>           values are
>           <literal>fsync</> (call <function>fsync()</> at each commit),
>           <literal>fdatasync</> (call <function>fdatasync()</> at each commit),
> !         <literal>fsync_writethrough</> (call <function>_commit()</> at each commit on Windows),
>           <literal>open_sync</> (write WAL files with <function>open()</> option <symbol>O_SYNC</>), and
>           <literal>open_datasync</> (write WAL files with <function>open()</> option <symbol>O_DSYNC</>).
>           Not all of these choices are available on all platforms.
> --- 1595,1601 ----
>           values are
>           <literal>fsync</> (call <function>fsync()</> at each commit),
>           <literal>fdatasync</> (call <function>fdatasync()</> at each commit),
> !         <literal>fsync_writethrough</> (force write-through of any disk write cache),
>           <literal>open_sync</> (write WAL files with <function>open()</> option <symbol>O_SYNC</>), and
>           <literal>open_datasync</> (write WAL files with <function>open()</> option <symbol>O_DSYNC</>).
>           Not all of these choices are available on all platforms.
> Index: src/backend/access/transam/xlog.c
> ===================================================================
> RCS file: /cvsroot/pgsql/src/backend/access/transam/xlog.c,v
> retrieving revision 1.191
> diff -c -c -r1.191 xlog.c
> *** src/backend/access/transam/xlog.c	10 May 2005 22:27:29 -0000	1.191
> --- src/backend/access/transam/xlog.c	17 May 2005 22:03:53 -0000
> ***************
> *** 51,61 ****
>    * default method.	We assume that fsync() is always available, and that
>    * configure determined whether fdatasync() is.
>    */
> - #define SYNC_METHOD_FSYNC		0
> - #define SYNC_METHOD_FDATASYNC	1
> - #define SYNC_METHOD_OPEN		2		/* used for both O_SYNC and
> - 										 * O_DSYNC */
> - 
>   #if defined(O_SYNC)
>   #define OPEN_SYNC_FLAG	   O_SYNC
>   #else
> --- 51,56 ----
> ***************
> *** 79,99 ****
>   #define DEFAULT_SYNC_METHOD_STR    "open_datasync"
>   #define DEFAULT_SYNC_METHOD		   SYNC_METHOD_OPEN
>   #define DEFAULT_SYNC_FLAGBIT	   OPEN_DATASYNC_FLAG
> ! #else
> ! #if defined(HAVE_FDATASYNC)
>   #define DEFAULT_SYNC_METHOD_STR   "fdatasync"
>   #define DEFAULT_SYNC_METHOD		  SYNC_METHOD_FDATASYNC
>   #define DEFAULT_SYNC_FLAGBIT	  0
> ! #else
> ! #ifndef FSYNC_IS_WRITE_THROUGH
>   #define DEFAULT_SYNC_METHOD_STR   "fsync"
>   #else
>   #define DEFAULT_SYNC_METHOD_STR   "fsync_writethrough"
> ! #endif
> ! #define DEFAULT_SYNC_METHOD		  SYNC_METHOD_FSYNC
>   #define DEFAULT_SYNC_FLAGBIT	  0
>   #endif
> - #endif
>   
>   
>   /* User-settable parameters */
> --- 74,92 ----
>   #define DEFAULT_SYNC_METHOD_STR    "open_datasync"
>   #define DEFAULT_SYNC_METHOD		   SYNC_METHOD_OPEN
>   #define DEFAULT_SYNC_FLAGBIT	   OPEN_DATASYNC_FLAG
> ! #elif defined(HAVE_FDATASYNC)
>   #define DEFAULT_SYNC_METHOD_STR   "fdatasync"
>   #define DEFAULT_SYNC_METHOD		  SYNC_METHOD_FDATASYNC
>   #define DEFAULT_SYNC_FLAGBIT	  0
> ! #elif !defined(HAVE_FSYNC_WRITETHROUGH_ONLY)
>   #define DEFAULT_SYNC_METHOD_STR   "fsync"
> + #define DEFAULT_SYNC_METHOD		  SYNC_METHOD_FSYNC
> + #define DEFAULT_SYNC_FLAGBIT	  0
>   #else
>   #define DEFAULT_SYNC_METHOD_STR   "fsync_writethrough"
> ! #define DEFAULT_SYNC_METHOD		  SYNC_METHOD_FSYNC_WRITETHROUGH
>   #define DEFAULT_SYNC_FLAGBIT	  0
>   #endif
>   
>   
>   /* User-settable parameters */
> ***************
> *** 122,128 ****
>   
>   
>   /* these are derived from XLOG_sync_method by assign_xlog_sync_method */
> ! static int	sync_method = DEFAULT_SYNC_METHOD;
>   static int	open_sync_bit = DEFAULT_SYNC_FLAGBIT;
>   
>   #define XLOG_SYNC_BIT  (enableFsync ? open_sync_bit : 0)
> --- 115,121 ----
>   
>   
>   /* these are derived from XLOG_sync_method by assign_xlog_sync_method */
> ! int	sync_method = DEFAULT_SYNC_METHOD;
>   static int	open_sync_bit = DEFAULT_SYNC_FLAGBIT;
>   
>   #define XLOG_SYNC_BIT  (enableFsync ? open_sync_bit : 0)
> ***************
> *** 5249,5264 ****
>   	int			new_sync_method;
>   	int			new_sync_bit;
>   
> - #ifndef FSYNC_IS_WRITE_THROUGH
>   	if (pg_strcasecmp(method, "fsync") == 0)
> - #else
> - 	/* Win32 fsync() == _commit(), which writes through a write cache */
> - 	if (pg_strcasecmp(method, "fsync_writethrough") == 0)
> - #endif
>   	{
>   		new_sync_method = SYNC_METHOD_FSYNC;
>   		new_sync_bit = 0;
>   	}
>   #ifdef HAVE_FDATASYNC
>   	else if (pg_strcasecmp(method, "fdatasync") == 0)
>   	{
> --- 5242,5259 ----
>   	int			new_sync_method;
>   	int			new_sync_bit;
>   
>   	if (pg_strcasecmp(method, "fsync") == 0)
>   	{
>   		new_sync_method = SYNC_METHOD_FSYNC;
>   		new_sync_bit = 0;
>   	}
> + #ifdef HAVE_FSYNC_WRITETHROUGH
> + 	else if (pg_strcasecmp(method, "fsync_writethrough") == 0)
> + 	{
> + 		new_sync_method = SYNC_METHOD_FSYNC_WRITETHROUGH;
> + 		new_sync_bit = 0;
> + 	}
> + #endif
>   #ifdef HAVE_FDATASYNC
>   	else if (pg_strcasecmp(method, "fdatasync") == 0)
>   	{
> ***************
> *** 5328,5339 ****
>   	switch (sync_method)
>   	{
>   		case SYNC_METHOD_FSYNC:
> ! 			if (pg_fsync(openLogFile) != 0)
>   				ereport(PANIC,
>   						(errcode_for_file_access(),
>   					errmsg("could not fsync log file %u, segment %u: %m",
>   						   openLogId, openLogSeg)));
>   			break;
>   #ifdef HAVE_FDATASYNC
>   		case SYNC_METHOD_FDATASYNC:
>   			if (pg_fdatasync(openLogFile) != 0)
> --- 5323,5343 ----
>   	switch (sync_method)
>   	{
>   		case SYNC_METHOD_FSYNC:
> ! 			if (pg_fsync_no_writethrough(openLogFile) != 0)
>   				ereport(PANIC,
>   						(errcode_for_file_access(),
>   					errmsg("could not fsync log file %u, segment %u: %m",
>   						   openLogId, openLogSeg)));
>   			break;
> + #ifdef HAVE_FSYNC_WRITETHROUGH
> + 		case SYNC_METHOD_FSYNC_WRITETHROUGH:
> + 			if (pg_fsync_writethrough(openLogFile) != 0)
> + 				ereport(PANIC,
> + 						(errcode_for_file_access(),
> + 					errmsg("could not fsync write-through log file %u, segment %u: %m",
> + 						   openLogId, openLogSeg)));
> + 			break;
> + #endif
>   #ifdef HAVE_FDATASYNC
>   		case SYNC_METHOD_FDATASYNC:
>   			if (pg_fdatasync(openLogFile) != 0)
> Index: src/backend/storage/file/fd.c
> ===================================================================
> RCS file: /cvsroot/pgsql/src/backend/storage/file/fd.c,v
> retrieving revision 1.115
> diff -c -c -r1.115 fd.c
> *** src/backend/storage/file/fd.c	31 Dec 2004 22:00:51 -0000	1.115
> --- src/backend/storage/file/fd.c	17 May 2005 22:03:54 -0000
> ***************
> *** 232,242 ****
>   
>   
>   /*
> !  * pg_fsync --- same as fsync except does nothing if enableFsync is off
>    */
>   int
>   pg_fsync(int fd)
>   {
>   	if (enableFsync)
>   		return fsync(fd);
>   	else
> --- 232,258 ----
>   
>   
>   /*
> !  * pg_fsync --- do fsync with or without writethrough
>    */
>   int
>   pg_fsync(int fd)
>   {
> + #ifndef HAVE_FSYNC_WRITETHROUGH_ONLY
> + 	if (sync_method != SYNC_METHOD_FSYNC_WRITETHROUGH)
> + 		return pg_fsync_no_writethrough(fd);
> + 	else
> + #endif
> + 		return pg_fsync_writethrough(fd);
> + }
> + 
> + 
> + /*
> +  * pg_fsync_no_writethrough --- same as fsync except does nothing if
> +  *	enableFsync is off
> +  */
> + int
> + pg_fsync_no_writethrough(int fd)
> + {
>   	if (enableFsync)
>   		return fsync(fd);
>   	else
> ***************
> *** 244,249 ****
> --- 260,283 ----
>   }
>   
>   /*
> +  * pg_fsync_writethrough
> +  */
> + int
> + pg_fsync_writethrough(int fd)
> + {
> + 	if (enableFsync)
> + #ifdef WIN32
> + 		return _commit(fd);
> + #elif defined(__darwin__)
> + 		return (fcntl(fd, F_FULLFSYNC, 0) == -1) ? -1 : 0;
> + #else
> + 		return -1;
> + #endif
> + 	else
> + 		return 0;
> + }
> + 
> + /*
>    * pg_fdatasync --- same as fdatasync except does nothing if enableFsync is off
>    *
>    * Not all platforms have fdatasync; treat as fsync if not available.
> Index: src/include/access/xlog.h
> ===================================================================
> RCS file: /cvsroot/pgsql/src/include/access/xlog.h,v
> retrieving revision 1.60
> diff -c -c -r1.60 xlog.h
> *** src/include/access/xlog.h	28 Apr 2005 21:47:17 -0000	1.60
> --- src/include/access/xlog.h	17 May 2005 22:03:56 -0000
> ***************
> *** 75,80 ****
> --- 75,87 ----
>    */
>   #define XLOG_NO_TRAN			XLR_INFO_MASK
>   
> + /* Sync methods */
> + #define SYNC_METHOD_FSYNC		0
> + #define SYNC_METHOD_FDATASYNC	1
> + #define SYNC_METHOD_OPEN		2			/* for O_SYNC and O_DSYNC */
> + #define SYNC_METHOD_FSYNC_WRITETHROUGH	3
> + extern int	sync_method;
> + 
>   /*
>    * List of these structs is used to pass data to XLogInsert().
>    *
> Index: src/include/port/darwin.h
> ===================================================================
> RCS file: /cvsroot/pgsql/src/include/port/darwin.h,v
> retrieving revision 1.6
> diff -c -c -r1.6 darwin.h
> *** src/include/port/darwin.h	23 Dec 2003 03:31:30 -0000	1.6
> --- src/include/port/darwin.h	17 May 2005 22:03:57 -0000
> ***************
> *** 1 ****
> --- 1,4 ----
>   #define __darwin__	1
> + 
> + #define HAVE_FSYNC_WRITETHROUGH
> + 
> Index: src/include/port/win32.h
> ===================================================================
> RCS file: /cvsroot/pgsql/src/include/port/win32.h,v
> retrieving revision 1.44
> diff -c -c -r1.44 win32.h
> *** src/include/port/win32.h	24 Mar 2005 04:36:19 -0000	1.44
> --- src/include/port/win32.h	17 May 2005 22:03:57 -0000
> ***************
> *** 16,23 ****
>   #define mkdir(a,b)	mkdir(a)
>   
>   
> ! #define fsync(a)	_commit(a)
> ! #define FSYNC_IS_WRITE_THROUGH
>   #define ftruncate(a,b)	chsize(a,b)
>   
>   #define USES_WINSOCK
> --- 16,23 ----
>   #define mkdir(a,b)	mkdir(a)
>   
>   
> ! #define HAVE_FSYNC_WRITETHROUGH
> ! #define HAVE_FSYNC_WRITETHROUGH_ONLY
>   #define ftruncate(a,b)	chsize(a,b)
>   
>   #define USES_WINSOCK
> Index: src/include/storage/fd.h
> ===================================================================
> RCS file: /cvsroot/pgsql/src/include/storage/fd.h,v
> retrieving revision 1.50
> diff -c -c -r1.50 fd.h
> *** src/include/storage/fd.h	31 Dec 2004 22:03:42 -0000	1.50
> --- src/include/storage/fd.h	17 May 2005 22:03:57 -0000
> ***************
> *** 89,94 ****
> --- 89,96 ----
>   							  SubTransactionId parentSubid);
>   extern void RemovePgTempFiles(void);
>   extern int	pg_fsync(int fd);
> + extern int	pg_fsync_no_writethrough(int fd);
> + extern int	pg_fsync_writethrough(int fd);
>   extern int	pg_fdatasync(int fd);
>   
>   /* Filename components for OpenTemporaryFile */

> 
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
>       message can get through to the mailing list cleanly

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman(at)candle(dot)pha(dot)pa(dot)us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

In response to

pgsql-patches by date

Next:From: Jim C. NasbyDate: 2005-05-20 16:49:20
Subject: Re: numeric precision when raising one numeric to another.
Previous:From: Bruce MomjianDate: 2005-05-20 14:29:10
Subject: Re: Exec statement logging

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group