Re: [HACKERS] Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org, Michael Clemmons <glassresistor(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Date: 2010-02-06 05:03:30
Message-ID: 4B6CF822.9010608@2ndquadrant.com
Lists: pgsql-hackers, pgsql-performance

Andres Freund wrote:
> On 02/03/10 14:42, Robert Haas wrote:
>> Well, maybe we should start with a discussion of what kernel calls
>> you're aware of on different platforms and then we could try to put an
>> API around it.
> In linux there is sync_file_range. On newer Posixish systems one can 
> emulate that with mmap() and msync() (in batches obviously).
>
> No idea about windows.

There's a series of parameters you can pass into CreateFile:  
http://msdn.microsoft.com/en-us/library/aa363858(VS.85).aspx

A lot of these are already mapped inside of src/port/open.c in a pretty 
straightforward way from the POSIX-oriented interface:

O_RDWR -> GENERIC_READ | GENERIC_WRITE
O_WRONLY -> GENERIC_WRITE
O_RANDOM -> FILE_FLAG_RANDOM_ACCESS
O_SEQUENTIAL -> FILE_FLAG_SEQUENTIAL_SCAN
O_SHORT_LIVED -> FILE_ATTRIBUTE_TEMPORARY
O_TEMPORARY -> FILE_FLAG_DELETE_ON_CLOSE
O_DIRECT -> FILE_FLAG_NO_BUFFERING
O_DSYNC -> FILE_FLAG_WRITE_THROUGH
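
To make the table above concrete, here is a rough sketch of that kind of translation. It is not the code in src/port/open.c; the SK_ flag bits, the sk_createfile_args struct, and sk_map_open_flags() are hypothetical names invented for illustration, and the Windows constants are copied from the SDK headers only so the sketch compiles elsewhere.

```c
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Values as published in the Windows SDK headers, repeated here only so
 * the sketch compiles on a non-Windows machine. */
#define GENERIC_READ               0x80000000u
#define GENERIC_WRITE              0x40000000u
#define FILE_ATTRIBUTE_TEMPORARY   0x00000100u
#define FILE_FLAG_DELETE_ON_CLOSE  0x04000000u
#define FILE_FLAG_SEQUENTIAL_SCAN  0x08000000u
#define FILE_FLAG_RANDOM_ACCESS    0x10000000u
#define FILE_FLAG_NO_BUFFERING     0x20000000u
#define FILE_FLAG_WRITE_THROUGH    0x80000000u
#endif

/* Hypothetical flag bits standing in for the open() flags in the table.
 * The real values come from the platform and port headers, not from here. */
enum {
    SK_O_WRONLY      = 0x0001,
    SK_O_RDWR        = 0x0002,
    SK_O_RANDOM      = 0x0010,
    SK_O_SEQUENTIAL  = 0x0020,
    SK_O_TEMPORARY   = 0x0040,
    SK_O_DSYNC       = 0x0080,
    SK_O_SHORT_LIVED = 0x1000,
    SK_O_DIRECT      = 0x4000
};

/* The two CreateFile arguments that the table affects. */
typedef struct {
    uint32_t desired_access;
    uint32_t flags_and_attributes;
} sk_createfile_args;

static sk_createfile_args sk_map_open_flags(unsigned int oflags)
{
    sk_createfile_args out;

    /* Read-only is flag value 0, so it has to be the fall-through case. */
    if (oflags & SK_O_RDWR)
        out.desired_access = (uint32_t) (GENERIC_READ | GENERIC_WRITE);
    else if (oflags & SK_O_WRONLY)
        out.desired_access = (uint32_t) GENERIC_WRITE;
    else
        out.desired_access = (uint32_t) GENERIC_READ;

    /* Each remaining flag maps to exactly one CreateFile bit. */
    out.flags_and_attributes = 0;
    if (oflags & SK_O_RANDOM)
        out.flags_and_attributes |= (uint32_t) FILE_FLAG_RANDOM_ACCESS;
    if (oflags & SK_O_SEQUENTIAL)
        out.flags_and_attributes |= (uint32_t) FILE_FLAG_SEQUENTIAL_SCAN;
    if (oflags & SK_O_SHORT_LIVED)
        out.flags_and_attributes |= (uint32_t) FILE_ATTRIBUTE_TEMPORARY;
    if (oflags & SK_O_TEMPORARY)
        out.flags_and_attributes |= (uint32_t) FILE_FLAG_DELETE_ON_CLOSE;
    if (oflags & SK_O_DIRECT)
        out.flags_and_attributes |= (uint32_t) FILE_FLAG_NO_BUFFERING;
    if (oflags & SK_O_DSYNC)
        out.flags_and_attributes |= (uint32_t) FILE_FLAG_WRITE_THROUGH;

    return out;
}
```

The two fields would be passed as the dwDesiredAccess and dwFlagsAndAttributes arguments of CreateFile; share mode, security attributes, and creation disposition are left out of the sketch.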

You have to read the whole "Caching Behavior" section to see exactly how 
all of those interact, and even then notes like 
http://support.microsoft.com/kb/99794 are needed to follow the fine 
points of things like FILE_FLAG_NO_BUFFERING vs. FILE_FLAG_WRITE_THROUGH.

So anything that sets those POSIX open flags better than before gets 
the benefit of that improvement on Windows, too.  But that's not quite 
the same as the changes that use fadvise to provide better-targeted 
cache control hints.
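
For contrast with the open-time flags, a per-range cache hint on the POSIX side looks roughly like the sketch below. This is not the patch under discussion; sk_hint_wont_reuse() and sk_hint_will_read() are hypothetical helper names, and the no-op fallback for platforms without posix_fadvise is my own assumption about how one would degrade.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <fcntl.h>
#include <sys/types.h>

/* Hint that the given byte range will not be read again soon, so the OS
 * may drop it from its cache.  Returns 0 on success or an error number;
 * posix_fadvise reports errors through its return value, not errno. */
static int sk_hint_wont_reuse(int fd, off_t offset, off_t len)
{
#ifdef POSIX_FADV_DONTNEED
    return posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
#else
    /* No fadvise available: the hint is purely advisory, so do nothing. */
    (void) fd; (void) offset; (void) len;
    return 0;
#endif
}

/* Hint that the given byte range will be read soon, so the OS may start
 * fetching it in the background. */
static int sk_hint_will_read(int fd, off_t offset, off_t len)
{
#ifdef POSIX_FADV_WILLNEED
    return posix_fadvise(fd, offset, len, POSIX_FADV_WILLNEED);
#else
    (void) fd; (void) offset; (void) len;
    return 0;
#endif
}
```

The point is that these hints apply to a byte range of an already open file, which the CreateFile flags cannot express; those flags apply to the whole handle for its lifetime.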

I'm getting the impression that doing much better on Windows might fall 
into the same sort of category as Solaris, where the primary interface 
for this sort of thing is to use an AIO implementation instead:  
http://msdn.microsoft.com/en-us/library/aa365683(VS.85).aspx

The effective_io_concurrency feature had proof of concept test programs 
that worked using AIO, but actually following through on that 
implementation would require a major restructuring of how the database 
interacts with the OS in terms of reads and writes of blocks.  It looks 
to me like doing something similar to sync_file_range on Windows would 
be similarly difficult.
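
For reference, the Linux call being compared against can be sketched as below. sk_start_writeback() is a hypothetical wrapper, and falling back to a plain fsync() where sync_file_range is unavailable is my assumption, not something proposed in this thread. Note the fallback is much heavier: it flushes the whole file and waits for completion.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* sync_file_range is a Linux extension */
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel to start writing out dirty pages in the given byte range
 * without waiting for the I/O to finish.  Returns 0 on success, -1 with
 * errno set on failure. */
static int sk_start_writeback(int fd, off_t offset, off_t nbytes)
{
#if defined(__linux__) && defined(SYNC_FILE_RANGE_WRITE)
    return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
#else
    /* Assumed fallback: no ranged, non-blocking flush is available, so
     * flush everything and wait. */
    (void) offset; (void) nbytes;
    return fsync(fd);
#endif
}
```

A caller copying a large file would invoke this after each batch of writes and still issue a real fsync at the end, since SYNC_FILE_RANGE_WRITE on its own gives no durability guarantee.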

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com   www.2ndQuadrant.us

