Re: O_DIRECT support for Windows

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Takayuki Tsunakawa <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: O_DIRECT support for Windows
Date: 2007-01-16 09:09:46
Message-ID: 20070116090946.GA1564@svr2.hagander.net
Lists: pgsql-hackers pgsql-patches

On Tue, Jan 16, 2007 at 10:59:11AM +0900, Takayuki Tsunakawa wrote:
> From: "Magnus Hagander" <magnus(at)hagander(dot)net>
> > ITAGAKI Takahiro wrote:
> >> Do you mean there are drives that have larger sector size than 8kB?
> >> We've already put the xlog buffer along the alignment of
> >> ALIGNOF_XLOG_BUFFER (typically 8192 bytes).
> >> But if there are such drives, using FILE_FLAG_NO_BUFFERING is
> >> harmful!
> >
> > Yes. I have heard this can happen with certain SAN drives. I haven't
> > seen it myself, and I can't seem to find a reference right now :-)
> > But I do recall having read about the need to check the sector size
> > and specifically align to it, because some do have that problem.
>
> I think many people can benefit from Itagaki-san's proposal, and
> NO_BUFFERING should be default. Isn't it very rare that disks with
> sector size larger than 8KB are used?

Definitely very rare.

> Providing a way (such as
> wal_sync_method) to avoid NO_BUFFERING is sufficient for people in
> rare environments. Or, by determining the sector size with
> GetDiskFreeSpaceEx(), we could auto-switch to not using NO_BUFFERING
> when the sector size is larger than 8KB.

I think the second one is better.
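
A minimal sketch of that auto-switch decision, kept portable so the
logic can be checked without a Windows box. Assumptions: the names
XLOG_BUFFER_ALIGN and can_use_no_buffering are hypothetical, the 8192
value just mirrors the typical ALIGNOF_XLOG_BUFFER mentioned above, and
the sector size is passed in as a plain argument. As far as I know, on
Windows the bytes-per-sector value comes from GetDiskFreeSpace() (its
lpBytesPerSector output) rather than from GetDiskFreeSpaceEx(), so a
real patch would probably call the former.

```c
#include <stdbool.h>

/* Hypothetical constant: mirrors the typical 8192-byte xlog buffer
 * alignment mentioned in this thread. */
#define XLOG_BUFFER_ALIGN 8192UL

/* True if n is a non-zero power of two. */
static bool
is_power_of_two(unsigned long n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Decide whether unbuffered I/O looks safe for a volume.
 *
 * bytes_per_sector is assumed to have been obtained from the OS
 * beforehand. Unbuffered I/O is accepted only if the buffer alignment
 * is a whole multiple of the sector size, which for power-of-two
 * sizes means the sector is no larger than the alignment.
 */
static bool
can_use_no_buffering(unsigned long bytes_per_sector)
{
    if (!is_power_of_two(bytes_per_sector))
        return false;           /* unknown or odd value: stay buffered */
    return bytes_per_sector <= XLOG_BUFFER_ALIGN;
}
```

With this, 512-byte and 8192-byte sectors would keep NO_BUFFERING
enabled, while a 16384-byte sector (or a bogus value of 0) would fall
back to buffered writes.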

> I wonder whether GetDiskFreeSpaceEx() tells us the right sector size
> configured by SAN tools.

It should. If it doesn't, then there are likely to be other issues.

> And I wonder if Microsoft assumes a sector size larger than 4KB and
> NTFS works. The following paragraph appears in the CreateFile page:
>
> One way to align buffers on integer multiples of the volume sector
> size is to use VirtualAlloc to allocate the buffers. It allocates
> memory that is aligned on addresses that are integer multiples of the
> operating system's memory page size. Because both memory page and
> volume sector sizes are powers of 2, this memory is also aligned on
> addresses that are integer multiples of a volume sector size. Memory
> pages are 4-8 KB in size; sectors are 512 bytes (hard disks) or 2048
> bytes (CD), and therefore, volume sectors can never be larger than
> memory pages.

Good question. Again, I have no firsthand info about systems with >4K
sectors. Obviously you have 2K sectors on CDs, but that doesn't really
apply to us because we don't run with our files on CD at all...
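
To make the alignment argument in the quoted CreateFile text concrete,
here is a small sketch (the helper name is_aligned is made up for
illustration). It shows why a page-aligned address is automatically
sector-aligned only as long as the sector is no larger than the page.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if addr is a multiple of alignment. alignment must be a
 * non-zero power of two; anything else is rejected.
 */
static bool
is_aligned(uintptr_t addr, uintptr_t alignment)
{
    if (alignment == 0 || (alignment & (alignment - 1)) != 0)
        return false;
    return (addr & (alignment - 1)) == 0;
}
```

Take an address of 3 * 4096 = 12288, which is aligned to a 4K page:
is_aligned(12288, 512) and is_aligned(12288, 2048) both hold, but
is_aligned(12288, 8192) does not. So if sectors really were larger than
a memory page, page alignment alone would no longer be enough.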

It *could* be that someone mixed up sector size and NTFS block size
(which is definitely supported up to at least 64K/block).

A quick google shows some inconclusive results :-) But look at, for
example:
http://groups.google.se/group/microsoft.public.sqlserver.server/tree/browse_frm/thread/d3288d3b43338b47/ff5e825dd02faff4?rnum=1&hl=en&q=ntfs+sector+size&_done=%2Fgroup%2Fmicrosoft.public.sqlserver.server%2Fbrowse_frm%2Fthread%2Fd3288d3b43338b47%2Fff5e825dd02faff4%3Ftvc%3D1%26q%3Dntfs+sector+size%26hl%3Den%26#doc_4556b64132b3baa7

This seems to indicate that *Windows* supports sector sizes >4K, but SQL
Server doesn't. But again, it could be a mixup between cluster and
sector size...

//Magnus
