Re: Analysis of ganged WAL writes

From: Greg Copeland <greg(at)CopelandConsulting(dot)Net>
To: Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Curtis Faith <curtis(at)galtair(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Hannu Krosing <hannu(at)tm(dot)ee>, Pgsql-Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Analysis of ganged WAL writes
Date: 2002-10-08 12:34:52
Message-ID: 1034080494.26053.273.camel@mouse.copelandconsulting.net
Lists: pgsql-hackers

On Tue, 2002-10-08 at 04:15, Zeugswetter Andreas SB SD wrote:
> Can the magic be, that kaio directly writes from user space memory to the
> disk ? Since in your case all transactions A-E want the same buffer written,
> the memory (not its content) will also be the same. This would automatically
> write the latest possible version of our WAL buffer to disk.
>

*Some* implementations allow for zero-copy aio, which saves a memory
copy per write. On heavily used systems, that can be a large savings.

> The problem I can see offhand is how the kaio system can tell which transaction
> can be safely notified of the write, or whether the programmer is actually responsible
> for not changing the buffer until notified of completion ?

That's correct. The programmer cannot change the buffer contents until
completion has been reported for that outstanding aio operation. To do
otherwise results in undefined behavior. Since some systems do allow
for zero-copy aio operations, requiring that the buffers not be modified
once queued makes a lot of sense. Of course, even on systems that don't
support zero-copy, changing the buffered data prior to write completion
just seems like a bad idea to me.

Here's a quote from SGI's aio_write man page:
    If the buffer pointed to by aiocbp->aio_buf or the control block pointed
    to by aiocbp changes or becomes an illegal address prior to asynchronous
    I/O completion then the behavior is undefined. Simultaneous synchronous
    operations using the same aiocbp produce undefined results.

And on SunOS we have:
    The aiocbp argument points to an aiocb structure. If the
    buffer pointed to by aiocbp->aio_buf or the control block
    pointed to by aiocbp becomes an illegal address prior to
    asynchronous I/O completion, then the behavior is undefined.
and
    For any system action that changes the process memory space
    while an asynchronous I/O is outstanding to the address
    range being changed, the result of that action is undefined.
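
To make the rule concrete, here is a minimal sketch in C of queuing one
POSIX aio write and leaving both the buffer and the control block alone
until completion. The names write_and_wait and demo_write are made up
for illustration; they are not from PostgreSQL or from either man page.
It assumes a system with POSIX aio (on older glibc, link with -lrt) and
says nothing about whether a given implementation is zero-copy.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: queue one asynchronous write and block until it
 * completes. Returns the byte count reported by aio_return(), or -1
 * with errno set. Nothing may modify buf between aio_write() and the
 * point where aio_error() stops returning EINPROGRESS.
 */
ssize_t
write_and_wait(int fd, void *buf, size_t len, off_t offset)
{
    struct aiocb cb;
    const struct aiocb *list[1];
    int err;

    assert(fd >= 0 && buf != NULL);

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = offset;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;  /* we poll, no signal */

    if (aio_write(&cb) != 0)
        return -1;              /* request was never queued */

    /* From here on, buf and cb are off limits until completion. */
    list[0] = &cb;
    do
    {
        (void) aio_suspend(list, 1, NULL);  /* may wake early on EINTR */
        err = aio_error(&cb);
    } while (err == EINPROGRESS);

    if (err != 0)
    {
        (void) aio_return(&cb); /* release the request's resources */
        errno = err;
        return -1;
    }
    return aio_return(&cb);     /* call exactly once per request */
}

/* Hypothetical caller: write one record, then reuse the buffer. */
ssize_t
demo_write(const char *path)
{
    char record[] = "WAL record";   /* 11 bytes including the NUL */
    ssize_t n;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    n = write_and_wait(fd, record, sizeof(record), 0);
    /* Only now, after completion, is it safe to scribble on record. */
    memset(record, 0, sizeof(record));
    close(fd);
    return n;
}
```

Waiting inside the helper is the simplest way to honor the rule. Code
that wants to overlap work with the write would have to keep the buffer
and the aiocb untouched some other way until aio_error() reports
completion.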

Greg
