Re: WAL Performance Improvements

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Helge Bahmann <bahmann(at)math(dot)tu-freiberg(dot)de>
Cc: Janardhana Reddy <jana-reddy(at)mediaring(dot)com(dot)sg>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: WAL Performance Improvements
Date: 2002-02-26 16:48:53
Message-ID: 428.1014742133@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Helge Bahmann <bahmann(at)math(dot)tu-freiberg(dot)de> writes:
> This is not to say that your WAL optimization is worthless, but the
> benchmark you gave is certainly wrong.

There is actually good reason to think that the change would be a net
loss in many scenarios. The problem is that if you write a partial
filesystem block, the kernel must first read in the old contents of
the block, then overlay the data you've specified to write onto the
appropriate part of the buffer. That disk read takes time --- and
what's worse, it's physical I/O that will be done while the process
requesting the write is holding the WALWriteLock. (AFAIK the kernel
will not absorb the user data until it's got a buffer to dump it
into; anyone want to dig into kernel sources and confirm that?)

On the other hand, when you write a full block, there's no need to
read the old block contents. The user data will just be copied to
a freshly-allocated kernel disk buffer. This is why I suggested
that the first write of a WAL block should write the entire block.
We can hope that subsequent writes of just part of the block will find
the block still in kernel disk buffers, and so avoid a read operation.
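As an illustration of the policy described above, here is a minimal sketch in C. It is not code from the PostgreSQL sources: the names WAL_BLCKSZ, WalWriteState, and wal_write_len are hypothetical, the 8192-byte block size is an assumption, and the sketch assumes WAL blocks are written sequentially, so that remembering the last block written is enough to detect a first write.

```c
#include <assert.h>
#include <stddef.h>

#define WAL_BLCKSZ 8192         /* assumed WAL block size, for illustration */

/* Hypothetical bookkeeping: the last WAL block that has been written. */
typedef struct
{
    long last_block_written;    /* -1 means nothing written yet */
} WalWriteState;

/*
 * Decide how many bytes of a WAL block to hand to write().
 *
 * On the first write of a block, return the whole block size so the
 * kernel can copy into a fresh buffer without reading old contents.
 * On later writes of the same block, return only the bytes in use,
 * in the hope that the block is still in the kernel's buffers.
 *
 * The caller must supply a buffer of WAL_BLCKSZ bytes (padded past
 * "used") whenever the full block size is returned.
 */
size_t
wal_write_len(WalWriteState *state, long blockno, size_t used)
{
    assert(used <= WAL_BLCKSZ);

    if (state->last_block_written != blockno)
    {
        state->last_block_written = blockno;
        return WAL_BLCKSZ;      /* first write: full block */
    }
    return used;                /* rewrite: partial block */
}
```

Whether the partial rewrite actually avoids a read depends on the block staying cached in the kernel, which this sketch cannot check.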

AFAICS the only real win that can be gotten with a change like
Janardhana's would be to avoid writing multiple blocks in the case
where the filesystem block size is smaller than the xlog's BLCKSZ.
Tuning this correctly would require knowing the kernel's block size.
Anyone have ideas about a portable way to find that out?
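One possible starting point for the block-size question, offered as a sketch rather than an answer: POSIX statvfs() reports f_bsize, and stat() reports st_blksize. Neither is guaranteed to equal the granularity at which the kernel buffers disk blocks, and statvfs() is not available on every platform, so the result is only a hint. The function names fs_block_size_hint and aligned_write_len are hypothetical.

```c
#include <stddef.h>
#include <sys/stat.h>
#include <sys/statvfs.h>

/*
 * Return a hint for the filesystem block size under "path",
 * or 0 if it cannot be determined.
 */
unsigned long
fs_block_size_hint(const char *path)
{
    struct statvfs vfs;
    struct stat st;

    /* f_bsize: the file system block size as reported by statvfs() */
    if (statvfs(path, &vfs) == 0 && vfs.f_bsize > 0)
        return (unsigned long) vfs.f_bsize;

    /* fall back to st_blksize, the preferred I/O block size */
    if (stat(path, &st) == 0 && st.st_blksize > 0)
        return (unsigned long) st.st_blksize;

    return 0;
}

/*
 * Round the used part of a WAL block up to a whole number of
 * filesystem blocks, never exceeding the WAL block size.  If the
 * filesystem block size is unknown (0) or not smaller than the WAL
 * block, write the whole WAL block.
 */
size_t
aligned_write_len(size_t used, size_t fs_bsize, size_t wal_blcksz)
{
    size_t len;

    if (fs_bsize == 0 || fs_bsize >= wal_blcksz)
        return wal_blcksz;

    len = ((used + fs_bsize - 1) / fs_bsize) * fs_bsize;
    return (len > wal_blcksz) ? wal_blcksz : len;
}
```

With a 1024-byte filesystem block and an 8192-byte WAL block, 100 bytes of WAL data would be written as one 1024-byte chunk instead of eight filesystem blocks; with equal block sizes the function always returns the full WAL block.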

regards, tom lane
