Re: Bug: Buffer cache is not scan resistant

From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>, "Pavan Deolasee" <pavan(at)enterprisedb(dot)com>, "Gavin Sherry" <swm(at)alcove(dot)com(dot)au>, "PGSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Doug Rady" <drady(at)greenplum(dot)com>, "Sherry Moore" <sherry(dot)moore(at)sun(dot)com>
Subject: Re: Bug: Buffer cache is not scan resistant
Date: 2007-03-06 04:23:04
Message-ID: C2122CA8.288AC%llonergan@greenplum.com
Lists: pgsql-hackers

Tom,

On 3/5/07 7:58 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> I looked a bit at the Linux code that's being used here, but it's all
> x86_64 assembler which is something I've never studied :-(.

Here's the C wrapper routine in Solaris:

http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/move.c

Here's the x86 assembler routine for Solaris:

http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/intel/ia32/ml/copy.s

The actual uiomove routine is a simple wrapper that calls the assembler
kcopy or xcopyout routines. There are two versions (for Opteron): one uses
the non-temporal (NTA) instructions, which bypass the L2 cache on writes to
avoid L2 cache pollution, and the other writes normally, through the L2
cache. Which one is used depends on a global parameter based on the size of
the I/O. It is tuned to identify operations that might pollute the L2 cache
(sound familiar?).
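
To make the size-based selection concrete, here is a minimal user-space
sketch in C. It is not the Solaris code: the names COPY_NTA_THRESHOLD,
choose_copy_mode, copy_nontemporal and copy_block are hypothetical, the
threshold value is made up for illustration, and the non-temporal path uses
the SSE2 streaming-store intrinsic (falling back to memcpy where SSE2 is not
available) rather than the kernel's assembler routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#endif

/* Hypothetical cutoff: the real tunable and its value are not given here. */
#define COPY_NTA_THRESHOLD ((size_t) 128 * 1024)

typedef enum { COPY_CACHED, COPY_NONTEMPORAL } copy_mode;

/* Large copies are assumed likely to pollute the cache, so they bypass it. */
static copy_mode choose_copy_mode(size_t len)
{
    return len >= COPY_NTA_THRESHOLD ? COPY_NONTEMPORAL : COPY_CACHED;
}

/* Copy using streaming (non-temporal) stores where SSE2 is available. */
static void copy_nontemporal(void *dst, const void *src, size_t len)
{
#if defined(__SSE2__)
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Streaming stores need a 16-byte aligned destination: copy the head. */
    size_t head = (16 - ((uintptr_t) d & 15)) & 15;
    if (head > len)
        head = len;
    memcpy(d, s, head);
    d += head;
    s += head;
    len -= head;

    for (; len >= 16; d += 16, s += 16, len -= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *) s);
        _mm_stream_si128((__m128i *) d, v);   /* store that bypasses the cache */
    }
    _mm_sfence();          /* make the streaming stores globally visible */
    memcpy(d, s, len);     /* remaining tail of fewer than 16 bytes */
#else
    memcpy(dst, src, len); /* no SSE2: fall back to an ordinary copy */
#endif
}

/* Pick the copy routine from the I/O size and report which one was used. */
static copy_mode copy_block(void *dst, const void *src, size_t len)
{
    copy_mode mode = choose_copy_mode(len);

    assert(dst != NULL && src != NULL);
    if (mode == COPY_NONTEMPORAL)
        copy_nontemporal(dst, src, len);
    else
        memcpy(dst, src, len);
    return mode;
}
```

A caller just uses copy_block(dst, src, len) everywhere; only the threshold
decides whether a given transfer goes through the cache.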

I think what we're seeing is a generic artifact of the write-through
behavior of the cache. I wouldn't expect this to get any better with
DIRECTIO to the shared_buffers in pgsql - if we iterate over a large number
of user space buffers we'll still hit the increased L2 thrashing.

I think we're best off with a hybrid approach - when we "detect" a seq scan
larger (much larger?) than the buffer cache, we can switch into a "cache
bypass" behavior, much like the above code uses the NTA instructions when
appropriate.

We can handle syncscan using a small buffer space.
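
A rough sketch in C of what the hybrid could look like: decide up front
whether a scan is big enough to bypass the cache, and if so recycle a small
fixed set of buffers instead of walking through all of shared_buffers. This
is an illustration only, not PostgreSQL code; scan_ring, scan_should_bypass,
ring_next_slot and the "factor" knob are all hypothetical names and
parameters.

```c
#include <stdbool.h>
#include <stddef.h>

/* A small ring of buffer slots that a bypassing scan reuses over and over. */
typedef struct {
    size_t nslots;   /* number of buffers in the ring (must be > 0) */
    size_t next;     /* index of the slot to hand out next */
} scan_ring;

/*
 * Bypass the cache when the relation is more than `factor` times the size
 * of the buffer cache, both measured in pages. `factor` is a hypothetical
 * tuning knob standing in for "larger (much larger?)".
 */
static bool scan_should_bypass(size_t rel_pages, size_t cache_pages,
                               size_t factor)
{
    return rel_pages > cache_pages * factor;
}

/* Hand out ring slots round-robin, so the scan only ever touches nslots. */
static size_t ring_next_slot(scan_ring *ring)
{
    size_t slot = ring->next;

    ring->next = (ring->next + 1) % ring->nslots;
    return slot;
}
```

With a ring of, say, 4 slots, a scan of any length reuses the same 4 buffers,
so its working set stays small no matter how large the table is.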

- Luke
