Re: Bug: Buffer cache is not scan resistant

From: "Luke Lonergan" <LLonergan(at)greenplum(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: "Sherry Moore" <sherry(dot)moore(at)sun(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>, "Pavan Deolasee" <pavan(at)enterprisedb(dot)com>, "Gavin Sherry" <swm(at)alcove(dot)com(dot)au>, "PGSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Doug Rady" <drady(at)greenplum(dot)com>, "CK(dot)Tan" <cktan(at)greenplum(dot)com>, "John Eshleman" <jeshleman(at)greenplum(dot)com>
Subject: Re: Bug: Buffer cache is not scan resistant
Date: 2007-03-09 19:36:40
Message-ID: C3E62232E3BCF24CBA20D72BFDCB6BF802AF289E@MI8NYCMAIL08.Mi8.com
Lists: pgsql-hackers

Cool!

- Luke

Msg is shrt cuz m on ma treo

-----Original Message-----
From: Simon Riggs [mailto:simon(at)2ndquadrant(dot)com]
Sent: Friday, March 09, 2007 02:32 PM Eastern Standard Time
To: Luke Lonergan; ITAGAKI Takahiro
Cc: Sherry Moore; Tom Lane; Mark Kirkwood; Pavan Deolasee; Gavin Sherry; PGSQL Hackers; Doug Rady
Subject: Re: [HACKERS] Bug: Buffer cache is not scan resistant

On Tue, 2007-03-06 at 22:32 -0500, Luke Lonergan wrote:
> Incidentally, we tried triggering NTA (L2 cache bypass)
> unconditionally and in various patterns and did not see the
> substantial gain as with reducing the working set size.
>
> My conclusion: Fixing the OS is not sufficient to alleviate the issue.
> We see a 2x penalty (1700MB/s versus 3500MB/s) at the higher data
> rates due to this effect.
>
I've implemented buffer recycling, as previously described, patch being
posted now to -patches as "scan_recycle_buffers".

This version includes buffer recycling:

- for SeqScans larger than shared buffers, with the objective of
improving L2 cache efficiency *and* reducing the effects of shared
buffer cache spoiling (both as previously discussed on this thread)

- for VACUUMs of any size, with the objective of reducing WAL thrashing
whilst keeping VACUUM's behaviour of not spoiling the buffer cache (as
originally suggested by Itagaki-san, just with a different
implementation).

Behaviour is not activated by default in this patch. To request buffer
recycling, set the USERSET GUC:
    SET scan_recycle_buffers = N
I tested with N = 1, 4, 8 and 16, but only N > 8 seems sensible, IMHO.

The patch affects StrategyGetBuffer, so it only affects the disk->cache path.
The idea is that if a page is already in the shared buffer cache then we get
substantial benefit already and nothing else is needed. So for the
general case, the patch adds a single if test to the I/O path.

The parameter is picked up at the start of SeqScan and VACUUM
(currently). Any change mid-scan will be ignored.
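
To make the two paragraphs above concrete, here is a minimal, self-contained
sketch of the idea. It is not code from the patch: ScanRecycleState,
scan_recycle_begin, strategy_get_buffer, general_get_buffer and MAX_RING are
all hypothetical names, and the sketch ignores pins, dirty pages and locking.
It only shows a scan snapshotting the setting once at start and then cycling
through N buffers, with a single if test separating it from the general path.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RING       64    /* hypothetical upper bound, for this sketch only */
#define INVALID_BUFFER (-1)

/* Hypothetical per-scan state; not the patch's actual data structure. */
typedef struct ScanRecycleState
{
    int ring_size;           /* setting as read at scan start; 0 = disabled */
    int next;                /* next ring slot to reuse */
    int ring[MAX_RING];      /* buffer ids this scan has already used */
} ScanRecycleState;

/* Stand-in for the normal replacement path: hands out a fresh buffer id. */
static int next_free_buffer = 0;

static int
general_get_buffer(void)
{
    return next_free_buffer++;
}

/*
 * Called once when a SeqScan or VACUUM starts. The setting is copied here,
 * so changing it mid-scan has no effect on this scan.
 */
void
scan_recycle_begin(ScanRecycleState *st, int setting)
{
    int i;

    assert(setting >= 0 && setting <= MAX_RING);
    st->ring_size = setting;
    st->next = 0;
    for (i = 0; i < MAX_RING; i++)
        st->ring[i] = INVALID_BUFFER;
}

/*
 * Called only when a page must be read from disk into the cache.
 * Pages already in shared buffers never reach this function.
 */
int
strategy_get_buffer(ScanRecycleState *st)
{
    int slot;

    /* The single added test: everything else takes the general path. */
    if (st == NULL || st->ring_size == 0)
        return general_get_buffer();

    slot = st->next;
    st->next = (st->next + 1) % st->ring_size;

    /* First pass fills the ring; later passes reuse the same buffers. */
    if (st->ring[slot] == INVALID_BUFFER)
        st->ring[slot] = general_get_buffer();

    return st->ring[slot];
}
```

With a setting of 4, the fifth request from a scan returns the same buffer as
the first, so the scan touches only 4 buffers however large the table is.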

IMHO it's possible to do this and to allow Synch Scans at the same time,
with some thought. There is no need for us to rely on the cache-spoiling
behaviour of scans to implement that feature as well.

Independent performance tests requested, so that we can discuss this
objectively.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
