Re: Bug: Buffer cache is not scan resistant

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Jim Nasby <decibel(at)decibel(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Luke Lonergan <LLonergan(at)greenplum(dot)com>, Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>, PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Doug Rady <drady(at)greenplum(dot)com>, Sherry Moore <sherry(dot)moore(at)sun(dot)com>
Subject: Re: Bug: Buffer cache is not scan resistant
Date: 2007-03-06 18:29:17
Message-ID: 45EDB2FD.4070705@enterprisedb.com
Lists: pgsql-hackers

Jeff Davis wrote:
> On Mon, 2007-03-05 at 21:02 -0700, Jim Nasby wrote:
>> On Mar 5, 2007, at 2:03 PM, Heikki Linnakangas wrote:
>>> Another approach I proposed back in December is to not have a
>>> variable like that at all, but scan the buffer cache for pages
>>> belonging to the table you're scanning to initialize the scan.
>>> Scanning all the BufferDescs is a fairly CPU and lock heavy
>>> operation, but it might be ok given that we're talking about large
>>> I/O bound sequential scans. It would require no DBA tuning and
>>> would work more robustly in varying conditions. I'm not sure where
>>> you would continue after scanning the in-cache pages. At the
>>> highest in-cache block number, perhaps.
>> If there was some way to do that, it'd be what I'd vote for.
>>
>
> I still don't know how to make this take advantage of the OS buffer
> cache.

Yep, I don't see any way to do that. I think we could live with that,
though. If we went with the sync_scan_offset approach, you'd have to
leave a lot of safety margin in that as well.
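
Here is a rough, single-threaded sketch of the proposal quoted above. It walks every buffer descriptor once, picks out the pages of the relation being scanned, and reports the highest cached block number as a possible place to start. SimpleBufDesc and highest_cached_block() are hypothetical names. The real BufferDesc identifies a page by its buffer tag, and a real walk would have to lock each buffer header while inspecting it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a shared buffer descriptor.
 * No buffer tag, header lock or pin count is modelled here. */
typedef struct {
    uint32_t rel_id;     /* relation the cached page belongs to */
    uint32_t block_num;  /* block number within that relation */
    int      valid;      /* nonzero if the slot holds a valid page */
} SimpleBufDesc;

/* Examine every descriptor once and return the highest block number of
 * rel_id that is currently cached, or -1 if none of its pages are cached.
 * A sequential scan could start (or continue) from that block. */
int64_t highest_cached_block(const SimpleBufDesc *descs, size_t nbuffers,
                             uint32_t rel_id)
{
    int64_t highest = -1;

    assert(descs != NULL || nbuffers == 0);
    for (size_t i = 0; i < nbuffers; i++) {
        if (descs[i].valid && descs[i].rel_id == rel_id &&
            (int64_t) descs[i].block_num > highest)
            highest = (int64_t) descs[i].block_num;
    }
    return highest;
}
```

The cost is one pass over all NBuffers descriptors per scan start, which is the CPU cost the quoted proposal accepts in exchange for needing no DBA tuning.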

> However, no DBA tuning is a huge advantage, I agree with that.
>
> If I were to implement this idea, I think Heikki's bitmap of pages
> already read is the way to go. Can you guys give me some pointers about
> how to walk through the shared buffers, reading the pages that I need,
> while being sure not to read a page that's been evicted, and also not
> potentially causing a performance regression somewhere else?

You could take a look at BufferSync, for example. It walks through the
buffer cache, syncing all dirty buffers.
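
Jeff's question pairs a walk over the shared buffers with a bitmap of pages already read. The sketch below models only that bookkeeping. Given the block numbers a buffer walk found in cache, it schedules those first. It then schedules every remaining block, starting just after `start` (for example the highest in-cache block number) and wrapping around. One bit per block guarantees that no block is scheduled twice. ReadBitmap, plan_scan() and the MAX_BLOCKS limit are hypothetical. The sketch also ignores concurrency entirely, including pages evicted between the walk and the read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_BLOCKS 1024  /* hypothetical size limit for this sketch */

/* One bit per block of the relation; a set bit means "already read". */
typedef struct {
    uint8_t  bits[MAX_BLOCKS / 8];
    uint32_t nblocks;
} ReadBitmap;

void bitmap_init(ReadBitmap *bm, uint32_t nblocks)
{
    assert(nblocks <= MAX_BLOCKS);
    memset(bm->bits, 0, sizeof(bm->bits));
    bm->nblocks = nblocks;
}

void bitmap_set(ReadBitmap *bm, uint32_t blk)
{
    bm->bits[blk / 8] |= (uint8_t) (1u << (blk % 8));
}

int bitmap_test(const ReadBitmap *bm, uint32_t blk)
{
    return (bm->bits[blk / 8] >> (blk % 8)) & 1;
}

/* Fill out[] (room for bm->nblocks entries) with the visiting order:
 * first the cached blocks in the order the buffer walk found them,
 * then all unread blocks from start + 1 onwards, wrapping around.
 * Returns the number of entries written. */
size_t plan_scan(ReadBitmap *bm, const uint32_t *cached, size_t ncached,
                 uint32_t start, uint32_t *out)
{
    size_t n = 0;

    for (size_t i = 0; i < ncached; i++) {
        if (cached[i] < bm->nblocks && !bitmap_test(bm, cached[i])) {
            bitmap_set(bm, cached[i]);
            out[n++] = cached[i];
        }
    }
    for (uint32_t k = 1; k <= bm->nblocks; k++) {
        uint32_t blk = (start + k) % bm->nblocks;
        if (!bitmap_test(bm, blk)) {
            bitmap_set(bm, blk);
            out[n++] = blk;
        }
    }
    return n;
}
```

For an 8-block relation with cached blocks {5, 2, 5} and start = 5, the order is 5, 2, 6, 7, 0, 1, 3, 4. The duplicate 5 is skipped because its bit is already set.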

FWIW, I've attached a function I wrote some time ago when I was playing
with the same idea for vacuums. A call to the new function loops through
the buffer cache and returns the next buffer that belongs to a certain
relation. I'm not sure that it's correct and safe, and there aren't many
comments, but it should work if you want to play with it...
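
The attached patch is not reproduced here. As a hypothetical illustration of the interface described above, where each call resumes the walk and returns the next buffer belonging to a given relation, a cursor-based loop over a simplified descriptor array might look like the following. SimpleBufDesc and next_buffer_for_rel() are invented for the example and are not taken from readanybuffer.patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified buffer descriptor (no locking, no pins). */
typedef struct {
    uint32_t rel_id;     /* relation the cached page belongs to */
    uint32_t block_num;  /* block number within that relation */
    int      valid;      /* nonzero if the slot holds a valid page */
} SimpleBufDesc;

/* Resume the walk at *cursor and return the index of the next valid
 * buffer holding a page of rel_id, or -1 once all buffers have been
 * examined. *cursor is advanced past the returned buffer, so repeated
 * calls visit each matching buffer exactly once. */
int next_buffer_for_rel(const SimpleBufDesc *descs, size_t nbuffers,
                        uint32_t rel_id, size_t *cursor)
{
    assert(cursor != NULL);
    while (*cursor < nbuffers) {
        size_t i = (*cursor)++;
        if (descs[i].valid && descs[i].rel_id == rel_id)
            return (int) i;
    }
    return -1;
}
```

A real version would also have to handle the eviction problem Jeff raises. A buffer can be reused between the check and the read, so it would presumably need to be pinned and its tag rechecked under the buffer header lock before the page is used.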

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment: readanybuffer.patch (text/x-patch, 1.4 KB)
