Re: Bug: Buffer cache is not scan resistant

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Jim Nasby <decibel(at)decibel(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Luke Lonergan <LLonergan(at)greenplum(dot)com>, Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>, PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Doug Rady <drady(at)greenplum(dot)com>, Sherry Moore <sherry(dot)moore(at)sun(dot)com>
Subject: Re: Bug: Buffer cache is not scan resistant
Date: 2007-03-07 02:28:30
Message-ID: 1173234511.13722.560.camel@dogma.v10.wvs
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2007-03-06 at 18:29 +0000, Heikki Linnakangas wrote:
> Jeff Davis wrote:
> > On Mon, 2007-03-05 at 21:02 -0700, Jim Nasby wrote:
> >> On Mar 5, 2007, at 2:03 PM, Heikki Linnakangas wrote:
> >>> Another approach I proposed back in December is to not have a
> >>> variable like that at all, but scan the buffer cache for pages
> >>> belonging to the table you're scanning to initialize the scan.
> >>> Scanning all the BufferDescs is a fairly CPU and lock heavy
> >>> operation, but it might be ok given that we're talking about large
> >>> I/O bound sequential scans. It would require no DBA tuning and
> >>> would work more robustly in varying conditions. I'm not sure where
> >>> you would continue after scanning the in-cache pages. At the
> >>> highest in-cache block number, perhaps.
> >> If there was some way to do that, it'd be what I'd vote for.
> >>
> >
> > I still don't know how to make this take advantage of the OS buffer
> > cache.
>
> Yep, I don't see any way to do that. I think we could live with that,
> though. If we went with the sync_scan_offset approach, you'd have to
> leave a lot of safety margin in that as well.
>

Right, there would certainly have to be a safety margin with
sync_scan_offset. However, your plan only works when the shared buffers
are dominated by this sequential scan. Let's say you have 40% of
physical memory for shared buffers, and say that 50% are being used for
hot pages in other parts of the database. That means you have access to
only 20% of physical memory to optimize for this sequential scan, and
20% of the physical memory is basically unavailable (being used for
other parts of the database).

In my current implementation, you could set sync_scan_offset to 1.0
(meaning 1.0 x shared_buffers), giving you 40% of physical memory that
would be used for starting this sequential scan. In this case, that
should be a good margin of error, considering that as much as 80% of the
physical memory might actually be in cache (OS or PG cache).

This all needs to be backed up by testing, of course. I'm just
extrapolating some numbers that look vaguely reasonable to me.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sherry Moore 2007-03-07 03:05:09 Re: Bug: Buffer cache is not scan resistant
Previous Message ITAGAKI Takahiro 2007-03-07 02:18:20 Re: Aggressive freezing in lazy-vacuum