Re: Bug: Buffer cache is not scan resistant

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Jim Nasby <decibel(at)decibel(dot)org>, Luke Lonergan <LLonergan(at)greenplum(dot)com>, Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>, PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Doug Rady <drady(at)greenplum(dot)com>, Sherry Moore <sherry(dot)moore(at)sun(dot)com>
Subject: Re: Bug: Buffer cache is not scan resistant
Date: 2007-03-06 18:47:35
Message-ID: 45EDB747.90003@enterprisedb.com
Lists: pgsql-hackers

Tom Lane wrote:
> Jeff Davis <pgsql(at)j-davis(dot)com> writes:
>> If I were to implement this idea, I think Heikki's bitmap of pages
>> already read is the way to go.
>
> I think that's a good way to guarantee that you'll not finish in time
> for 8.3. Heikki's idea is just at the handwaving stage at this point,
> and I'm not even convinced that it will offer any win. (Pages in
> cache will be picked up by a seqscan already.)

The scenario that I'm worried about is that you have a table that's
slightly larger than RAM. If you issue many seqscans on that table, one
at a time, every seqscan will have to read the whole table from disk,
even though, say, 90% of it is in cache when the scan starts.
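
A minimal sketch of that scenario follows. It assumes a strict LRU
replacement policy and a cache measured in whole pages; both are
simplifications for illustration, not a description of PostgreSQL's actual
buffer manager. The function name and parameters are hypothetical.

```python
from collections import OrderedDict


def seqscan_hit_rates(table_pages, cache_pages, scans):
    """Simulate back-to-back full sequential scans against a strict LRU cache.

    Returns the cache hit rate of each scan. Strict LRU is an assumption
    made for this sketch only.
    """
    cache = OrderedDict()  # page number -> True, ordered oldest to newest
    rates = []
    for _ in range(scans):
        hits = 0
        for page in range(table_pages):
            if page in cache:
                hits += 1
                cache.move_to_end(page)  # mark as most recently used
            else:
                if len(cache) >= cache_pages:
                    cache.popitem(last=False)  # evict least recently used
                cache[page] = True
        rates.append(hits / table_pages)
    return rates


# Table of 100 pages, cache of 90 pages: 90% of the table is cached when
# the second scan starts, yet each scan evicts pages just before it needs them.
print(seqscan_hit_rates(table_pages=100, cache_pages=90, scans=3))
```

Under these assumptions every scan misses on every page, because each page
read at the front of the table evicts the next page the scan is about to
need. With a cache at least as large as the table, the second scan hits on
every page.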

This can be alleviated by using a large enough sync_scan_offset, but a
single setting like that is tricky to tune, especially if your workload
is not completely constant. Tune it too low and you don't get much
benefit; tune it too high and your scans diverge and you lose all benefit.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
