Re: Bug: Buffer cache is not scan resistant

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Pavan Deolasee" <pavan(at)enterprisedb(dot)com>
Cc: "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>, "Gavin Sherry" <swm(at)alcove(dot)com(dot)au>, "Luke Lonergan" <llonergan(at)greenplum(dot)com>, "PGSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Doug Rady" <drady(at)greenplum(dot)com>, "Sherry Moore" <sherry(dot)moore(at)sun(dot)com>
Subject: Re: Bug: Buffer cache is not scan resistant
Date: 2007-03-05 18:24:45
Message-ID: 20614.1173119085@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> "Pavan Deolasee" <pavan(at)enterprisedb(dot)com> writes:
>> Isn't the size of the shared buffer pool itself acting as a performance
>> penalty in this case? Maybe StrategyGetBuffer() needs to make multiple
>> passes over the buffers before the usage_count of any buffer is reduced
>> to zero and the buffer is chosen as the replacement victim.

> I read that and thought you were onto something, but it's not acting
> quite the way I expect. I made a quick hack in StrategyGetBuffer() to
> count the number of buffers it looks at before finding a victim.
> ...
> Yes, autovacuum is off, and bgwriter shouldn't have anything useful to
> do either, so I'm a bit at a loss what's going on --- but in any case,
> it doesn't look like we are cycling through the entire buffer space
> for each fetch.

Nope, Pavan's nailed it: the problem is that after using a buffer, the
seqscan leaves it with usage_count = 1, which means it has to be passed
over once by the clock sweep before it can be re-used. I was misled in
the 32-buffer case because catalog accesses during startup had left the
buffer state pretty confused, so that there was no long stretch before
hitting something available. With a large number of buffers, the
behavior is that the seqscan fills all of shared memory with buffers
having usage_count 1. Once the clock sweep returns to the first of
these buffers, it will have to pass over all of them, reducing all of
their counts to 0, before it returns to the first one and finds it now
usable. Subsequent tries find a buffer immediately, of course, until we
have again filled shared_buffers with usage_count 1 everywhere. So the
problem is not so much the clock sweep overhead as that it's paid in a
very nonuniform fashion: with N buffers you pay O(N) once every N reads
and O(1) the rest of the time. This is no doubt slowing things down
enough to delay that one read, instead of leaving it nicely I/O bound
all the time. Mark, can you detect "hiccups" in the read rate using
your setup?
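
To make the arithmetic concrete, here is a toy model of the
post-decrement sweep. This is a hypothetical simplification, not the
real StrategyGetBuffer() from freelist.c: pins, spinlocks, and the
freelist are all omitted.

	#include <stdio.h>

	#define NBUFFERS 1024

	static int usage_count[NBUFFERS];   /* per-buffer usage counter */
	static int next_victim = 0;         /* the clock hand */

	/* Post-decrement sweep: test for 0 first, then decrement and move on. */
	static int
	clock_sweep(int *inspected)
	{
	    *inspected = 0;
	    for (;;)
	    {
	        int buf = next_victim;

	        next_victim = (next_victim + 1) % NBUFFERS;
	        (*inspected)++;
	        if (usage_count[buf] == 0)
	            return buf;             /* found our victim */
	        usage_count[buf]--;         /* passed over once */
	    }
	}

	int
	main(void)
	{
	    int i, looked;

	    /* a seqscan has just filled shared memory: usage_count = 1 everywhere */
	    for (i = 0; i < NBUFFERS; i++)
	        usage_count[i] = 1;

	    for (i = 0; i < 3; i++)
	    {
	        int victim = clock_sweep(&looked);

	        printf("victim %d after inspecting %d buffers\n", victim, looked);
	        usage_count[victim] = 1;    /* the seqscan reuses it at once */
	    }
	    return 0;
	}

With NBUFFERS = 1024 this reports the first victim found after
inspecting 1025 buffers and each subsequent victim after inspecting
just one: exactly the O(N)-once-every-N-reads pattern.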

I seem to recall that we've previously discussed the idea of letting the
clock sweep decrement the usage_count before testing for 0, so that a
buffer could be reused on the first sweep after it was initially used,
and that we rejected it at the time. But at least with large
shared_buffers it doesn't sound like such a bad idea.
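
For comparison, the pre-decrement variant would look roughly like this
in the same toy model (same caveats as above, not actual PostgreSQL
code):

	static int
	clock_sweep_predecrement(void)
	{
	    for (;;)
	    {
	        int buf = next_victim;

	        next_victim = (next_victim + 1) % NBUFFERS;

	        if (usage_count[buf] > 0)
	            usage_count[buf]--;     /* decrement before testing ... */
	        if (usage_count[buf] == 0)
	            return buf;             /* ... so count 1 is reusable now */
	    }
	}

In the worked example above this finds every victim after inspecting a
single buffer, so the once-per-N stall disappears, at the price that a
buffer touched once gets evicted a full sweep earlier.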

A nearby issue is whether to avoid selecting buffers that are dirty ---
IIRC someone brought that up again recently. Maybe pre-decrement for
clean buffers, post-decrement for dirty ones would be a
cute compromise.
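
In the toy model that compromise might look like this, with is_dirty[]
standing in for the buffer header's dirty flag (hypothetical, as
before):

	static int is_dirty[NBUFFERS];      /* stand-in for the dirty flag */

	static int
	clock_sweep_compromise(void)
	{
	    for (;;)
	    {
	        int buf = next_victim;

	        next_victim = (next_victim + 1) % NBUFFERS;

	        if (is_dirty[buf])
	        {
	            /* post-decrement: test first, so a dirty buffer survives
	             * one more sweep, giving the bgwriter a chance at it */
	            if (usage_count[buf] == 0)
	                return buf;
	            usage_count[buf]--;
	        }
	        else
	        {
	            /* pre-decrement: a clean buffer left at usage_count 1
	             * by a seqscan is reusable on the first sweep */
	            if (usage_count[buf] > 0)
	                usage_count[buf]--;
	            if (usage_count[buf] == 0)
	                return buf;
	        }
	    }
	}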

regards, tom lane
