Re: Scaling shared buffer eviction

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Scaling shared buffer eviction
Date: 2014-09-08 18:46:21
Message-ID: CA+TgmoaBPdF7Sge-P0+ngFcwHaeGcsiJBnfkOaKN-A1y1pCW0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 5, 2014 at 9:19 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Fri, Sep 5, 2014 at 5:17 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> Apart from above, I think for this patch, cat version bump is required
>> as I have modified system catalog. However I have not done the
>> same in patch as otherwise it will be bit difficult to take performance
>> data.
>
> One regression failed on linux due to spacing issue which is
> fixed in attached patch.

I took another read through this patch. Here are some further review comments:

1. In BgMoveBuffersToFreelist(), it seems quite superfluous to have
both num_to_free and tmp_num_to_free. I'd get rid of tmp_num_to_free
and move the declaration of num_to_free inside the outer loop. I'd
also move the definitions of tmp_next_to_clean, tmp_recent_alloc,
tmp_recent_backend_clocksweep into the innermost scope in which they
are used.

2. Also in that function, I think the innermost bit of logic could be
rewritten more compactly, and in such a way as to make it clearer for
what set of instructions the buffer header will be locked.
LockBufHdr(bufHdr); if (bufHdr->refcount == 0) { if
(bufHdr->usage_count > 0) bufHdr->usage_count--; else add_to_freelist
= true; } UnlockBufHdr(bufHdr); if (add_to_freelist &&
StrategyMoveBufferToFreeListEnd(bufHdr)) num_to_free--;

3. This comment is now obsolete:

+ /*
+ * If bgwriterLatch is set, we need to waken the bgwriter, but we should
+ * not do so while holding freelist_lck; so set it after releasing the
+ * freelist_lck. This is annoyingly tedious, but it happens
at most once
+ * per bgwriter cycle, so the performance hit is minimal.
+ */
+

We're not actually holding any lock in need of releasing at that point
in the code, so this can be shortened to "If bgwriterLatch is set, we
need to waken the bgwriter."

* Ideally numFreeListBuffers should get called under freelist spinlock,

That doesn't make any sense. numFreeListBuffers is a variable, so you
can't "call" it. The value should be *read* under the spinlock, but
it is. I think this whole comment can be deleted and replaced with
"If the number of free buffers has fallen below the low water mark,
awaken the bgreclaimer to repopulate it."

4. StrategySyncStartAndEnd() is kind of a mess. One, it can return
the same victim buffer that's being handed out - at almost the same
time - to a backend running the clock sweep; if it does, they'll fight
over the buffer. Two, the *end out parameter actually returns a
count, not an endpoint. I think we should have
BgMoveBuffersToFreelist() call StrategySyncNextVictimBuffer() at the
top of the inner loop rather than the bottom, and change
StrategySyncStartAndEnd() so that it knows nothing about
victimbuf_lck. Let's also change StrategyGetBuffer() to call
StrategySyncNextVictimBuffer() so that the logic is centralized in one
place, and rename StrategySyncStartAndEnd() to something that better
matches its revised purpose. Maybe StrategyGetReclaimInfo().

5. Have you tested that this new bgwriter statistic is actually
working? Because it looks to me like BgMoveBuffersToFreelist is
changing BgWriterStats but never calling pgstat_send_bgwriter(), which
I'm thinking will mean the counters accumulate forever inside the
reclaimer but never get forwarded to the stats collector.

6. StrategyInitialize() uses #defines for
HIGH_WATER_MARK_FREELIST_BUFFERS_PERCENT and
LOW_WATER_MARK_FREELIST_BUFFERS_PERCENT but inline constants (5, 2000)
for clamping. Let's have constants for all of those (and omit mention
of the specific values in the comments).

7. The line you've added to the definition of view pg_stat_bgwriter
doesn't seem to be indented the same way as all the others. Tab vs.
space problem?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2014-09-08 20:42:14 Re: ALTER TABLESPACE MOVE command tag tweak
Previous Message Alvaro Herrera 2014-09-08 18:30:23 Re: BRIN indexes - TRAP: BadArgument