Re: Shared locking in slru.c

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Shared locking in slru.c
Date: 2005-12-05 20:32:38
Message-ID: 2237.1133814758@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> The way the attached patch attacks this is for the shared-lock access
> case to simply set the page's LRU counter to zero, without bumping up
> the LRU counters of the other pages as the normal adjustment would do.
> ...
> I'm not totally happy with this heuristic, though, and was
> wondering if anyone had a better idea. Anyone seen a lock-free
> data structure for LRU or approximately-LRU state tracking?

I've come up with what seems a slightly better way to do this. The idea
is to replace the current structure for tracking which page is the least-
recently-used with this:

typedef struct SlruSharedData
{
...

/*----------
* We mark a page "most recently used" by setting
* page_lru_count[slotno] = ++cur_lru_count;
* The oldest page is therefore the one with the highest value of
* cur_lru_count - page_lru_count[slotno]
*----------
*/
int cur_lru_count;

...

int page_lru_count[NUM_SLRU_BUFFERS];

...

which makes the SlruRecentlyUsed macro look like

#define SlruRecentlyUsed(shared, slotno) \
do { \
int new_lru_count = (shared)->cur_lru_count; \
if (new_lru_count != (shared)->page_lru_count[slotno]) { \
(shared)->cur_lru_count = ++new_lru_count; \
(shared)->page_lru_count[slotno] = new_lru_count; \
} \
} while (0)

and SlruSelectLRUPage() has to look for the maximum value of
"cur_count - shared->page_lru_count[slotno]" rather than just
"shared->page_lru_count[slotno]" as before. This seems like a win
in any case since it takes cycles out of the commonly used path at
a small increase in the cost of SlruSelectLRUPage --- but in that
routine you are about to do I/O anyway, so a few extra subtractions
are negligible.

However, the real reason for doing this is that I think it's OK for
the proposed SimpleLruReadPage_ReadOnly routine to apply this form
of SlruRecentlyUsed even though it holds only a shared lock. Assuming
that int reads and writes are atomic, the only possible failures are

1. If a process running the macro is delayed, it might store a stale
(hence too small) value back into cur_lru_count or a page_lru_count
array element after someone else has incremented them to a larger value.

2. Two processes might read the same cur_lru_count value at the same
time, so that one of their increments is lost. This has the same end
effect as #1, though.

Given reasonable care in SlruSelectLRUPage (see attached excerpt), we
can defend against these scenarios and usually still make a reasonable
choice of which page to evict. In any case, the worst possible scenario
is that we make a nonoptimal choice of page to evict due to confused
lru_count state. This price seems well worth the chance to reduce
contention for shared memory.

Thoughts, objections?

regards, tom lane

/*
* If we find any EMPTY slot, just select that one. Else locate the
* least-recently-used slot to replace.
*
* Normally the page_lru_count values will all be different and so
* there will be a well-defined LRU page. But since we allow
* concurrent execution of SlruRecentlyUsed() within
* SimpleLruReadPage_ReadOnly(), it is possible that multiple pages
* acquire the same lru_count values. In that case we break ties by
* choosing the furthest-back page.
*
* In no case will we select the slot containing latest_page_number
* for replacement, even if it appears least recently used.
*
* Notice that this next line forcibly advances cur_lru_count to a
* value that is certainly beyond any value that will be in the
* page_lru_count array after the loop finishes. This ensures that
* the next execution of SlruRecentlyUsed will give us good data,
* even if it's against a page that has the current counter value.
*/
cur_count = (shared->cur_lru_count)++;
best_delta = -1;
bestslot = 0; /* no-op, just keeps compiler quiet */
best_page_number = 0; /* ditto */
for (slotno = 0; slotno < NUM_SLRU_BUFFERS; slotno++)
{
int this_delta;
int this_page_number;

if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
return slotno;
this_delta = cur_count - shared->page_lru_count[slotno];
if (this_delta < 0)
{
/*
* Clean up in case shared updates have caused cur_count
* increments to get "lost". We back off the page counts,
* rather than trying to increase cur_count, to avoid any
* question of infinite loops or failure in the presence of
* wrapped-around counts.
*/
shared->page_lru_count[slotno] = cur_count;
this_delta = 0;
}
this_page_number = shared->page_number[slotno];
if ((this_delta > best_delta ||
(this_delta == best_delta &&
ctl->PagePrecedes(this_page_number, best_page_number))) &&
this_page_number != shared->latest_page_number)
{
bestslot = slotno;
best_delta = this_delta;
best_page_number = this_page_number;
}
}

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2005-12-05 20:37:19 Re: [PATCHES] snprintf() argument reordering not working
Previous Message Zoltan Boszormenyi 2005-12-05 19:59:50 Re: SERIAL type feature request