Re: 'COPY ... FROM' inserts to btree, blocks on buffer

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Wildpaner <mike(at)rainbow(dot)studorg(dot)tuwien(dot)ac(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 'COPY ... FROM' inserts to btree, blocks on buffer
Date: 2005-01-01 15:44:09
Message-ID: 1104594249.3978.1239.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 2005-01-01 at 00:42, Tom Lane wrote:
> I wrote:
> > A possible fix for this is to reorder the operations in FlushBuffer
> > so that we share-lock the buffer before releasing BufMgrLock ... but
> > I'm not sure that doesn't introduce other deadlock risks.
>
> That's too simplistic, since at least one caller of FlushBuffer is
> trying to write pages that may be in active use. If the target page
> is already exclusive-locked by another process then a deadlock between
> the per-buffer lock and BufMgrLock is entirely possible.
>
> I think that it would work for BufferAlloc to share-lock the victim
> buffer before calling FlushBuffer; we'd have to add a bool parameter to
> FlushBuffer telling it the lock was already acquired. This is safe in
> this particular path because a buffer that was just taken from the
> freelist should not be exclusive-locked by anyone. (It could possibly
> be share-locked by the bgwriter, but that won't cause a deadlock.)
> A process that wanted exclusive lock would first have to pin the buffer,
> which can't happen before we release the BufMgrLock. By adding the
> parameter we'd avoid changing the behavior for other callers.
>
> Comments, better ideas?

I think the proposal sounds safe, though I worry about performance.

This situation occurs because an index block in cache was thought to be
a block replacement candidate, then found to be useful by another
backend which attempts to start using it before the flush occurs.
...which hints at the shared_buffers being set too low.

AFAICS we need not consider losing timeslices as the explanation for
concurrent activity. This situation is much more easily possible with
multi-CPU systems, especially when many single core systems have
hyperthreaded virtual CPUs and the default shared_buffers of 8Mb has not
been changed.

The effect of the proposed change would be to favour the removal of the
block from the cache rather than favoring the reuse of it. I think the
favour in this situation should go towards the reuse, forcing the
backend searching for a free buffer to search again. There are already 2
other cases in BufferAlloc which cater for when two backends want the
same block, which hints towards the importance of favouring the reuse:
this is just one more case.

My suggestion: LockBuffer in FlushBuffer should return as unsuccessful
if there is an LW_EXCLUSIVE lock already held, causing another iteration
of the do while loop in BufferAlloc. Doing that would be faster for both
backends since neither has to wait for I/O, as well as avoiding the
deadlock situation that has been uncovered.

If the backend searches for a block replacement candidate and fails more
than once, we should consider warning "shared buffers too low..."

Later, I'd like to consider changing the way StrategyGetBuffer works,
since it always stops at the first unpinned buffer and doesn't consider
whether it is dirty or not in its decision to use it. It seems like it
would be worth reading a *few* buffers further on to see if a more
appropriate choice could have been made now that we have bgwriter to
eventually clean the buffers.

--
Best Regards, Simon Riggs

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2005-01-01 16:36:46 Re: UNICODE/UTF-8 on win32
Previous Message Simon Riggs 2005-01-01 15:43:55 Re: [HACKERS] Bgwriter behavior