Re: Page-at-a-time Locking Considerations

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Page-at-a-time Locking Considerations
Date: 2008-03-23 11:04:32
Message-ID: 1206270272.4285.765.camel@ebony.site
Lists: pgsql-hackers

On Sat, 2008-03-22 at 20:37 -0400, Bruce Momjian wrote:
> With no concrete patch or performance numbers, this thread has been
> removed from the patches queue.

I agree since there is no patch.

However, I think recent performance reports about the cost of
visibility checks, such as "Very slow seq scan" by Craig Ringer on the
Perform list on 10 Mar, show that this remains an area of concern. We
may have tuned some parts of the visibility checks, but not all.

So I think it should be a TODO to investigate further.

> Simon Riggs wrote:
> >
> > In heapgetpage() we hold the buffer locked while we look for visible
> > tuples. That works well in most cases since the visibility check is fast
> > if we have status bits set. If we don't have visibility bits set we have
> > to do things like scan the snapshot and confirm things via clog lookups.
> > All of that takes time and can lead to long buffer lock times, possibly
> > across multiple I/Os in the very worst cases.
> >
> > This doesn't just happen for old transactions. Accessing very recent
> > TransactionIds is prone to rare but long waits when we ExtendClog().
> >
> > Such problems are numerically rare, but the buffers with long lock times
> > are also the ones that have concurrent or at least recent write
> > operations on them. So all SeqScans have the potential to induce long
> > wait times for write transactions, even if they are scans on 1 block
> > tables. Tables with heavy write activity on them from multiple backends
> > have their work spread across multiple blocks, so a SeqScan will hit
> > this issue repeatedly as it encounters each current insertion point in a
> > table and so greatly increases the chances of it occurring.
> >
> > It seems possible to just memcpy() the whole block away and then drop
> > the lock quickly. That gives a consistent lock time in all cases and
> > allows us to do the visibility checks in our own time. It might seem
> > that we would end up copying irrelevant data, which is true. But the
> > greatest cost is memory access time. If hardware memory pre-fetch cuts
> > in we will find that the memory is retrieved en masse anyway; if it
> > doesn't we will have to wait for each cache line. So the best case is
> > actually an en masse retrieval of cache lines, in the common case where
> > blocks are fairly full (vague cutoff is determined by exact mechanism of
> > hardware/compiler induced memory prefetch).
> >
> > The copied block would be used only for visibility checks. The main
> > buffer would retain its pin and we would pass references to the block
> > through the executor as normal. So this would be a change completely
> > isolated to heapgetpage().
> >
> > Was the copy-aside method considered when we introduced page at a time
> > mode? Any reasons to think it would be dangerous or infeasible? If not,
> > I'll give it a bash and get some test results.
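
For reference, the copy-aside idea quoted above can be sketched roughly
as follows. This is only an illustration under stated assumptions, not
PostgreSQL code: ToyBuffer, ToyPage, ToyTuple, toy_lock(), toy_unlock(),
toy_visible() and toy_getpage_copy_aside() are all hypothetical
stand-ins, the page layout is a made-up fixed-slot array, and the
visibility rule is a placeholder for the real snapshot and clog checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 8192        /* same size as PostgreSQL's default block */
#define TUPLES_PER_PAGE 64    /* hypothetical fixed-slot layout */

/* Hypothetical, heavily simplified tuple: only the inserting xid matters. */
typedef struct { uint32_t xmin; uint32_t payload; } ToyTuple;

/* A page viewed either as raw bytes or as an array of toy tuples. */
typedef union {
    char     raw[PAGE_SIZE];
    ToyTuple tuples[TUPLES_PER_PAGE];
} ToyPage;

/* Stand-in for a shared buffer: a content lock flag, a pin count, a page. */
typedef struct {
    int     lock_held;
    int     pin_count;
    ToyPage page;
} ToyBuffer;

static void toy_lock(ToyBuffer *b)   { assert(!b->lock_held); b->lock_held = 1; }
static void toy_unlock(ToyBuffer *b) { assert(b->lock_held);  b->lock_held = 0; }

/*
 * Placeholder for the potentially slow visibility check (snapshot scan,
 * clog lookup). Here: slot is in use and xmin precedes the snapshot xmax.
 */
static bool toy_visible(const ToyTuple *t, uint32_t snapshot_xmax)
{
    return t->xmin != 0 && t->xmin < snapshot_xmax;
}

/*
 * Copy the block aside under the lock, drop the lock, then run the
 * visibility checks against the private copy. The caller is assumed to
 * hold a pin, which is retained throughout. Returns the number of
 * visible slots and records their indexes in visible_slots.
 */
static int toy_getpage_copy_aside(ToyBuffer *buf, uint32_t snapshot_xmax,
                                  int *visible_slots)
{
    ToyPage copy;
    int     nvisible = 0;

    assert(buf->pin_count > 0);

    toy_lock(buf);
    memcpy(&copy, &buf->page, PAGE_SIZE);   /* lock hold time: one memcpy */
    toy_unlock(buf);

    /* Visibility checks now run without holding the buffer lock. */
    for (int i = 0; i < TUPLES_PER_PAGE; i++)
    {
        if (toy_visible(&copy.tuples[i], snapshot_xmax))
            visible_slots[nvisible++] = i;
    }
    return nvisible;
}
```

The point of the sketch is only that the lock is held for a fixed-cost
copy rather than for a variable number of visibility checks; the slot
indexes it returns would still be resolved against the pinned main
buffer, as the quoted text describes.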

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
