Re: Page-at-a-time Locking Considerations

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Page-at-a-time Locking Considerations
Date: 2008-03-23 00:37:06
Message-ID: 200803230037.m2N0b6c19764@momjian.us
Lists: pgsql-hackers


With no concrete patch or performance numbers, this thread has been
removed from the patches queue.

---------------------------------------------------------------------------

Simon Riggs wrote:
>
> In heapgetpage() we hold the buffer locked while we look for visible
> tuples. That works well in most cases since the visibility check is fast
> if we have status bits set. If we don't have visibility bits set we have
> to do things like scan the snapshot and confirm things via clog lookups.
> All of that takes time and can lead to long buffer lock times, possibly
> across multiple I/Os in the very worst cases.
>
> This doesn't just happen for old transactions. Accessing very recent
> TransactionIds is prone to rare but long waits when we ExtendClog().
>
> Such problems are numerically rare, but the buffers with long lock times
> are also the ones that have concurrent or at least recent write
> operations on them. So all SeqScans have the potential to induce long
> wait times for write transactions, even if they are scans of single-block
> tables. Tables with heavy write activity on them from multiple backends
> have their work spread across multiple blocks, so a SeqScan will hit
> this issue repeatedly as it encounters each current insertion point in a
> table and so greatly increases the chances of it occurring.
>
> It seems possible to just memcpy() the whole block away and then drop
> the lock quickly. That gives a consistent lock time in all cases and
> allows us to do the visibility checks in our own time. It might seem
> that we would end up copying irrelevant data, which is true. But the
> greatest cost is memory access time. If hardware memory prefetch kicks
> in, we will find that the memory is retrieved en masse anyway; if it
> doesn't, we will have to wait for each cache line. So the best case is
> actually an en masse retrieval of cache lines, in the common case where
> blocks are fairly full (the exact cutoff depends on the mechanism of
> hardware- or compiler-induced memory prefetch).
>
> The copied block would be used only for visibility checks. The main
> buffer would retain its pin and we would pass references to the block
> through the executor as normal. So this would be a change completely
> isolated to heapgetpage().
>
> Was the copy-aside method considered when we introduced page-at-a-time
> mode? Any reasons to think it would be dangerous or infeasible? If not,
> I'll give it a bash and get some test results.
>
> --
> Simon Riggs
> 2ndQuadrant http://www.2ndQuadrant.com

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
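
For reference, the copy-aside idea in the quoted message can be sketched
roughly as below. This is a self-contained toy in C, not PostgreSQL code and
not a patch: every Demo* type and demo_* function is hypothetical, the
spinlock merely stands in for the buffer content lock, and
demo_slow_committed() is a placeholder for the expensive path (snapshot scan,
clog lookup). It only illustrates the ordering: lock, memcpy() the block,
unlock, then run the visibility checks against the private copy. It also
ignores the fact that the real visibility routines may write status bits back
to the shared page, which a copy-aside scheme would have to handle somehow.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE  8192    /* same size as PostgreSQL's default block */
#define DEMO_MAX_TUPLES 64

/* Hypothetical tuple header; far simpler than a real heap tuple header. */
typedef struct {
    uint32_t xmin;            /* inserting transaction id */
    bool     committed_hint;  /* stand-in for a "known committed" status bit */
} DemoTuple;

typedef struct {
    uint16_t  ntuples;
    DemoTuple tuples[DEMO_MAX_TUPLES];
} DemoLayout;

typedef union {
    unsigned char raw[DEMO_PAGE_SIZE];
    DemoLayout    layout;
} DemoPage;

_Static_assert(sizeof(DemoLayout) <= DEMO_PAGE_SIZE, "layout must fit in a page");

typedef struct {
    atomic_flag lock;   /* spinlock standing in for the buffer content lock */
    DemoPage    page;
} DemoBuffer;

static void demo_buffer_init(DemoBuffer *buf)
{
    atomic_flag_clear(&buf->lock);
    memset(&buf->page, 0, sizeof(buf->page));
}

static void demo_lock(DemoBuffer *buf)
{
    while (atomic_flag_test_and_set_explicit(&buf->lock, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void demo_unlock(DemoBuffer *buf)
{
    atomic_flag_clear_explicit(&buf->lock, memory_order_release);
}

/* Placeholder for the slow path: here, even xids count as committed. */
static bool demo_slow_committed(uint32_t xid)
{
    return (xid % 2) == 0;
}

static bool demo_tuple_visible(const DemoTuple *tup)
{
    if (tup->committed_hint)
        return true;                        /* fast path: status bit set */
    return demo_slow_committed(tup->xmin);  /* slow path: may take a while */
}

/*
 * Copy the page under the lock, release the lock, then check visibility
 * on the private copy. Returns the number of visible tuple indexes stored
 * in visible[].
 */
static int demo_getpage_copy_aside(DemoBuffer *buf, uint16_t *visible,
                                   int max_visible)
{
    DemoPage local;
    int      nvisible = 0;

    assert(buf != NULL && visible != NULL);

    demo_lock(buf);
    memcpy(&local, &buf->page, sizeof(local));  /* fixed-cost critical section */
    demo_unlock(buf);

    /* Clamp in case the copied header is out of range. */
    uint16_t ntuples = local.layout.ntuples;
    if (ntuples > DEMO_MAX_TUPLES)
        ntuples = DEMO_MAX_TUPLES;

    /* Lock is no longer held: slow checks cannot stall writers. */
    for (uint16_t i = 0; i < ntuples && nvisible < max_visible; i++)
        if (demo_tuple_visible(&local.layout.tuples[i]))
            visible[nvisible++] = i;

    return nvisible;
}
```

A caller would initialise a DemoBuffer with demo_buffer_init(), fill in a few
tuples, and call demo_getpage_copy_aside() with an output array of
DEMO_MAX_TUPLES entries. The lock hold time is then one memcpy() of the block
no matter how many tuples need the slow path, which is the property the
proposal is after.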
