Re: Proposal of PITR performance improvement for 8.4.

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Koichi Suzuki <koichi(dot)szk(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal of PITR performance improvement for 8.4.
Date: 2008-10-29 08:32:34
Message-ID: 1225269154.3971.278.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Tue, 2008-10-28 at 14:21 +0200, Heikki Linnakangas wrote:

> 1. You should avoid useless posix_fadvise() calls. In the naive
> implementation, where you simply call posix_fadvise() for every page
> referenced in every WAL record, you'll do 1-2 posix_fadvise() syscalls
> per WAL record, and that's a lot of overhead. We face the same design
> question as with Greg's patch to use posix_fadvise() to prefetch index
> and bitmap scans: what should the interface to the buffer manager look
> like? The simplest approach would be a new function call like
> AdviseBuffer(Relation, BlockNumber), that calls posix_fadvise() for the
> page if it's not in the buffer cache, but is a no-op otherwise. But that
> means more overhead, since for every page access, we need to find the
> page twice in the buffer cache; once for the AdviseBuffer() call, and
> 2nd time for the actual ReadBuffer().

That's a much smaller overhead than waiting for an I/O. The CPU overhead
isn't really a problem if we're I/O bound.
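
To make the trade-off concrete, here is a minimal sketch of the
AdviseBuffer() idea from the quoted text. It is a toy model, not
PostgreSQL code: ToyCache, toy_advise_buffer() and toy_read_buffer() are
hypothetical names, the "buffer cache" is a small array, and the actual
posix_fadvise() call is replaced by a counter. It only shows that the
double lookup costs one extra cache probe per page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CACHE_SLOTS 8

typedef unsigned int BlockNum;  /* hypothetical stand-in for BlockNumber */

/* Toy buffer cache: a flat array plus counters for the demonstration. */
typedef struct ToyCache
{
    BlockNum blocks[TOY_CACHE_SLOTS];
    size_t   used;
    long     lookups;       /* number of cache probes performed */
    long     advise_calls;  /* number of prefetch hints that would be issued */
} ToyCache;

/* Probe the cache for a block; every call counts as one lookup. */
static bool
toy_cache_contains(ToyCache *cache, BlockNum blk)
{
    cache->lookups++;
    for (size_t i = 0; i < cache->used; i++)
        if (cache->blocks[i] == blk)
            return true;
    return false;
}

/*
 * No-op if the block is already cached; otherwise record a prefetch hint.
 * Real code would issue posix_fadvise(..., POSIX_FADV_WILLNEED) here.
 * Returns true if a hint was issued.
 */
static bool
toy_advise_buffer(ToyCache *cache, BlockNum blk)
{
    if (toy_cache_contains(cache, blk))
        return false;
    cache->advise_calls++;
    return true;
}

/* Second lookup: bring the block into the toy cache if it is missing. */
static void
toy_read_buffer(ToyCache *cache, BlockNum blk)
{
    if (toy_cache_contains(cache, blk))
        return;
    assert(cache->used < TOY_CACHE_SLOTS);  /* toy cache has no eviction */
    cache->blocks[cache->used++] = blk;
}
```

Advising and then reading one uncached block costs two probes in this
model, which is the extra CPU being weighed against a blocking read.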

> It would be more efficient to pin
> the buffer in the AdviseBuffer() call already, but that requires much
> more changes to the callers.

That would be hard to clean up safely, plus we'd have difficulty with
timing: is there enough buffer space to allow all the prefetched blocks
to live in cache at once? If not, this approach would cause problems.

> 2. The format of each WAL record is different, so you need a "readahead
> handler" for every resource manager, for every record type. It would be
> a lot simpler if there was a standardized way to store that information
> in the WAL records.

I would prefer a new rmgr API call that returns a list of blocks. That's
better than trying to make everything fit one pattern. If the call
doesn't exist then that rmgr won't get prefetch.
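
One possible shape for such a call, as a rough sketch only. The names
here (ToyRecord, ToyRmgr, get_blocks, toy_blocks_to_prefetch) are
hypothetical and do not correspond to the existing rmgr table; the point
is just that the callback is optional and a NULL entry means no
prefetch for that rmgr.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int BlockNum;  /* hypothetical stand-in for BlockNumber */

/* Toy WAL record: an rmgr id plus up to two referenced blocks. */
typedef struct ToyRecord
{
    int      rmid;
    BlockNum blocks[2];
    int      nblocks;
} ToyRecord;

/* Optional per-rmgr callback: fill 'out' with referenced blocks, return count. */
typedef int (*toy_get_blocks_fn) (const ToyRecord *rec, BlockNum *out, int max_out);

typedef struct ToyRmgr
{
    const char       *name;
    toy_get_blocks_fn get_blocks;  /* NULL means this rmgr gets no prefetch */
} ToyRmgr;

/* Example callback for a rmgr that knows how to decode its own records. */
static int
toy_heap_get_blocks(const ToyRecord *rec, BlockNum *out, int max_out)
{
    int n = rec->nblocks < max_out ? rec->nblocks : max_out;

    for (int i = 0; i < n; i++)
        out[i] = rec->blocks[i];
    return n;
}

static const ToyRmgr toy_rmgr_table[] = {
    {"heap", toy_heap_get_blocks},
    {"other", NULL},               /* no callback: silently skipped */
};

#define TOY_NUM_RMGRS ((int) (sizeof(toy_rmgr_table) / sizeof(toy_rmgr_table[0])))

/* Dispatch to the rmgr's callback if it has one; otherwise report no blocks. */
static int
toy_blocks_to_prefetch(const ToyRecord *rec, BlockNum *out, int max_out)
{
    assert(rec->rmid >= 0 && rec->rmid < TOY_NUM_RMGRS);

    const ToyRmgr *rmgr = &toy_rmgr_table[rec->rmid];

    if (rmgr->get_blocks == NULL)
        return 0;
    return rmgr->get_blocks(rec, out, max_out);
}
```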

> 3. IIRC I tried to handle just a few most important WAL records at
> first, but it turned out that you really need to handle all WAL records
> (that are used at all) before you see any benefit. Otherwise, every time
> you hit a WAL record that you haven't done posix_fadvise() on, the
> recovery "stalls", and you don't need many of those to diminish the gains.
>
> Not sure how these apply to your approach, it's very different. You seem
> to handle 1. by collecting all the page references for the WAL file, and
> sorting and removing the duplicates. I wonder how much CPU time is spent
> on that?

Removing duplicates seems likely to save CPU overall, since each page
then needs at most one posix_fadvise() call.
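
For reference, the sort-and-deduplicate step described in the quoted
text could look roughly like this. It is a sketch under assumptions, not
the code from the patch: ToyPageRef and toy_sort_unique() are
hypothetical, and a page reference is reduced to a (relation, block)
pair of integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical page reference: relation id plus block number. */
typedef struct ToyPageRef
{
    unsigned int rel;
    unsigned int blk;
} ToyPageRef;

/* Order by relation first, then by block number. */
static int
toy_pageref_cmp(const void *a, const void *b)
{
    const ToyPageRef *x = a;
    const ToyPageRef *y = b;

    if (x->rel != y->rel)
        return x->rel < y->rel ? -1 : 1;
    if (x->blk != y->blk)
        return x->blk < y->blk ? -1 : 1;
    return 0;
}

/*
 * Sort the references in place and squeeze out duplicates.
 * Returns the number of distinct references left at the front of the array.
 */
static size_t
toy_sort_unique(ToyPageRef *refs, size_t n)
{
    assert(refs != NULL || n == 0);
    if (n == 0)
        return 0;

    qsort(refs, n, sizeof(ToyPageRef), toy_pageref_cmp);

    size_t kept = 1;

    for (size_t i = 1; i < n; i++)
    {
        /* After sorting, duplicates are adjacent; keep only the first. */
        if (toy_pageref_cmp(&refs[i], &refs[kept - 1]) != 0)
            refs[kept++] = refs[i];
    }
    return kept;
}
```

The sort is O(n log n) per WAL file, but each distinct page is then
advised once, and in block order within each relation.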

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
