Re: V4 of PITR performance improvement for 8.4

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Koichi Suzuki <koichi(dot)szk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: V4 of PITR performance improvement for 8.4
Date: 2009-03-09 13:20:17
Message-ID: 49B51791.5080303@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Fujii Masao wrote:
> On Tue, Mar 3, 2009 at 1:47 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> Hi Suzuki-san,
>>
>> On Thu, Feb 26, 2009 at 5:03 AM, Koichi Suzuki <koichi(dot)szk(at)gmail(dot)com> wrote:
>>> My reply to Gregory's comment didn't have any objections. I believe,
>>> as I posted to Wiki page, latest posted patch is okay and waiting for
>>> review.
>> One of your latest patches doesn't match with HEAD, so I updated it.
>
> Oops! I failed in attaching the patch. This is second try.

Thanks. This patch seems to be missing the new readahead.c file. I
grabbed that from the previous patch version.

It's annoying that we have to write *_readahead functions for each and
every record type. In most record types, we already pass the information
about the pages involved to XLogInsert, for full page writes. If we
could change the WAL format so that that information is stored in WAL
even when a full page image is taken, we could do the read-ahead for
every WAL record type in a single function. Getting the API right needs
some thinking, but that would be a lot nicer approach in the long run.

I agree with Itagaki-san's earlier comment that we should find a way to
avoid the ReadAheadHasRoom calls
(http://archives.postgresql.org/message-id/20090114101659.89CD.52131E4D@oss.ntt.co.jp).
Let's just leave enough slack in the queue, so that it never fills up.
And if the unthinkable happens and it does fill up, we can just throw
away the readahead requests that don't fit; they're just hints anyway.

The API for ReadAheadAddEntry should include ForkNumber. Although all
the readahead calls included in the patch all access the main fork,
that's really just an omission that probably should be fixed even though
the FSM and visibility map should become cached pretty quickly for any
active tables.

effective_io_concurrency setting is ignored. If it's set to 1, we should
disable prefetching altogether for the sake of both robustness (let you
recover even if there's a bug in the readahead code) and performance
(avoid useless posix_fadvise calls, sorting etc. if there's only one
spindle).

The bursty queuing behavior feels pretty strange to me, though I guess
it works pretty well in practice. I would've assumed a FIFO queue of WAL
records, with some kind of a cache of recently issued posix_fadvise() calls.

As the patch stands, it's not limited to archive recovery. The code in
readahead.c seems to assume that the readahead queue will always be
flushed between xlog segment switch, but that is not enforced in xlog.c.

malloc() can return NULL on out of memory. Need to check that, or use
palloc() instead.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2009-03-09 13:35:05 Re: status of remaining patches
Previous Message Bruce Momjian 2009-03-09 13:08:41 Visibility function comment addition