Re: Proposal of PITR performance improvement for 8.4.

From: "Koichi Suzuki" <koichi(dot)szk(at)gmail(dot)com>
To: "Gregory Stark" <stark(at)enterprisedb(dot)com>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Bruce Momjian" <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal of PITR performance improvement for 8.4.
Date: 2008-10-29 00:55:55
Message-ID: a778a7260810281755x10000895lf7d40688d55349bb@mail.gmail.com
Lists: pgsql-hackers

Thanks for all the inspiring discussion.

Please note that my proposal changes only a few lines of the
recovery code itself. It does not affect buffer management, the
order in which WAL records are applied, etc. The only change needed
is to invoke the prefetch feature when redo is about to read WAL that
has not yet been handled by the prefetch (the prefetch function
returns the last-handled LSN).
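
As a rough illustration only (this is not the actual patch), the check
described above could look like the following C sketch. The names Lsn,
Prefetcher, prefetch_from and replay are hypothetical, the prefetch
itself is a stub, and the fixed "window" of WAL covered per call is an
assumption made just to keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t Lsn;       /* hypothetical stand-in for a WAL position */

typedef struct {
    Lsn last_handled;       /* prefetch has covered WAL before this LSN */
    Lsn window;             /* assumed amount of WAL scanned per call */
    int calls;              /* prefetch invocations, for illustration */
} Prefetcher;

/* Stub: a real version would scan WAL from 'start' and call
 * posix_fadvise() for each referenced data block.
 * Returns the new last-handled LSN. */
static Lsn prefetch_from(Prefetcher *p, Lsn start)
{
    p->calls++;
    p->last_handled = start + p->window;
    return p->last_handled;
}

/* Redo loop: record order and apply logic stay as they are; the only
 * addition is the check in front of each record. */
static void replay(Prefetcher *p, const Lsn *record_lsns, size_t n)
{
    assert(p != NULL && p->window > 0);
    for (size_t i = 0; i < n; i++) {
        if (record_lsns[i] >= p->last_handled)
            prefetch_from(p, record_lsns[i]);
        /* ... apply the WAL record exactly as before ... */
    }
}
```

Because the check only compares the record's LSN with the value the
prefetch last returned, most records pass through without any extra
work.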

Before writing the readahead code, I ran several experiments on how
much posix_fadvise() speeds up random reads. I found that
POSIX_FADV_WILLNEED can improve total read performance by around five
times if we issue the posix_fadvise() calls in order of block
position. Without the ordering, i.e. with random block positions, the
improvement was around three times. These results were obtained with
a single process, but on a RAID configuration. I'd like to do a
similar measurement against a single disk.
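
A minimal sketch of the ordered-hint idea, assuming the blocks to be
read are already known as block numbers within one open file. The
function name prefetch_blocks and the BLOCK_SIZE constant are
illustrative, not taken from pg_readahead; only posix_fadvise() and
POSIX_FADV_WILLNEED are the real POSIX interface.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_fadvise() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>

#define BLOCK_SIZE 8192           /* assumed page size (PostgreSQL default) */

/* qsort comparator: ascending block number */
static int cmp_block(const void *a, const void *b)
{
    unsigned x = *(const unsigned *) a;
    unsigned y = *(const unsigned *) b;
    return (x > y) - (x < y);
}

/* Sort the block numbers in place, then hint each block to the kernel
 * in ascending file position. Returns 0 on success, or the error
 * number from the first posix_fadvise() call that fails. */
static int prefetch_blocks(int fd, unsigned *blocks, size_t n)
{
    assert(blocks != NULL || n == 0);
    qsort(blocks, n, sizeof *blocks, cmp_block);
    for (size_t i = 0; i < n; i++) {
        int rc = posix_fadvise(fd, (off_t) blocks[i] * BLOCK_SIZE,
                               BLOCK_SIZE, POSIX_FADV_WILLNEED);
        if (rc != 0)
            return rc;
    }
    return 0;
}
```

The hints only ask the kernel to start reading into its own cache;
the later read() calls are unchanged, and how much this helps depends
on the storage underneath.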

I'd like to run some benchmarks to quantify the improvement. I agree
that I should show how useful my proposal is.

In terms of the influence on the recovery code, pg_readahead just
calls posix_fadvise() to tell the operating system to prefetch the
data pages into the kernel's cache, not PG's shared memory, so we
don't have to implement this in PG core code. Because of this, and
because I think it is more practical to keep platform-specific code
outside the core as much as possible, I wrote most of the prefetch as
an external process, which can be made available in contrib or on
PgFoundry, perhaps the latter.
Heikki suggested having a separate reader process. I think it's a
very good idea, but it would change PG's performance dramatically:
better in some cases, but possibly even worse in others. I don't
have a clear picture of this yet. So I think the background reader
should be a challenge for 8.5 or later, and we should call for
research work on it. For now, I think it is reasonable to keep
improving the specific code.

I'd like to hear more opinions on these points. I'm more than happy
to write all the code inside PG core to avoid the overhead of creating
another process.

---
Koichi Suzuki

2008/10/29 Gregory Stark <stark(at)enterprisedb(dot)com>:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>
>> On Tue, 2008-10-28 at 17:40 -0400, Bruce Momjian wrote:
>>> Gregory Stark wrote:
>>> > Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>>> >
>>> > > I'm happy with the idea of a readahead process. I thought we were
>>> > > implementing a BackgroundReader process for other uses. Is that dead
>>> > > now?
>>> >
>>> > You and Bruce seem to keep resurrecting that idea. I've never liked it -- I
>>> > always hated that in Oracle and thought it was a terrible kludge.
>>>
>>> I didn't think I was promoting the separate reader process after you had
>>> the posix_fadvise() idea.
>
> I'm sorry, I thought I remembered you mentioning it again. But perhaps I was
> thinking of someone else (perhaps it was Simon again?) or perhaps it was
> before you saw the actual patch.
>
>> It would be good if the solutions for normal running and recovery were
>> similar. Greg, please could you look into that?
>
> I could do the readahead side of things but what I'm not sure how to arrange
> is how to restructure the wal reading logic to read records ahead of the
> actual replay.
>
> I think we would have to maintain two pointers one for the prefetch and one
> for the actual running. But the logic in for recovery is complex enough that
> I'm concerned about changing it enough to do that and whether it can be done
> without uglifying the code quite a bit.
>
> --
> Gregory Stark
> EnterpriseDB http://www.enterprisedb.com
> Ask me about EnterpriseDB's RemoteDBA services!
>
