Re: [GENERAL] Slow PITR restore

From: Hannu Krosing <hannu(at)skype(dot)net>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Joshua D. Drake" <jd(at)commandprompt(dot)com>, Jeff Trout <threshar(at)threshar(dot)is-a-geek(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Slow PITR restore
Date: 2007-12-14 12:32:07
Message-ID: 1197635527.7974.14.camel@hannu-laptop
Lists: pgsql-general pgsql-hackers


On Thu, 2007-12-13 at 20:25, Heikki Linnakangas wrote:
...
> Hmm. That assumes that nothing else than the WAL replay will read
> pages into shared buffers. I guess that's true at the moment, but it
> doesn't seem impossible that something like Florian's read-only queries
> on a stand by server would change that.
>
> > I think that is better than both methods mentioned, and definitely
> > simpler than my brute-force method. It also lends itself to using both
> > previously mentioned methods as additional techniques if we really
> > needed to. I suspect reordering the I/Os in this way is going to make a
> > huge difference to cache hit rates.
>
> But it won't actually do anything to scale the I/O. You're still going
> to be issuing only one read request at a time. The order of those
> requests will be better from cache hit point of view, which is good, but
> the problem remains that if the modified data blocks are scattered
> around the database, you'll be doing random I/O, one request at a time.

Why one at a time?

You could have a long list of pages that need to be read in, and ask for
them all at the same time.

Here's what I mean:

1) Allocate buffers for N database pages, and a queue for N WAL records.
2) Read N WAL records into the WAL record queue, assign the database page
   numbers from these to buffer pages, and issue posix_fadvise() for each
   page as you go.
2a) If there were repeated pages, and thus there are free buffers,
    allocate more queue items, read some more WAL records, and assign
    buffers and fadvise until all N buffers are used.
3) Apply the WAL record queue to the buffers read in by 2).
4) Write the buffers back to disk.

Repeat from 2), freeing LRU buffers.
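
The steps above can be sketched roughly as follows. This is only an
illustration, not PostgreSQL code: the record format (page_no, offset,
data), the function name replay() and the 8192-byte PAGE_SIZE are all
assumptions made for the example, and it drops every buffer after each
batch instead of freeing them in LRU order.

```python
import os

PAGE_SIZE = 8192  # assumption: one database page per 8 kB block


def replay(fd, records, n_buffers):
    """Replay hypothetical (page_no, offset, data) records against fd.

    Returns the number of batches that were needed.
    """
    if n_buffers < 1:
        raise ValueError("need at least one buffer")
    fadvise = getattr(os, "posix_fadvise", None)  # not on every platform
    pos = 0
    batches = 0
    while pos < len(records):
        # Steps 2 and 2a: queue records until n_buffers distinct pages
        # are claimed. A repeated page costs no extra buffer, so we
        # simply keep reading.
        buffers = {}  # page_no -> bytearray, or None while not yet read
        start = pos
        while pos < len(records):
            page_no = records[pos][0]
            if page_no not in buffers:
                if len(buffers) == n_buffers:
                    break
                buffers[page_no] = None
                if fadvise is not None:
                    # Hint the kernel so the reads can overlap.
                    fadvise(fd, page_no * PAGE_SIZE, PAGE_SIZE,
                            os.POSIX_FADV_WILLNEED)
            pos += 1
        # Step 3: apply the queued records in WAL order.
        for page_no, offset, data in records[start:pos]:
            if buffers[page_no] is None:
                raw = os.pread(fd, PAGE_SIZE, page_no * PAGE_SIZE)
                buffers[page_no] = bytearray(raw.ljust(PAGE_SIZE, b"\0"))
            buffers[page_no][offset:offset + len(data)] = data
        # Step 4: write the modified pages back.
        for page_no, page in buffers.items():
            os.pwrite(fd, bytes(page), page_no * PAGE_SIZE)
        batches += 1
    return batches
```

The point of the sketch is only the ordering: all the fadvise hints for a
batch go out before the first blocking read, so the kernel sees many
outstanding requests instead of one.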

Here the reads in 2) will be optimised by the system via posix_fadvise(),
and the caches can also be split between multiple workers by page number
hash, or some other random/uniform means, to use more than one CPU.
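
A minimal sketch of the split by page number, again with made-up names
(partition_by_page, and records whose first field is the page number).
It uses a plain modulo as the "hash" and assumes each record touches
exactly one page, which real WAL records do not always do.

```python
def partition_by_page(records, n_workers):
    """Split records into one queue per worker, keyed by page number.

    All records for a given page land in the same queue, in their
    original order, so per-page replay order is preserved.
    """
    queues = [[] for _ in range(n_workers)]
    for rec in records:
        page_no = rec[0]
        queues[page_no % n_workers].append(rec)
    return queues
```

Each worker can then run the batch loop on its own queue with its own
set of buffers, since no page is shared between workers.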

-------------
Hannu
