Re: WIP: WAL prefetch (another approach)

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: WAL prefetch (another approach)
Date: 2020-05-02 15:14:23
Message-ID: 20200502151423.yf52i63u232fdfrg@localhost
Lists: pgsql-hackers

> On Sat, Apr 25, 2020 at 09:19:35PM +0200, Dmitry Dolgov wrote:
> > On Tue, Apr 21, 2020 at 05:26:52PM +1200, Thomas Munro wrote:
> >
> > One report I heard recently said that if you get rid of I/O stalls,
> > pread() becomes cheap enough that the much higher frequency lseek()
> > calls I've complained about elsewhere[1] become the main thing
> > recovery is doing, at least on some systems, but I haven't pieced
> > together the conditions required yet. I'd be interested to know if
> > you see that.
>
> At the moment I've performed couple of tests for the replication in case
> when almost everything is in memory (mostly by mistake, I was expecting
> that a postgres replica within a badly memory limited cgroup will cause
> more IO, but looks like kernel do not evict pages anyway). Not sure if
> that's what you mean by getting rid of IO stalls, but in these tests
> profiling shows lseek & pread appear in similar amount of samples.
>
> If I understand correctly, eventually one can measure prefetching
> influence by looking at different redo function execution time (assuming
> that data they operate with is already prefetched they should be
> faster). I still have to clarify what is the exact reason, but even in
> the situation described above (in memory) there is some visible
> difference, e.g.

I've finally performed a couple of tests involving more IO. The setup:
a not-that-big dataset of 1.5 GB, a replica with enough memory to fit
~1/6 of it, default prefetching parameters and an update workload with
uniform distribution. It's rather a small setup, but it causes steady
reading into the page cache on the replica and makes the influence of
the patch visible (more measurement samples tend to fall into the lower
latency buckets):

# with patch
Function = b'heap_redo' [206]
     nsecs               : count    distribution
      1024 -> 2047       : 0        |                                        |
      2048 -> 4095       : 32833    |**********************                  |
      4096 -> 8191       : 59476    |****************************************|
      8192 -> 16383      : 18617    |************                            |
     16384 -> 32767      : 3992     |**                                      |
     32768 -> 65535      : 425      |                                        |
     65536 -> 131071     : 5        |                                        |
    131072 -> 262143     : 326      |                                        |
    262144 -> 524287     : 6        |                                        |

# without patch
Function = b'heap_redo' [130]
     nsecs               : count    distribution
      1024 -> 2047       : 0        |                                        |
      2048 -> 4095       : 20062    |***********                             |
      4096 -> 8191       : 70662    |****************************************|
      8192 -> 16383      : 12895    |*******                                 |
     16384 -> 32767      : 9123     |*****                                   |
     32768 -> 65535      : 560      |                                        |
     65536 -> 131071     : 1        |                                        |
    131072 -> 262143     : 460      |                                        |
    262144 -> 524287     : 3        |                                        |
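
To put a number on "more samples at lower latencies", one can sum the
bucket counts of the two heap_redo histograms above. This is only a
sketch: the counts are copied from the output above, while the helper
name share_below and the 16384 ns cut-off are arbitrary choices made for
illustration, not something the tracing tool reports.

```python
# Bucket counts copied from the heap_redo histograms above,
# keyed by the lower bound of each bucket in nanoseconds.
WITH_PATCH = {
    1024: 0, 2048: 32833, 4096: 59476, 8192: 18617, 16384: 3992,
    32768: 425, 65536: 5, 131072: 326, 262144: 6,
}
WITHOUT_PATCH = {
    1024: 0, 2048: 20062, 4096: 70662, 8192: 12895, 16384: 9123,
    32768: 560, 65536: 1, 131072: 460, 262144: 3,
}


def share_below(hist, limit_ns):
    """Fraction of samples in buckets whose lower bound is below limit_ns.

    share_below is a hypothetical helper, not part of any tracing tool.
    """
    total = sum(hist.values())
    below = sum(count for low, count in hist.items() if low < limit_ns)
    return below / total


if __name__ == "__main__":
    for name, hist in (("with patch", WITH_PATCH),
                       ("without patch", WITHOUT_PATCH)):
        share = share_below(hist, 16384)
        print(f"{name}: {share:.1%} of calls took < 16384 ns")
```

With these counts, roughly 96% of heap_redo calls finish in under
16384 ns with the patch, versus about 91% without it.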

Not that there were any doubts, but at the same time it was surprising
to me how well Linux readahead works in this situation. The results
above were obtained with readahead disabled for both the filesystem and
the device. With readahead left enabled there was almost no difference
between the patched and unpatched runs, since a lot of IO was avoided by
readahead (which in fact accounted for the majority of all reads):

# with patch
flags = Read
     usecs               : count    distribution
        16 -> 31         : 0        |                                        |
        32 -> 63         : 1        |********                                |
        64 -> 127        : 5        |****************************************|

flags = ReadAhead-Read
     usecs               : count    distribution
        32 -> 63         : 0        |                                        |
        64 -> 127        : 131      |****************************************|
       128 -> 255        : 12       |***                                     |
       256 -> 511        : 6        |*                                       |

# without patch
flags = Read
     usecs               : count    distribution
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 4        |****************************************|

flags = ReadAhead-Read
     usecs               : count    distribution
        32 -> 63         : 0        |                                        |
        64 -> 127        : 143      |****************************************|
       128 -> 255        : 20       |*****                                   |

The number of reads in this case was similar with and without the patch,
which means the result can't be attributed to pages being read too
early, evicted, and then read again later.
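
The same kind of arithmetic can be applied to the block IO histograms
above to check both observations: that readahead accounts for most of
the reads, and that the total number of reads is similar in both runs.
Again only a sketch, with counts copied from the output above and a
hypothetical helper name summarize.

```python
# Read counts copied from the block IO histograms above, summed per flag.
IO_WITH_PATCH = {"Read": 1 + 5, "ReadAhead-Read": 131 + 12 + 6}
IO_WITHOUT_PATCH = {"Read": 4, "ReadAhead-Read": 143 + 20}


def summarize(reads):
    """Return (total reads, fraction of them issued by readahead).

    summarize is a hypothetical helper used only for this illustration.
    """
    total = sum(reads.values())
    return total, reads["ReadAhead-Read"] / total


if __name__ == "__main__":
    for name, reads in (("with patch", IO_WITH_PATCH),
                        ("without patch", IO_WITHOUT_PATCH)):
        total, ra_share = summarize(reads)
        print(f"{name}: {total} reads, {ra_share:.1%} from readahead")
```

That gives 155 reads with the patch and 167 without, and in both cases
more than 95% of them come from readahead.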
