Re: finding changed blocks using WAL scanning

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: finding changed blocks using WAL scanning
Date: 2019-04-23 18:01:14
Message-ID: 20190423180114.vslfhhbtyan5aqba@development
Lists: pgsql-hackers

On Tue, Apr 23, 2019 at 10:09:39AM -0700, Andres Freund wrote:
>Hi,
>
>On 2019-04-23 19:01:29 +0200, Tomas Vondra wrote:
>> On Tue, Apr 23, 2019 at 09:34:54AM -0700, Andres Freund wrote:
>> > Hi,
>> >
>> > On 2019-04-23 18:07:40 +0200, Tomas Vondra wrote:
>> > > Well, the thing is that for prefetching to be possible you actually have
>> > > to be a bit behind. Otherwise you can't really look ahead to see which
>> > > blocks will be needed, right?
>> > >
>> > > IMHO the main use case for prefetching is when there's a spike of activity
>> > > on the primary, making the standby fall behind, and then it takes hours
>> > > to catch up. I don't think the cases with just a couple of MBs of
>> > > lag are the issue prefetching is meant to improve (if it does, great).
>> >
>> > I'd be surprised if a good implementation didn't. Even just some smarter
>> > IO scheduling in the startup process could help a good bit. E.g. no need
>> > to sequentially read the first and then the second block for an update
>> > record, if you can issue both at the same time - just about every
>> > storage system these days can do a number of IO requests in parallel,
>> > and it nearly halves latency effects. And reading a few records (as in a
>> > few hundred bytes commonly) ahead allows us to do much more than that.
>> >
>>
>> I don't disagree with that - prefetching certainly can improve utilization
>> of the storage system. The question is whether it can meaningfully improve
>> performance of the recovery process in cases when it does not lag. And I
>> think it can't (perhaps with remote_apply being an exception).
>
>Well. I think a few dozen records behind doesn't really count as "lag",
>and I think that's where it'd start to help (and for some record types
>like updates it'd start to help even for single records). It'd convert
>scenarios where we'd currently fall behind slowly into scenarios where
>we can keep up - but where there's no meaningful lag while we keep up.
>What's your argument for me being wrong?
>

I was not saying you are wrong. I think we actually agree on the main
points. My point is that prefetching is most valuable for cases when the
standby can't keep up and falls behind significantly - at which point we
have a sufficient queue of blocks to prefetch. I don't care about the case
when the standby can keep up even without prefetching, because the metric
we need to optimize (i.e. lag) is close to 0 even without prefetching.
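
To make that concrete, here's a rough toy sketch (plain C, not actual
PostgreSQL code -- the BlockRef struct, paths and block numbers are all made
up, and a real implementation would decode the block references from the
pending WAL records). The point is that the prefetcher can only issue hints
for records that were already received but not yet replayed, i.e. for the
backlog that is the lag:

/*
 * Toy sketch (not PostgreSQL code): issue POSIX_FADV_WILLNEED hints for
 * blocks referenced by WAL records that were received but not yet
 * replayed.  Look-ahead is only possible over that backlog, i.e. over
 * the lag between receive and apply.
 */
#include <fcntl.h>
#include <unistd.h>

#define BLCKSZ 8192

/* Hypothetical decoded block reference: which file, which block. */
typedef struct
{
    const char *path;
    long        blockno;
} BlockRef;

int
main(void)
{
    /* Pretend the next few pending records reference these blocks; an
     * UPDATE record may reference two blocks (old and new page). */
    BlockRef    pending[] = {
        {"base/13593/16384", 10},
        {"base/13593/16384", 11},   /* second block of the same UPDATE */
        {"base/13593/16390", 137},
    };
    int         npending = sizeof(pending) / sizeof(pending[0]);

    for (int i = 0; i < npending; i++)
    {
        int         fd = open(pending[i].path, O_RDONLY);

        if (fd < 0)
            continue;           /* toy demo, just skip missing files */

        /* Ask the kernel to start reading the block asynchronously, so
         * the actual read during replay hits the page cache. */
        (void) posix_fadvise(fd, pending[i].blockno * (off_t) BLCKSZ,
                             BLCKSZ, POSIX_FADV_WILLNEED);
        close(fd);
    }

    return 0;
}

With an empty backlog there's simply nothing ahead of the replay position to
decode, which is why I see the significant-lag case as the main target.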

>And even if we'd keep up without any prefetching, issuing requests in a
>more efficient manner allows for more efficient concurrent use of the
>storage system. It'll often effectively reduce the amount of random
>iops.

Maybe, although the metric we (and users) care about the most is the
amount of lag. If the system keeps up even without prefetching, no one
will complain about I/O utilization.

When the lag is close to 0, the average throughput/IOPS/... is bound to be
the same in both cases, because prefetching does not affect how fast the
standby receives WAL from the primary. Except that the I/O is somewhat
"spikier" with prefetching, because we issue requests in bursts, which may
actually be a bad thing.
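
To put some entirely made-up numbers on that (just illustrating the
arithmetic, not measurements):

/*
 * Back-of-the-envelope for the "no lag" case, with purely illustrative
 * numbers.  If the standby keeps up, the average read rate is fixed by
 * how many blocks the incoming WAL references; prefetching can only
 * move those reads earlier, which raises the peak (burstier I/O), not
 * the average.
 */
#include <stdio.h>

int
main(void)
{
    double  blocks_per_sec = 2000.0;    /* blocks referenced by incoming WAL */
    double  batch_window = 0.1;         /* prefetcher collects 100 ms batches */
    double  issue_window = 0.02;        /* ... and issues each batch in 20 ms */

    double  avg_iops = blocks_per_sec;  /* same with or without prefetching */
    double  batch = blocks_per_sec * batch_window;
    double  peak_iops = batch / issue_window;

    printf("average read IOPS:            %.0f\n", avg_iops);
    printf("blocks per 100 ms batch:      %.0f\n", batch);
    printf("peak IOPS during 20 ms burst: %.0f\n", peak_iops);

    return 0;
}

The average is dictated by how fast WAL arrives either way; prefetching only
moves the reads earlier and concentrates them into bursts.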

Of course, maybe prefetching will make the I/O much more efficient even in
the "no lag" case, and while it won't speed up recovery itself, it'll leave
more I/O bandwidth for other processes (say, queries on a hot standby).

So to be clear, I'm not against prefetching even in this case, but it's
not the primary reason why I think we need to do that.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
