Re: Syncrep and improving latency due to WAL throttling

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Syncrep and improving latency due to WAL throttling
Date: 2023-01-28 00:36:17
Message-ID: db055152-d230-8600-d057-e45e388bb3c8@enterprisedb.com
Lists: pgsql-hackers

On 1/27/23 22:33, Andres Freund wrote:
> Hi,
>
> On 2023-01-27 21:45:16 +0100, Tomas Vondra wrote:
>> On 1/27/23 08:18, Bharath Rupireddy wrote:
>>>> I think my idea of only forcing to flush/wait an LSN some distance in the past
>>>> would automatically achieve that?
>>>
>>> I'm sorry, I couldn't get your point, can you please explain it a bit more?
>>>
>>
>> The idea is that we would not flush the exact current LSN, because
>> that's likely somewhere in the page, and we always write the whole page
>> which leads to write amplification.
>>
>> But if we backed off a bit, and wrote e.g. to the last page boundary,
>> that wouldn't have this issue (either the page was already flushed -
>> noop, or we'd have to flush it anyway).
>
> Yep.
>
>
>> We could even back off a bit more, to increase the probability it was
>> actually flushed / sent to standby.
>
> That's not the sole goal, from my end: I'd like to avoid writing out +
> flushing the WAL in too small chunks. Imagine a few concurrent vacuums or
> COPYs or such - if we're unlucky they'd each end up exceeding their "private"
> limit close to each other, leading to a number of small writes of the
> WAL. Which could end up increasing local commit latency / iops.
>
> If we instead decide to only ever flush up to something like
> last_page_boundary - 1/8 * throttle_pages * XLOG_BLCKSZ
>
> we'd make sure that the throttling mechanism won't cause a lot of small
> writes.
>
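To make the quoted formula concrete, here is a minimal sketch of how such a backed-off flush target could be computed. It is not code from the patch: the function name, the use of plain byte offsets for LSNs, and the choice to round the back-off down to whole pages (so the target stays page-aligned) are assumptions made for illustration. XLOG_BLCKSZ is taken as 8192, the PostgreSQL default.

```python
XLOG_BLCKSZ = 8192  # default WAL block size in bytes


def throttle_flush_target(current_lsn: int, throttle_pages: int,
                          xlog_blcksz: int = XLOG_BLCKSZ) -> int:
    """Return a flush target backed off from the current insert position.

    Hypothetical helper: LSNs are modelled as plain byte offsets.
    """
    # Start of the WAL page that contains current_lsn.
    last_page_boundary = current_lsn - (current_lsn % xlog_blcksz)
    # Back off by 1/8 of the throttling limit, rounded down to whole
    # pages (an assumption) so the result stays on a page boundary.
    backoff = (throttle_pages // 8) * xlog_blcksz
    return max(0, last_page_boundary - backoff)


if __name__ == "__main__":
    # A 32-page limit (256kB) backs off by 4 pages from the page boundary.
    print(throttle_flush_target(1_000_000, 32))
```

Because the target is always page-aligned, the flush either finds the page already written (a no-op) or writes a page that would have had to be written in full anyway.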

I'm not saying we shouldn't do this, but I still don't see how this
could make a measurable difference, at least assuming a sensible value
of the throttling limit (say, more than 256kB per backend) and an OLTP
workload running concurrently. That means at most ~64 extra flushes/writes
per 16MB segment. Yeah, a particular page might get unlucky and be
flushed by multiple backends, but the average still holds. Meanwhile,
the OLTP transactions will generate (at least) an order of magnitude
more flushes.
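
The arithmetic behind that estimate, as a small sketch. The 16MB segment size and the 256kB limit come from the paragraph above; the 2kB of WAL per OLTP commit and the one-flush-per-commit model (which ignores group commit) are assumptions picked purely for illustration, not measurements.

```python
WAL_SEGMENT_BYTES = 16 * 1024 * 1024   # 16MB WAL segment
THROTTLE_LIMIT_BYTES = 256 * 1024      # 256kB per-backend throttling limit


def max_throttle_flushes(segment_bytes: int, limit_bytes: int) -> int:
    """Upper bound on throttling-induced flushes per WAL segment."""
    return segment_bytes // limit_bytes


def commit_flushes(segment_bytes: int, avg_wal_per_commit: int) -> int:
    """Flushes per segment if every commit flushes once (no group commit)."""
    return segment_bytes // avg_wal_per_commit


if __name__ == "__main__":
    throttle = max_throttle_flushes(WAL_SEGMENT_BYTES, THROTTLE_LIMIT_BYTES)
    # Hypothetical OLTP workload writing 2kB of WAL per commit.
    oltp = commit_flushes(WAL_SEGMENT_BYTES, 2 * 1024)
    print(throttle, oltp, oltp / throttle)
```

Under these assumed numbers the commit-driven flushes outnumber the throttling-driven ones by well over an order of magnitude; a larger throttling limit only widens the gap.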

>
>>> Keeping replication lag under check enables one to provide a better
>>> RPO guarantee as discussed in the other thread
>>> https://www.postgresql.org/message-id/CAHg%2BQDcO_zhgBCMn5SosvhuuCoJ1vKmLjnVuqUEOd4S73B1urw%40mail.gmail.com.
>>>
>>
>> Isn't that a bit over-complicated? RPO generally only cares about xacts
>> that committed (because that's what you want to not lose), so why not
>> simply introduce a "sync mode" that uses a slightly older LSN when
>> waiting for the replica? Seems much simpler and similar to what we
>> already do.
>
> I don't think that really helps you that much. If there's e.g. a huge VACUUM /
> COPY emitting loads of WAL you'll suddenly see the commit latency of
> concurrently committing transactions spike into oblivion. Whereas a general
> WAL throttling mechanism would throttle the VACUUM, without impacting the
> commit latency of normal transactions.
>

True, but it solves the RPO goal which is what the other thread was about.

IMHO it's useful to look at this as a resource scheduling problem:
limited WAL bandwidth consumed by backends, with the bandwidth
distributed using some sort of policy.

The patch discussed in this thread uses a fundamentally unfair policy,
with throttling applied only to backends that produce a lot of WAL, while
trying to leave the OLTP workload as unaffected as possible.

The RPO thread seems to be aiming for a "fair" policy, providing the
same fraction of bandwidth to all processes. This will affect all xacts
the same way (sleeps proportional to amount of WAL generated by the xact).
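
To illustrate the difference between the two policies, here is a toy model. Neither function is taken from either patch: the names, the parameters, and the equal-share bandwidth model are all assumptions, meant only to show "wait only past a per-backend limit" versus "sleep in proportion to WAL generated".

```python
def threshold_policy_should_wait(wal_bytes_since_last_wait: int,
                                 limit_bytes: int) -> bool:
    """'Unfair' policy: only backends that exceeded their limit wait."""
    return wal_bytes_since_last_wait >= limit_bytes


def fair_policy_sleep_seconds(xact_wal_bytes: int,
                              bandwidth_bytes_per_sec: float,
                              active_backends: int) -> float:
    """'Fair' policy: every xact sleeps in proportion to its WAL volume.

    Assumes the WAL bandwidth is split equally between active backends.
    """
    share = bandwidth_bytes_per_sec / active_backends
    return xact_wal_bytes / share


if __name__ == "__main__":
    limit = 256 * 1024
    # A small OLTP xact (4kB of WAL) vs. a bulk load (1MB of WAL).
    for wal in (4 * 1024, 1024 * 1024):
        print(wal,
              threshold_policy_should_wait(wal, limit),
              fair_policy_sleep_seconds(wal, 10_000_000, 4))
```

In this model the small transaction never waits under the threshold policy, while under the fair policy it still pays a delay, just a much shorter one than the bulk load.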

Perhaps we want such alternative scheduling policies, but it'll probably
require something like the autovacuum throttling.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
