Re: WAL insert delay settings

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WAL insert delay settings
Date: 2019-02-19 19:22:50
Message-ID: 20190219192250.xrdffheoadu2pqjo@alap3.anarazel.de
Lists: pgsql-hackers

On 2019-02-19 20:02:32 +0100, Tomas Vondra wrote:
> Let's do a short example. Assume the default vacuum costing parameters
>
> vacuum_cost_limit = 200
> vacuum_cost_delay = 20ms
> cost_page_dirty = 20
>
> and for simplicity we only do writes. So vacuum can do ~8MB/s of writes.
>
> Now, let's also throttle based on WAL - once in a while, after producing
> some amount of WAL we sleep for a while. Again, for simplicity let's
> assume the sleeps perfectly interleave and are also 20ms. So we have
> something like:

> sleep(20ms); -- vacuum
> sleep(20ms); -- WAL
> sleep(20ms); -- vacuum
> sleep(20ms); -- WAL
> sleep(20ms); -- vacuum
> sleep(20ms); -- WAL
> sleep(20ms); -- vacuum
> sleep(20ms); -- WAL
>
> Suddenly, we only reach 4MB/s of writes from vacuum. But we also reach
> only 1/2 the WAL throughput, because it's affected exactly the same way
> by the sleeps from vacuum throttling.
>
> We've not reached either of the limits. How exactly is this "lower limit
> takes effect"?
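[The arithmetic in the quoted scenario can be sanity-checked with a toy model, assuming the work itself takes negligible time compared to the sleeps; the helper name and the 80 kB-per-cycle figure are illustrative, not PostgreSQL code. With perfectly interleaved sleeps, doubling the total sleep per cycle halves the achieved rate, whatever the base rate is:]

```c
#include <assert.h>   /* for the usage assertions below */

/*
 * Toy model of the interleaving example: each cycle writes a fixed
 * number of bytes and then sleeps.  The work itself is assumed to
 * take negligible time, so the sleeps fully determine the rate.
 */
static double
write_rate(double bytes_per_cycle, double sleep_ms_per_cycle)
{
	return bytes_per_cycle / (sleep_ms_per_cycle / 1000.0);
}
```

[E.g. `write_rate(81920.0, 20.0)` (vacuum sleep only) is exactly twice `write_rate(81920.0, 40.0)` (vacuum sleep plus an equal WAL sleep), which is the halving of both the vacuum and WAL throughput described above.]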

Because, as I said upthread, that's not how I think a sane
implementation of WAL throttling would work. I think the whole cost
budgeting approach is BAD, and it'd be a serious mistake to copy it for a
WAL rate limit (it disregards the time taken to execute IO, CPU costs
etc, and in this case the cost of other bandwidth limitations). What
I'm saying is that we ought to instead specify a WAL rate in bytes/sec
and *only* sleep once we've exceeded it for a time period (with some
optimizations, so we don't gettimeofday() after every XLogInsert(), but
instead compute after how many more bytes we need to re-determine the
time to see if we're still in the same 'granule').

Now, a non-toy implementation would probably want to have a sliding
window to avoid being overly bursty, and to reduce the number of
gettimeofday() calls as mentioned above, but for explanation's sake
basically imagine that the "main loop" of a bulk xlog-emitting command
would invoke a helper with a computation in pseudocode like:

current_time = gettimeofday();
if (same_second(current_time, last_time))
{
    wal_written_in_second += new_wal_written;
    if (wal_written_in_second >= wal_write_limit_per_second)
    {
        double too_much = (wal_written_in_second - wal_write_limit_per_second);

        sleep_fractional_seconds(too_much / wal_written_in_second);

        last_time = current_time;
    }
}
else
{
    /* new second: start a fresh budget */
    wal_written_in_second = new_wal_written;
    last_time = current_time;
}

which'd mean that, in contrast to your example, we'd not continually
sleep for WAL; we'd only do so if we actually exceeded (or, in a smarter
implementation, are projected to exceed) the specified WAL write rate. As
the 20ms sleeps from vacuum effectively reduce the WAL write rate, we'd
correspondingly sleep less.
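[For illustration, the "compute after how many more bytes we need to re-check the clock" optimization could look something like the following C sketch. All names here are hypothetical, not PostgreSQL APIs; the clock value is passed in explicitly so the logic is easy to exercise, and a one-second granule is assumed. As noted above, a real implementation would want a sliding window on top of this — deferring the clock read also means a new granule can be detected late:]

```c
#include <assert.h>   /* for the usage assertions below */
#include <stdint.h>

/*
 * Hypothetical sketch of a bytes/sec WAL limiter that avoids a clock
 * read per XLogInsert(): accumulate bytes and only look at the clock
 * once enough WAL has been written that the limit could plausibly have
 * been exceeded.  Not PostgreSQL code; names are illustrative.
 */
typedef struct WalLimiter
{
	double		limit_bytes_per_sec;	/* configured WAL write rate */
	double		window_start;			/* time current 1s granule began */
	uint64_t	bytes_in_window;		/* WAL written since window_start */
	uint64_t	bytes_until_check;		/* defer clock reads until this is hit */
} WalLimiter;

/*
 * Record new_bytes of WAL written at time `now` (seconds); return the
 * number of seconds the caller should sleep (0 if under the limit).
 */
static double
wal_limiter_record(WalLimiter *lim, uint64_t new_bytes, double now)
{
	lim->bytes_in_window += new_bytes;
	if (new_bytes < lim->bytes_until_check)
	{
		/* can't have exceeded the limit yet; skip the clock read */
		lim->bytes_until_check -= new_bytes;
		return 0.0;
	}

	if (now - lim->window_start < 1.0)
	{
		if (lim->bytes_in_window >= lim->limit_bytes_per_sec)
		{
			double		excess = lim->bytes_in_window - lim->limit_bytes_per_sec;

			/* start a fresh granule after paying back the excess */
			lim->window_start = now;
			lim->bytes_in_window = 0;
			lim->bytes_until_check = (uint64_t) lim->limit_bytes_per_sec;
			return excess / lim->limit_bytes_per_sec;
		}
		/* re-check the clock once we could plausibly cross the limit */
		lim->bytes_until_check =
			(uint64_t) lim->limit_bytes_per_sec - lim->bytes_in_window;
	}
	else
	{
		/* new granule: start a fresh budget */
		lim->window_start = now;
		lim->bytes_in_window = new_bytes;
		lim->bytes_until_check = (uint64_t) lim->limit_bytes_per_sec;
	}
	return 0.0;
}
```

[Initialized as e.g. `WalLimiter lim = {1000000.0, 0.0, 0, 1000000};` (1 MB/s), a 500 kB write at t=0.1 returns 0 without consulting the clock, while a further 600 kB write at t=0.2 pushes the window over the limit and returns a fractional sleep.]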

And my main point is that even if you implement a proper bytes/sec limit
ONLY for WAL, the behaviour of VACUUM rate limiting doesn't get
meaningfully more confusing than right now.

Greetings,

Andres Freund
