Skip site navigation (1) Skip section navigation (2)

Vacuum rate limit in KBps

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Vacuum rate limit in KBps
Date: 2012-01-15 08:24:15
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-hackers
So far the reaction I've gotten from my recent submission to make 
autovacuum log its read/write in MB/s has been rather positive.  I've 
been surprised at the unprecedented (to me at least) amount of 
backporting onto big production systems it's gotten.  There is a whole 
lot of pent up frustration among larger installs over not having good 
visibility into how changing cost-based vacuum parameters turns into 
real-world units.

That got me thinking:  if MB/s is what everyone wants to monitor, can we 
provide a UI to set these parameters that way too?  The attached patch 
is a bit rough still, but it does that.  The key was recognizing that 
the cost delay plus cost limit can be converted into an upper limit on 
cost units per second, presuming the writes themselves are free.  If you 
then also assume the worst case--that everything will end up dirty--by 
throwing in the block size, too,  you compute a maximum rate in MB/s.  
That represents the fastest you can possibly write.

If you then turn that equation around, making the maximum write rate the 
input, for any given cost delay and dirty page cost you can solve for 
the cost limit--the parameter in fictitious units everyone hates.  It 
works like this, with the computation internals logged every time they 
run for now:

#vacuum_cost_rate_limit = 4000      # maximum write rate in kilobytes/second
LOG:  cost limit=200 based on rate limit=4000 KB/s delay=20 dirty cost=20

That's the same cost limit that was there before, except now it's 
derived from that maximum write rate figure.  vacuum_cost_limit is gone 
as a GUC, replaced with this new vacuum_cost_rate_limit.  Internally, 
vacuum_cost_rate_limit hasn't gone anywhere though.  All of the entry 
points into vacuum and autovacuum derive an internal-only 
VacuumCostLimit as part of any setup or rebalance operation.  But 
there's no change to underlying cost management code; the cost limit is 
budgeted and accounted for in exactly the same way as it always was.

Why is this set in kilobytes/second rather than using something based on 
a memory unit?  That decision was made after noting these values can 
also be set in relation options.  Making relation options aware of 
memory unit math seemed ambitious relative to its usefulness, and it's 
not like KB/s is hard to work with in this context.

OK, I lied; technically this is set in kibibytes per second right now.  
Ran out of energy before I got to confirming that was consistent with 
all similar GUC settings, will put on my pedantic hat later to check that.

One nice thing that falls out of this is that the *vacuum_cost_delay 
settings essentially turn into a boolean.  If the delay is 0, cost 
limits are off; set it to any other value, and the rate you get is 
driven almost entirely by vacuum_cost_rate_limit (disclaimer mainly 
because of issues like sleep time accuracy are possible).  You can see 
that at work in these examples:

LOG:  cost limit=200 based on rate limit=4000 KB/s delay=20 dirty cost=20
LOG:  cost limit=100 based on rate limit=4000 KB/s delay=10 dirty cost=20

LOG:  cost limit=200 based on rate limit=4000 KB/s delay=20 dirty cost=20
LOG:  cost limit=100 based on rate limit=2000 KB/s delay=20 dirty cost=20

Halve the delay to 10, and the cost limit drops in half too to keep the 
same I/O rate.  Halve the rate limit instead, and the cost limit halves 
with it.  Most sites will never need to change the delay figure from 
20ms, they can just focus on tuning the more human-readable rate limit 
figure instead.  The main reason I thought of to keep the delay around 
as an integer still is sites trying to minimize power use, they might 
increase it from the normally used 20ms.  I'm not as worried about 
postgresql.conf settings bloat to support a valid edge use case, so long 
as most sites find a setting unnecessary to tune.  And the autovacuum 
side of cost delay should fall into that category with this change.

Here's a full autovacuum log example.  This shows how close to the KBps 
rate the server actually got, along with the autovacuum cost balancing 
working the same old way (this is after running the boring 
autovac-big.sql test case attached here too):

2012-01-15 02:10:51.905 EST: LOG:  cost limit=200 based on rate 
limit=4000 KB/s delay=20 dirty cost=20
2012-01-15 02:10:51.906 EST: DEBUG:  autovac_balance_cost(pid=13054 
db=16384, rel=16444, cost_rate_limit=4000, cost_limit=200, 
cost_limit_base=200, cost_delay=20)

2012-01-15 02:11:05.127 EST: DEBUG:  "t": removed 4999999 row versions 
in 22124 pages
2012-01-15 02:11:05.127 EST: DEBUG:  "t": found 4999999 removable, 
5000001 nonremovable row versions in 44248 out of 44248 pages
2012-01-15 02:11:05.127 EST: DETAIL:  0 dead row versions cannot be 
removed yet.
     There were 0 unused item pointers.
     0 pages are entirely empty.
     CPU 0.27s/0.97u sec elapsed 131.73 sec.
2012-01-15 02:11:05.127 EST: LOG:  automatic vacuum of table 
"gsmith.public.t": index scans: 0
     pages: 0 removed, 44248 remain
     tuples: 4999999 removed, 5000001 remain
     buffer usage: 48253 hits, 40296 misses, 43869 dirtied
     avg read rate: 2.390 MiB/s, avg write rate: 2.602 MiB/s
     system usage: CPU 0.27s/0.97u sec elapsed 131.73 sec

I think this new setting suggests the recently adding logging is missing 
a combined I/O figure, something that measures reads + writes over the 
time period.  This is good enough to demonstrate the sort of UI I was 
aiming for in action though.  Administrator says "don't write more than 
4MiB/s", and when autovacuum kicks in it averages 2.4 read + 2.6 write.

I see this change as taking something that feels like black arts tuning 
magic now and turning it into a simple interface that's for the most 
part intuitive.  None of the flexibility is lost here:  you can still 
retune the relative dirty vs. miss vs. hit costs, you have the option of 
reducing the delay to a small value on a busy server where small sleep 
values are possible.  But you don't have to do any of that just to tweak 
autovacuum up or down at a gross level; you can just turn the simple "at 
most this much write I/O" knob instead.

All implementation notes and concerns from here down.

The original cost limit here defaulted to 200 and allowed a range of 1 
to 10000.  The new default of 4000 show these values need to be 20X as 
large.  The maximum was adjusted to 200000 KBps.  Look at that, the 
maximum rate you can run cost delay vacuum at is 200MB/s; there's 
another good example of something that used to be mysterious to compute 
that is obvious now.

I didn't adjust the lower limit downward, so it's actually possible to 
set the new code to only operate at 1/200 the minimum speed you could 
set before.  On the balance this is surely a reduction in foot gun 
aiming though, and I could make the minimum 200 to eliminate it.  Seems 
a needless detail to worry about.

This code is new and just complicated enough that there's surely some 
edge cases I broke here.  In particular I haven't put together a good 
concurrent autovacuum test yet to really prove all the balancing logic 
still works correctly.  Need to test that with a settings change in the 
middle of a long vacuum too.

There's one serious concern I don't have a quick answer to.  What do we 
do with in-place upgrade of relations that specified a custom 
vacuum_cost_limit?  I can easily chew on getting the right logic to 
convert those to equals in the new setting style, but I am not prepared 
to go solely on the hook for all in-place upgrade work one might do 
here.  Would this be easiest to handle as one of those dump/restore 
transformations?  My guess is that's more sensible than the alternative 
of making an on-read converter that only writes in the new format, then 
worrying about upgrading all old pages before moving forward.  While 
this could be an interesting small test case for that sort of thing, I'd 
rather not be patient #1 for that part of the long-term in-place upgrade 
path right now.

Greg Smith   2ndQuadrant US    greg(at)2ndQuadrant(dot)com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support

Attachment: vacuum_rate_limit-v1.patch
Description: text/x-patch (22.8 KB)
Attachment: autovac-big.sql
Description: text/x-sql (138 bytes)


pgsql-hackers by date

Next:From: Peter GeogheganDate: 2012-01-15 08:25:46
Subject: Re: Our poll() based WaitLatch implementation is broken
Previous:From: Magnus HaganderDate: 2012-01-15 08:17:54
Subject: Re: automating CF submissions (was xlog location arithmetic)

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group