Re: Vacuum rate limit in KBps

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-01-22 23:47:17
Message-ID: 4F1CA005.1040001@2ndquadrant.com
Lists: pgsql-hackers

Jim Nasby wrote:
> Your two comments together made me realize something... at the end of the day people don't care about MB/s. They care about impact to other read and write activity in the database.
>
> What would be interesting is if we could monitor how long all *foreground* IO requests took. If they start exceeding some number, that means the system is at or near full capacity, and we'd like background stuff to slow down.
>

My hope for 9.2 was to get VACUUM moved over into some human-readable
units. Having the whole thing work only via these abstract cost units
is driving most of my customers with larger databases crazy. The patch
I suggested was the easiest refactoring I thought moved in the right
direction. While a write rate may not be the perfect thing to care
about, the very positive reaction I've gotten to the already-committed
patch that logs vacuum activity in MB/s suggests people are a lot more
comfortable with that than with the cost limit numbers.
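
For anyone who hasn't worked through the existing cost math, here's a
rough sketch of how the stock settings of this era (vacuum_cost_limit=200,
vacuum_cost_page_dirty=20, a 20ms autovacuum delay, 8K blocks) translate
into an approximate dirty-write ceiling. This is just illustrative
arithmetic, not the patch itself:

BLOCK_SIZE = 8192            # bytes per heap page
cost_limit = 200             # vacuum_cost_limit
cost_delay_ms = 20           # autovacuum_vacuum_cost_delay, in milliseconds
cost_page_dirty = 20         # vacuum_cost_page_dirty

# Pages that can be dirtied before each sleep, and sleeps per second
pages_per_round = cost_limit / cost_page_dirty      # 10 pages
rounds_per_second = 1000 / cost_delay_ms            # 50 rounds

write_rate_mb_s = pages_per_round * rounds_per_second * BLOCK_SIZE / (1024 * 1024)
print("max dirty-write rate: %.1f MB/s" % write_rate_mb_s)   # about 3.9 MB/s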

For 9.3, this whole mess needs to become integrated with a full-system
monitoring approach, to really solve this well. pg_stat_bgwriter knows
how many writes are coming from the various parts of the system, and
thus the total amount of write I/O. Given that, I can turn VACUUM completely
dynamic based on what else is happening in many common situations. The
sort of end goal I was thinking about was being able to say something like
"let VACUUM use up to 4MB/s on writes, but subtract off the average
write level of everything else". Now it's a background process running
only when there's capacity to spare for it. You could turn it up a lot
higher, if you knew it was only going to run at that level when the
system wasn't as busy. That's one reason I started by suggesting a
write-based limit; it fit into that longer-range plan better. Maybe
that idea is junk and read I/O, rather than writes, is the real problem
with VACUUM for most people. I'll be able to tell once I get more data
out of systems that are logging in MB/s.
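
To put that in rough pseudocode, a hypothetical helper along these lines
is the shape I have in mind; the "everything else" number would come from
sampling pg_stat_bgwriter over time, and the names here are made up:

def vacuum_write_budget(max_vacuum_rate_mb_s, recent_system_write_mb_s):
    """How fast VACUUM is allowed to write right now, in MB/s."""
    # Spend whatever part of the ceiling the rest of the system isn't using,
    # and simply pause rather than go negative when the system is saturated.
    return max(0.0, max_vacuum_rate_mb_s - recent_system_write_mb_s)

print(vacuum_write_budget(4.0, 2.5))   # ceiling 4 MB/s, others at 2.5 -> 1.5 left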

If instead or in addition we get some better field data on systems that
can afford to time a lot more things, and then start building feedback
limiters based on how long all sorts of operations take, that's a whole
different, parallel approach to auto-tuning this. I haven't thought
about that one as much, simply because it has only recently become clear
when the timing data is cheap enough to collect. I need to get a lot
more production server data about that overhead to work with here too.
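
The shape of that sort of limiter might be something like the following
sketch, with an invented latency target and step size just to show the
feedback loop; nothing here reflects actual code:

def adjust_background_rate(current_rate_mb_s, recent_fg_latency_ms,
                           target_latency_ms=10.0, step_mb_s=0.25):
    """Nudge the background write rate based on recent foreground I/O latency."""
    if recent_fg_latency_ms > target_latency_ms:
        # Foreground work is suffering; back off, but never below zero.
        return max(0.0, current_rate_mb_s - step_mb_s)
    # There's headroom; creep back toward the configured ceiling.
    return current_rate_mb_s + step_mb_s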

> Dealing with SSDs vs real media would be a bit challenging... though, I think it would only be an issue if the two were randomly mixed together. Kept separately I would expect them to have distinct behavior patterns that could be measured and identified.

This might just turn into another one of those things where we will
eventually need to have some more information on a per-tablespace
basis. I envision the extra timing data collection as something you can
turn on for a bit and let populate statistics about just what fast or
slow means for each tablespace. Then you can keep those results around
to guide future decisions even after timing is turned off. Maybe toggle
it back on one day a month to make sure the numbers are still sane, if
it's too expensive to time things every day.
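
Roughly what I mean, as a sketch with invented names rather than any real
catalog or view: sample read timings per tablespace for a while, boil
them down to a baseline, and keep those baselines around after timing
collection is switched off:

from statistics import median

def build_baselines(samples_ms):
    """Reduce raw per-tablespace read timings (in ms) to one baseline each."""
    return {ts: median(times) for ts, times in samples_ms.items() if times}

baselines = build_baselines({
    "ssd_space":  [0.2, 0.3, 0.2, 0.4],    # flash: sub-millisecond reads
    "sata_space": [6.5, 9.0, 7.2, 11.3],   # spinning disks: several ms each
})
print(baselines)   # {'ssd_space': 0.25, 'sata_space': 8.1}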
