Re: Vacuum rate limit in KBps

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-01-19 18:10:39
Message-ID: CA+Tgmobt02R45oCc5YCLkqhrhe+-C8rTgK-zx5F0wkR8pZxTKg@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 15, 2012 at 4:17 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> +1. I've been thinking we should do that for a long time, but haven't gotten
> around to it.
>
> I think it makes more sense to use the max read rate as the main knob,
> rather than write rate. That's because the max read rate is higher than the
> write rate, when you don't need to dirty pages. Or do you think saturating
> the I/O system with writes is so much bigger a problem than read I/O that it
> makes more sense to emphasize the writes?
>
> I was thinking of something like this, in postgresql.conf:
>
> # - Vacuum Throttling -
>
> #vacuum_cost_page_miss = 1.0            # measured on an arbitrary scale
> #vacuum_cost_page_dirty = 2.0           # same scale as above
> #vacuum_cost_page_hit = 0.1             # same scale as above
> #vacuum_rate_limit = 8MB                # max reads per second
>
> This is now similar to the cost settings for the planner, which is good.
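
For reference, the proposed 8MB default seems to fall straight out of the
current settings: with the shipped defaults, an all-misses vacuum is already
throttled to about that rate. A quick back-of-envelope sketch, assuming the
stock values vacuum_cost_limit=200, autovacuum_vacuum_cost_delay=20ms,
vacuum_cost_page_miss=10, and 8kB pages (these numbers are assumptions, not
taken from the patch):

cost_limit = 200        # vacuum_cost_limit
delay_s = 0.020         # autovacuum_vacuum_cost_delay, 20ms
page_miss_cost = 10     # vacuum_cost_page_miss
page_size_kb = 8        # BLCKSZ in kB

# Vacuum sleeps for delay_s each time it accumulates cost_limit points,
# so an all-misses workload is capped at:
pages_per_s = (cost_limit / page_miss_cost) / delay_s
print(pages_per_s * page_size_kb)    # 8000.0 kB/s, i.e. roughly 8MB/s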

I have to say that I find that intensely counterintuitive. The
current settings are not entirely easy to tune correctly, but at least
they're easy to explain. What does that 8MB mean and how does it
relate to vacuum_cost_page_miss? If I double vacuum_cost_page_miss,
does that effectively also double the cost limit, so that dirty pages
and hits become relatively cheaper? If so, then I think what that
really means is that the limit is 8MB only if there are no hits and no
dirtied pages - otherwise it's less, and the amount by which it is
less is the result of some arcane calculation. Ugh!
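
To make the arcane-ness concrete, here's how I read the proposed accounting;
the formula and the workload mix below are my guesses at the semantics, not
anything from the patch:

rate_limit_kb = 8192                              # vacuum_rate_limit = 8MB
page_size_kb = 8
hit_cost, miss_cost, dirty_cost = 0.1, 1.0, 2.0   # Heikki's proposed scale

# The limit buys a per-second cost budget worth rate_limit_kb of misses:
budget_per_s = (rate_limit_kb / page_size_kb) * miss_cost

def read_rate_kb(miss_frac, dirty_frac):
    # average cost to process one page, given the workload mix
    avg_cost = ((1 - miss_frac) * hit_cost + miss_frac * miss_cost
                + dirty_frac * dirty_cost)
    pages_per_s = budget_per_s / avg_cost
    return miss_frac * pages_per_s * page_size_kb

print(read_rate_kb(1.0, 0.0))   # all misses, nothing dirtied: 8192.0 kB/s
print(read_rate_kb(0.5, 0.5))   # 50% hits, 50% dirtied: ~2643 - not "8MB"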

I can really imagine people wanting to limit two things here: either
they want to limit the amount of read I/O, or they want to limit the
amount of write I/O. If your database fits in physical memory you
probably don't care about the cost of page misses very much at all,
but you probably do care about how much data you dirty. OTOH, if your
database doesn't fit in physical memory and you have a relatively
small percentage of dirty pages because the tables are lightly
updated, dirtying might be pretty secondary; if you care at all, it's
going to be because busying the disk head with large sequential reads
eats up too much of the system's I/O capacity. If we added
vacuum_read_rate_limit and vacuum_dirty_rate_limit, totally
independently of each other, and threw the current system where
those two things get mixed together in one big bucket out the window
completely, I could maybe sign onto that as an improvement to the UI.
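
Something along these lines, just to sketch the shape of it - the GUC names
and the token-bucket mechanics here are hypothetical, not a concrete
proposal:

import time

class RateLimiter:
    """One independent budget: holds up to one second's worth of kB,
    refilled continuously at rate_kb_per_s."""
    def __init__(self, rate_kb_per_s):
        self.rate = rate_kb_per_s
        self.tokens = rate_kb_per_s
        self.last = time.monotonic()

    def charge(self, kb):
        now = time.monotonic()
        self.tokens = min(self.rate,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= kb
        if self.tokens < 0:
            # overdrawn: sleep until the refill catches up
            time.sleep(-self.tokens / self.rate)
            self.tokens = 0

# hypothetical vacuum_read_rate_limit / vacuum_dirty_rate_limit
read_limit = RateLimiter(8192)     # cap page misses at 8MB/s
dirty_limit = RateLimiter(4096)    # cap dirtied pages at 4MB/s

def process_page(is_miss, dirtied, page_kb=8):
    # each I/O class draws from its own bucket, so reads and writes
    # throttle independently instead of sharing one big bucket
    if is_miss:
        read_limit.charge(page_kb)
    if dirtied:
        dirty_limit.charge(page_kb)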

But even then, I think we need to balance the amount of the gain
against the backward compatibility problems we're going to create. If
we start removing autovacuum options, then, as Greg notes, we have to
figure out how to make old pg_dumps load into new databases and
hopefully do something close to what the DBA intended. And the DBA
will have to learn the new system. I'm not sure we're really going to
get enough mileage out of changing this to justify the hassle. It's
basically a cosmetic improvement, and I think we should be careful
about breaking compatibility for cosmetic improvements, especially at
the end of a release cycle when we're under time pressure.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
