Re: Should we increase the default vacuum_cost_limit?

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Jeremy Schneider <schnjere(at)amazon(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Should we increase the default vacuum_cost_limit?
Date: 2019-03-09 09:28:12
Message-ID: CAKJS1f8s5Cxat65_-uE97oUpoonO3p_2rFMf-cKA7kpAaS1-OA@mail.gmail.com
Lists: pgsql-hackers

On Sat, 9 Mar 2019 at 16:11, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I propose therefore that instead of increasing vacuum_cost_limit,
> what we ought to be doing is reducing vacuum_cost_delay by a similar
> factor. And, to provide some daylight for people to reduce it even
> more, we ought to arrange for it to be specifiable in microseconds
> not milliseconds. There's no GUC_UNIT_US right now, but it's time.
> (Perhaps we should also look into using other delay APIs, such as
> nanosleep(2), where available.)

It does seem like a genuine concern that the current behaviour might be
too all-or-nothing. It's no good being on a high-speed train if it
stops at every platform.

I agree that vacuum_cost_delay might not be granular enough. However,
if we change vacuum_cost_delay to be specified in microseconds, then
I'm a little concerned that it'll silently break existing code that
sets it. Scripts that do manual off-peak vacuums are pretty common out
in the wild.

In an ideal world we'd just redesign the vacuum throttling to use MB/s
limits for hit/read/dirty, and possibly also a WAL write rate. I'm not
sure exactly how those would interact, but we could likely minimise
gettimeofday() calls by sampling the time it took to process N pages,
and if those N pages didn't take the time we wanted them to, we could
set N = Max(N * ($target_gettimeofday_sample_rate / $timetaken), 1);
e.g. if N was 2000 and it just took us 1 second to do those 2000 pages,
but we want to sleep every millisecond, then do N *= (0.001 / 1), so on
the next run we only do 2 pages before checking how long we should
sleep for. If we then happened to process those 2 pages in 0.5
milliseconds, N would become 4, and so on.

We'd just need to hard-code the $target_gettimeofday_sample_rate.
Probably 1 millisecond would be about right, and we'd have to guess the
first value of N, but if we guess a low value, it'll quickly correct
itself after the first batch of pages.
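
Very roughly, and just to illustrate the idea, the adjustment might
look something like the following (hypothetical names, not tied to any
existing vacuum code):

#include <math.h>

/*
 * Hypothetical sketch of the batch-size adjustment described above;
 * the names are made up.  target_s is the desired interval between
 * time checks (e.g. 0.001 for 1ms) and elapsed_s is how long the
 * previous batch of n pages actually took, both in seconds.
 */
int
next_batch_size(int n, double elapsed_s, double target_s)
{
    if (elapsed_s > 0.0)
        n = (int) rint((double) n * (target_s / elapsed_s));

    /* never drop below 1 page, so we always keep checking the clock */
    return (n < 1) ? 1 : n;
}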

If anyone thinks that idea has any potential, then maybe it's better
to leave the new vacuum_cost_limit default in place and consider
redesigning this for PG13... as such a change is too late for PG12.

It may also be possible to make this a vacuum rate limit in %. Say,
10% would just sleep for 10x as long as it took to process the last
set of pages. The problem with this is that if the server was under
heavy load then auto-vacuum might crawl along, but that might be the
exact opposite of what's required, since the server might be slow
precisely because of inadequate vacuuming.
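
For illustration only, that could boil down to something like this
(again hypothetical, not existing code):

/*
 * Hypothetical sketch of the percentage idea.  With vacuum_rate_pct =
 * 10, vacuum sleeps for 10x as long as the last batch of pages took
 * to process.  elapsed_s is in seconds; the return value is how long
 * to sleep before starting on the next batch.
 */
double
rate_limit_sleep(double elapsed_s, int vacuum_rate_pct)
{
    if (vacuum_rate_pct >= 100)
        return 0.0;     /* effectively unthrottled */

    return elapsed_s * (100.0 / vacuum_rate_pct);
}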

> I don't have any particular objection to kicking up the maximum
> value of vacuum_cost_limit by 10X or so, if anyone's hot to do that.
> But that's not where we ought to be focusing our concern. And there
> really is a good reason, not just nannyism, not to make that
> setting huge --- it's just the wrong thing to do, as compared to
> reducing vacuum_cost_delay.

My vote is to 10x the maximum for vacuum_cost_limit and consider
changing how it all works in PG13. If nothing happens before this
time next year then we can consider making vacuum_cost_delay a
microseconds GUC.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
