Reduced power consumption in autovacuum launcher process

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Reduced power consumption in autovacuum launcher process
Date: 2011-07-18 13:12:20
Message-ID: CAEYLb_XsGWEAtsE-3eDWnPt2kX8tzBU7KE3suhoD-EUwY3yfpg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 18 July 2011 01:48, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Might be good to start a new thread for each auxilliary process, or we
> may get confused.

Right - I've done so in this reply. There isn't a problem as such with
the AV Launcher patch - there's a problem with most or all remaining
auxiliary processes, that needs to be worked out once and for all. I
refer to the generic signal handler/timeout invalidation issues
already described.

>> ISTM that these generic handlers ought to be handling this
>> generically, and that there should be a Latch for just this purpose
>> for each process within PGPROC. We already have this Latch in PGPROC:
>>
>> Latch           waitLatch;              /* allow us to wait for sync rep */
>>
>> Maybe its purpose should be expanded to "current process Latch"?
>
> I think that could be a really good idea, though I haven't looked at
> it closely enough to be sure.
>
>> Another concern is, what happens when we receive a signal, generically
>> handled or otherwise, and have to SetLatch() to avoid time-out
>> invalidation? Should we just live with a spurious
>> AutoVacLauncherMain() iteration, or should we do something like check
>> if the return value of WaitLatch indicates that we woke up due to a
>> SetLatch() call, which must have been within a singal handler, and
>> that we should therefore goto just before WaitLatch() and elide the
>> spurious iteration? Given that we can expect some signals to occur
>> relatively frequently, spurious iterations could be a real concern.
>
> Really?  I suspect that it doesn't much matter exactly how many
> machine language instructions we execute on each wake-up, within
> reasonable bounds, of course.  Maybe some testing is in order?

There's only one way to get around the time-out invalidation problem
that I'm aware of - call SetLatch() in the handler. I'd be happy to
hear alternatives, but until we have an alternative, we're stuck
managing this in each and every signal handler.

Once we've had the latch set to handle this, and control returns to
the auxiliary process loop, we now have to decide from within the
auxiliary if we can figure out that all that happened was a "required"
wake-up, and thus we shouldn't really go through with another
iteration. That, or we can simply do the iteration.

I have my doubts that it is acceptable to wake-up spuriously in
response to routine events that there are generic handlers for. Maybe
this needs to be decided on a case-by-case basis.

> On another note, I might be inclined to write something like:
>
> if ((return_value_of_waitlatch & WL_POSTMASTER_DEATH) && !PostmasterIsAlive())
>   proc_exit(1);
>
> ...so as to avoid calling that function unnecessarily on every iteration.

Hmm. I'm not so sure. We're now relying on the return value of
WaitLatch(), which isn't guaranteed to report all wake-up events
(although I don't believe it would be a problem in this exact case).
Previously, we called PostmasterIsAlive() once a second anyway, and
that wasn't much of a problem.

>> Incidentally, should I worry about the timeout long for WaitLatch()
>> overflowing?
>
> How would that happen?

struct timeval is comprised of two longs - one representing seconds,
and the other represented the balance of microseconds. Previously, we
didn't combine them into a single microsecond representation. Now, we
do.

There could perhaps be a very large "nap", as determined by
launcher_determine_sleep(), so that the total number of microseconds
passed to WaitLatch() would exceed the maximum long size that can be
safely represented on some or all platforms. On most 32-bit machines,
sizeof(long) == sizeof(int), which is just 4 bytes. (2^31) - 1 =
2,147,483,647 microseconds = only about 35 minutes. There are corner
cases, such as if someone were to set autovacuum_naptime to something
silly.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Florian Pflug 2011-07-18 13:19:44 Re: Reduced power consumption in autovacuum launcher process
Previous Message Peter Eisentraut 2011-07-18 10:52:50 Re: Re: [COMMITTERS] pgsql: Add temp_file_limit GUC parameter to constrain temporary file sp