Re: Reducing power consumption on idle servers

From: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing power consumption on idle servers
Date: 2022-02-21 16:11:14
Message-ID: CANbhV-EYyyOzDcHpZ2tbtGfGT+2wKLa_SXj7gGbeWNFX7EHBPw@mail.gmail.com
Lists: pgsql-hackers

On Sat, 19 Feb 2022 at 17:03, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2022-02-19 14:10:39 +0000, Simon Riggs wrote:
> > Some years ago we did a pass through the various worker processes to
> > add hibernation as a mechanism to reduce power consumption on an idle
> > server. Replication never got the memo, so power saving on an
> > idle server is not very effective on standbys or logical subscribers.
> > The code and timing for hibernation is also different for each worker,
> > which is confusing.
>
> Yea, I think it's high time that we fix this.

Good to have your support. There are millions of servers running idle
for some part of their duty cycle.

This patch seeks to change the situation for the better in PG15, i.e.
soon, so the changes proposed are deliberately light. It also seeks to
provide a framework that writers of background worker processes can
follow, since fixing core alone is not enough; the various bgworkers
in use need to be fixed as well.
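
As a rough illustration of the kind of pattern a bgworker author could
follow, here is a minimal sketch. It is not code from the patch: the
IdleTracker struct, the next_sleep_ms() helper and the threshold of 50
idle cycles are all assumptions made for the example.

```c
/* Hypothetical per-worker state; not taken from the patch. */
typedef struct IdleTracker
{
    int idle_cycles;    /* consecutive cycles in which no work was found */
} IdleTracker;

/* Assumed threshold: idle cycles to wait before hibernating. */
#define IDLE_CYCLES_BEFORE_HIBERNATE 50

/*
 * Decide how long the worker should sleep before its next cycle.
 * Returns nap_ms while the worker is busy or only recently idle, and
 * hibernate_ms once it has been idle for long enough.
 */
static long
next_sleep_ms(IdleTracker *st, int did_work, long nap_ms, long hibernate_ms)
{
    if (did_work)
    {
        /* Any work resets the idle counter and keeps the short nap. */
        st->idle_cycles = 0;
        return nap_ms;
    }

    /* Saturate the counter so it cannot overflow on a long-idle server. */
    if (st->idle_cycles < IDLE_CYCLES_BEFORE_HIBERNATE)
        st->idle_cycles++;

    if (st->idle_cycles >= IDLE_CYCLES_BEFORE_HIBERNATE)
        return hibernate_ms;
    return nap_ms;
}
```

A worker's main loop would pass the result as the timeout of whatever
wait primitive it uses, so that it can still be woken early when work
arrives.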

> IMO we should instead consider either deprecating file based promotion, or
> adding an optional dependency on filesystem monitoring APIs (i.e. inotify etc)
> that avoid the need to poll for file creation.

Deprecating explicit file-based promotion is possible and simple, so
that is the approach in the latest version of the patch.

Thanks for the suggestion.

I've re-analyzed all of the code around startup/walreceiver
interactions and, IMHO, there isn't any need for the 5s delay in the
startup process, so it can sleep longer.

> IMO we should actually consider moving *away* from hibernation for the cases
> where we currently use it and rely much more heavily on being notified when
> there's work to do, without a timeout when not.

I don't think that is a practical approach this close to the end of
the PG15 cycle. It would require changing the behavior of bgwriter,
walwriter and pgarch when they are not broken. The likelihood that we
would break something is too high.

What is an issue is that the sleep times of various procs are on
completely different cycles, which is why I am proposing normalizing
them so that Postgres can actually sleep effectively.
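
To make the point concrete, here is a small sketch that counts how
often a machine is woken when timers run on different cycles. It is an
illustration only: count_wakeups() is a made-up helper, and it assumes
that all timers start at the same instant, which real processes do not.

```c
#include <stddef.h>

/*
 * Count the distinct instants in (0, window_ms] at which at least one
 * periodic timer fires, assuming every timer starts at t = 0.
 * All periods must be greater than zero.
 */
static long
count_wakeups(const long *period_ms, size_t n, long window_ms)
{
    long count = 0;

    for (size_t i = 0; i < n; i++)
    {
        for (long t = period_ms[i]; t <= window_ms; t += period_ms[i])
        {
            int already_counted = 0;

            /* Skip instants at which an earlier timer already fired. */
            for (size_t j = 0; j < i; j++)
            {
                if (t % period_ms[j] == 0)
                {
                    already_counted = 1;
                    break;
                }
            }
            if (!already_counted)
                count++;
        }
    }
    return count;
}
```

With the sleep times mentioned in this thread (100ms, 1s, 5s and 180s)
the shortest cycle dominates, and the machine is woken 1800 times in
180s. With all four set to an assumed common value of 60s it is woken
3 times in the same window.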

> > * autovac launcher - autovacuum_naptime
>
> On production systems autovacuum_naptime often can't be a large value,
> otherwise it's easy to not keep up on small busy tables. That's fine for
> actually busy servers, but with the increase in hosted PG offerings, the
> defaults in those offerings need to cater to a broader audience.

Autovac varies its wakeup cycle according to how much work is done. It
is OK to set autovacuum_naptime without affecting power consumption
when idle.

Idle for autovac is defined slightly differently, since if all user
work completes then there may still be a lot of vacuuming to do before
it goes fully idle. But my observation is that there are many servers
that go idle for more than 50% of each week, when operating 8-12 hours
per day, 5 days per week, so we can still save a lot of power.

This patch doesn't change how autovac works; it just uses a common
setting for the hibernation that eventually occurs.

> > These servers don't try to hibernate at all:
> > * logical worker - 1s
>
> Not great.

Agreed; the patch improves this, in roughly the same way as for
walreceiver.

> > * logical launcher - wal_retrieve_retry_interval (undocumented)
>
> I think it's actually 180s in the happy path. The wal_retrieve_retry_interval
> is about how often workers get restarted. But if there's no need to restart,
> it sleeps longer.

I propose normalizing all of the various hibernation times to the same value.

> > * wal_receiver - 100ms, currently gets woken when WAL arrives
>
> This is a fairly insane one. We should compute a precise timeout based on
> wal_receiver_timeout.

That is exactly what the patch does, when it hibernates.
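
For readers following along, the general idea of a precise timeout can
be sketched as below. This is not the code from the patch: the helper
name and the convention that a negative deadline means "disabled" are
assumptions for the example. One of the deadlines would be derived
from wal_receiver_timeout.

```c
#include <stdint.h>

/*
 * Hypothetical helper: how many milliseconds to sleep until the
 * earliest pending deadline. Deadlines are absolute times in ms;
 * a negative deadline is treated as disabled.
 *
 * Returns -1 if no deadline is set (sleep until woken), or 0 if the
 * earliest deadline has already passed.
 */
static int64_t
sleep_ms_until_next_deadline(int64_t now_ms,
                             const int64_t *deadline_ms, int n)
{
    int64_t earliest = -1;

    for (int i = 0; i < n; i++)
    {
        if (deadline_ms[i] < 0)
            continue;           /* this timer is disabled */
        if (earliest < 0 || deadline_ms[i] < earliest)
            earliest = deadline_ms[i];
    }

    if (earliest < 0)
        return -1;              /* nothing scheduled */
    return (earliest > now_ms) ? earliest - now_ms : 0;
}
```

The process then sleeps for exactly that long instead of polling every
100ms, and is still woken early when WAL arrives.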

I wasn't aware of Thomas' work, but now that I am, we can choose which
of those approaches to use for WAL receiver. I hope that we can fix
logical worker and WAL receiver to use the same algorithm. The rest of
this patch would still be valid, whatever we do for those two procs.

--
Simon Riggs http://www.EnterpriseDB.com/

Attachment Content-Type Size
hibernate_to_reduce_power_consumption.v2.patch application/octet-stream 18.3 KB
