Re: Reducing power consumption on idle servers

From: Andres Freund <andres(at)anarazel(dot)de>
To: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing power consumption on idle servers
Date: 2022-02-19 17:03:14
Message-ID: 20220219170314.33i3oq2ujsdcnkva@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2022-02-19 14:10:39 +0000, Simon Riggs wrote:
> Some years ago we did a pass through the various worker processes to
> add hibernation as a mechanism to reduce power consumption on an idle
> server. Replication never got the memo, so power saving on an idle
> standby or logical subscriber is not very effective.
> The code and timing for hibernation also differ for each worker,
> which is confusing.

Yea, I think it's high time that we fix this.

It's not even just power consumption:
- the short timeouts hide all kinds of bugs around missed wakeups / racy
wakeup sequences
- debugging problems gets harder if there's lots of frequent activity

> CURRENT STATE
>
> These servers naturally sleep for long periods when inactive:
> * postmaster - 60s
> * checkpointer - checkpoint_timeout
> * syslogger - log_rotation_age
> * pgarch - 60s

Why do we need *any* timeout here? It seems like one of those bug/race-hiding
things. IMO we should only use a timeout when a prior archive failed, and rely
on wakeups otherwise.
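
A rough sketch of the shape I mean, not the actual pgarch code (untested;
ArchiverWaitLoop, had_archive_failure and ArchiveAvailableWALFiles are
made-up names, the latch machinery is the existing one):

#include "postgres.h"
#include "storage/latch.h"
#include "utils/wait_event.h"

static void
ArchiverWaitLoop(void)
{
	for (;;)
	{
		int			wakeEvents = WL_LATCH_SET | WL_EXIT_ON_PM_DEATH;
		long		timeout = -1;	/* block until the latch is set */

		/* only fall back to a timer while retrying a failed archive */
		if (had_archive_failure)
		{
			wakeEvents |= WL_TIMEOUT;
			timeout = 60 * 1000L;
		}

		(void) WaitLatch(MyLatch, wakeEvents, timeout,
						 WAIT_EVENT_ARCHIVER_MAIN);
		ResetLatch(MyLatch);

		ArchiveAvailableWALFiles();
	}
}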

> * autovac launcher - autovacuum_naptime

On production systems autovacuum_naptime often can't be a large value,
otherwise it's easy to fall behind on small, busy tables. That's fine for
actually busy servers, but with the increase in hosted PG offerings, the
defaults in those offerings need to cater to a broader audience.

> These servers don't try to hibernate at all:
> * logical worker - 1s

Not great.

> * logical launcher - wal_retrieve_retry_interval (undocumented)

I think it's actually 180s in the happy path. The wal_retrieve_retry_interval
is about how often workers get restarted. But if there's no need to restart,
it sleeps longer.

> * startup - hardcoded 5s when streaming, wal_retrieve_retry_interval
> for WAL files

> * wal_receiver - 100ms, currently gets woken when WAL arrives

This is a fairly insane one. We should compute a precise timeout based on
wal_receiver_timeout.

And it's not just one syscall every 100ms, it's

recvfrom(4, 0x7fd66134b960, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
epoll_create1(EPOLL_CLOEXEC) = 6
epoll_ctl(6, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=1630593560, u64=140558730322456}}) = 0
epoll_ctl(6, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=1630593584, u64=140558730322480}}) = 0
epoll_ctl(6, EPOLL_CTL_ADD, 4, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=1630593608, u64=140558730322504}}) = 0
epoll_wait(6, [], 1, 100) = 0
close(6) = 0
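
For comparison, with a long-lived wait event set the per-iteration cost is a
single epoll_wait(); roughly (untested sketch against the WaitEventSet API as
of PG 15, wait_fd and timeout_ms stand in for the replication socket and the
computed sleep time):

/* set up once, e.g. at walreceiver start */
WaitEventSet *wes = CreateWaitEventSet(TopMemoryContext, 3);

AddWaitEventToSet(wes, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
AddWaitEventToSet(wes, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET, NULL, NULL);
AddWaitEventToSet(wes, WL_SOCKET_READABLE, wait_fd, NULL, NULL);

/* per iteration: one syscall, no epoll_create1()/epoll_ctl()/close() churn */
WaitEvent	event;

(void) WaitEventSetWait(wes, timeout_ms, &event, 1,
						WAIT_EVENT_WAL_RECEIVER_MAIN);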

> PROPOSED CHANGES
>
> 1. Standardize the hibernation time at 60s, using a #define
> HIBERNATE_DELAY_SEC 60

I don't think the hibernation stuff is a great pattern. When hibernation was
introduced we neither had latches nor condition variables, so we couldn't
easily do better. Today we have, so we should do better.

We should either not use timeouts at all (e.g. pgarch when there was no
preceding failure), relying on being woken up when new work arrives.

Or use precisely calculated timeouts (e.g. (last_recv_timestamp +
wal_receiver_timeout) - now()) when there's a good reason to wake up (like
needing to send a status message).
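
I.e. something along these lines (untested sketch; the helper name is made
up, TimestampTzPlusMilliseconds()/TimestampDifference() are the existing
timestamp helpers):

#include "postgres.h"
#include "replication/walreceiver.h"	/* wal_receiver_timeout */
#include "utils/timestamp.h"

/* Sleep just long enough to hit the next wal_receiver_timeout deadline. */
static long
ReceiverSleepMillis(TimestampTz now, TimestampTz last_recv_timestamp)
{
	TimestampTz deadline;
	long		secs;
	int			microsecs;

	deadline = TimestampTzPlusMilliseconds(last_recv_timestamp,
										   wal_receiver_timeout);
	if (now >= deadline)
		return 0;				/* already due, don't sleep at all */

	TimestampDifference(now, deadline, &secs, &microsecs);
	return secs * 1000 + microsecs / 1000;	/* ms, for WaitLatch() et al. */
}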

IMO we should actually consider moving *away* from hibernation for the cases
where we currently use it and rely much more heavily on being notified when
there's work to do, without a timeout when not.

> 5. The startup process has a hardcoded 5s loop because it checks for a
> trigger file to promote. So hibernating would mean that it would
> promote more slowly and/or restart a failing walreceiver more slowly,
> so this requires user approval, and hence we add new GUCs to approve
> that choice. This is a valid choice because a long-term idle server is
> obviously not in current use, so waiting 60s for failover or restart
> is very unlikely to cause a significant issue.

There are plenty of databases that are close to read-only but business
critical, so I don't buy that argument.

IMO we should instead consider either deprecating file-based promotion, or
adding an optional dependency on filesystem monitoring APIs (e.g. inotify)
that avoid the need to poll for file creation.
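
On Linux that could look roughly like this (untested sketch, Linux-only; the
returned fd would then go into the startup process's wait event set instead
of a 5s poll):

#include <sys/inotify.h>
#include <unistd.h>

/*
 * Watch the directory that would contain the promote trigger file; the fd
 * becomes readable when a file is created or moved in, so there's no need
 * to stat() the path every few seconds.
 */
static int
open_trigger_watch(const char *trigger_dir)
{
	int			fd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);

	if (fd < 0)
		return -1;
	if (inotify_add_watch(fd, trigger_dir, IN_CREATE | IN_MOVED_TO) < 0)
	{
		close(fd);
		return -1;
	}
	return fd;
}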

Greetings,

Andres Freund
