Process wakeups when idle and power consumption

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Process wakeups when idle and power consumption
Date: 2011-05-05 19:49:25
Message-ID: BANLkTimhHgF=xQ3yMjxCffD9DQLGyWeegg@mail.gmail.com
Lists: pgsql-hackers

There is a general need to have Postgres consume fewer CPU cycles and
less power when idle. Until something is done about this, shared
hosting providers, particularly those who want to deploy many VM
instances with databases, will continue to choose MySQL out of hand.

I have quantified the difference in the number of wake-ups when idle
between Postgres and MySQL using Intel's powertop utility on my
laptop, which runs Fedora 14. These figures are for a freshly initdb'd
database from git master, and mysql-server 5.1.56 from my system's
package manager.

*snip*
2.7% ( 11.5) [ ] postgres
1.1% ( 4.6) [ 1663] Xorg
0.9% ( 3.7) [ 1463] wpa_supplicant
0.6% ( 2.7) [ ] [ahci] <interrupt>
0.5% ( 2.2) [ ] mysqld
*snip*

Postgres consistently has 11.5 wakeups per second, while MySQL
consistently has 2.2 wakeups (averaged over the 5 second period that
each cycle of instrumentation lasts).

If I turn on archiving, the figure for Postgres naturally increases:

*snip*
1.7% ( 12.5) [ ] postgres
1.6% ( 12.0) [ 808] phy0
0.7% ( 5.4) [ 1463] wpa_supplicant
0.6% ( 4.3) [ ] [ahci] <interrupt>
0.3% ( 2.2) [ ] mysqld
*snip*

It increases by exactly the amount that you'd expect after looking at
pgarch.c - one wakeup per second. This is because there is a loop
within the main event loop for the process that is a prime example of
what unix_latch.c describes as "the common pattern of using
pg_usleep() or select() to wait until a signal arrives, where the
signal handler sets a global variable". The loop naps for one second
per iteration.
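For anyone unfamiliar with it, here is a minimal standalone sketch of that waiting pattern. It is not the pgarch.c code: the names (`wakened`, `nap_usec`, `wait_by_polling`) are hypothetical, and `nap_usec()` only approximates `pg_usleep()`. The point is that every nap ends in a wakeup whether or not anything happened, so a one-second nap costs one wakeup per second while idle. As I understand the unix_latch.c rationale, the loop can't simply sleep indefinitely either, because a signal arriving between the flag check and the sleep could be missed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <time.h>

/* Flag set asynchronously by the signal handler (hypothetical name). */
static volatile sig_atomic_t wakened = 0;

static void wakeup_handler(int signo)
{
    (void) signo;
    wakened = 1;
}

/* Install the handler with sigaction() so it stays installed. */
static void install_wakeup_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = wakeup_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGUSR1, &sa, NULL);
}

/* Rough stand-in for pg_usleep(): sleep for usec microseconds. */
static void nap_usec(long usec)
{
    struct timespec ts;

    ts.tv_sec = usec / 1000000L;
    ts.tv_nsec = (usec % 1000000L) * 1000L;
    (void) nanosleep(&ts, NULL);    /* a signal may cut the nap short */
}

/*
 * The polling pattern: nap, then re-check the flag.  Returns how many naps
 * (i.e. wakeups) elapsed before the flag was seen or max_naps ran out.
 * While idle, this costs one wakeup per nap interval no matter what.
 */
static int wait_by_polling(long nap_len_usec, int max_naps)
{
    int naps = 0;

    assert(nap_len_usec > 0 && max_naps >= 0);
    while (!wakened && naps < max_naps)
    {
        nap_usec(nap_len_usec);
        naps++;
    }
    return naps;
}
```

With no signal, `wait_by_polling(1000, 5)` returns 5: five wakeups spent discovering that nothing happened.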

Attached is the first in what I hope will become a series of patches
for reducing power consumption when idle. It makes the archiver
process wake far less frequently, using a latch primitive,
specifically a non-shared latch. I'm not sure if I should have used a
shared latch, and have SetLatch() calls replace
SendPostmasterSignal(PMSIGNAL_WAKEN_ARCHIVER) calls. Would that have
broken some implied notion of encapsulation? In any case, if I apply
the patch and rebuild, the difference is quite apparent:

*snip*
3.9% ( 21.8) [ 1663] Xorg
3.2% ( 17.9) [ ] [ath9k] <interrupt>
2.1% ( 11.9) [ 808] phy0
2.1% ( 11.5) [ ] postgres
1.0% ( 5.4) [ 1463] wpa_supplicant
0.4% ( 2.2) [ ] mysqld
*snip*

The extra wakeups caused by running the archiver appear to have been
eliminated entirely. In fact, the archiver still wakes up once every
PGARCH_AUTOWAKE_INTERVAL (60 seconds), but that usually doesn't
register with powertop, which measures wakeups over 5-second periods.
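To illustrate what replaces the nap loop, here is a minimal, single-process sketch of a latch-style wait built on the self-pipe trick with select(), similar in spirit to unix_latch.c but not its actual code. All names (`SketchLatch`, `sketch_set_latch`, `sketch_wait_latch`, and so on) are hypothetical. The process blocks in one select() call until either the latch is set (for example from a signal handler) or the timeout expires, so there is no periodic polling in between.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical process-local latch: a flag plus a self-pipe. */
typedef struct
{
    volatile sig_atomic_t is_set;
    int pipe_fds[2];            /* [0] = read end, [1] = write end */
} SketchLatch;

static bool sketch_init_latch(SketchLatch *latch)
{
    latch->is_set = 0;
    if (pipe(latch->pipe_fds) != 0)
        return false;
    /* Non-blocking, so draining never hangs and setting never blocks. */
    for (int i = 0; i < 2; i++)
    {
        int flags = fcntl(latch->pipe_fds[i], F_GETFL);

        if (flags < 0 ||
            fcntl(latch->pipe_fds[i], F_SETFL, flags | O_NONBLOCK) < 0)
            return false;
    }
    return true;
}

/* Safe to call from a signal handler: set the flag, then poke the pipe. */
static void sketch_set_latch(SketchLatch *latch)
{
    int save_errno = errno;
    ssize_t rc;

    latch->is_set = 1;
    rc = write(latch->pipe_fds[1], "x", 1);  /* a full pipe (EAGAIN) is fine */
    (void) rc;
    errno = save_errno;
}

static void sketch_reset_latch(SketchLatch *latch)
{
    latch->is_set = 0;
}

/*
 * Sleep until the latch is set or timeout_sec elapses.  Returns true if the
 * latch is set.  On EINTR the full timeout restarts (acceptable for a sketch).
 */
static bool sketch_wait_latch(SketchLatch *latch, long timeout_sec)
{
    char buf[16];

    assert(timeout_sec >= 0);
    for (;;)
    {
        fd_set rfds;
        struct timeval tv;
        int rc;

        /* Drain stale bytes *before* checking the flag: a set that races
         * with us leaves a byte behind, so select() returns at once. */
        while (read(latch->pipe_fds[0], buf, sizeof buf) > 0)
            ;
        if (latch->is_set)
            return true;

        FD_ZERO(&rfds);
        FD_SET(latch->pipe_fds[0], &rfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;
        rc = select(latch->pipe_fds[0] + 1, &rfds, NULL, NULL, &tv);
        if (rc == 0)
            return latch->is_set != 0;      /* timed out */
        if (rc < 0 && errno != EINTR)
            return latch->is_set != 0;      /* unexpected error: give up */
        /* Readable or interrupted by a signal: loop and re-check. */
    }
}
```

In an archiver-like loop, one would reset the latch, look for work, and then call `sketch_wait_latch(&latch, 60)`, with the signal handler calling `sketch_set_latch()`. At a 60-second timeout an otherwise idle process averages 1/60 ≈ 0.017 wakeups per second, so most 5-second powertop windows record none.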

If we could gain similar decreases in idle power consumption across
all Postgres ancillary processes, perhaps we'd see Postgres available
as an option for shared hosting plans more frequently. When these
differences are multiplied by thousands of VM instances, they really
matter. Unfortunately, powertop doesn't seem able to break its
figures down per process, so there's no quick way to get a detailed
overview of where those wakeups occur across the various pg processes.

I hope to work on reducing wakeups for PG ancillary processes in this
order (order of perceived difficulty), using shared latches to
eliminate "the waiting pattern" in each case:

* WALWriter
* BgWriter
* WALReceiver
* Startup process

I'll need to take a look at the statistics collector, autovacuum and
logger processes too, to see if they present more subtle opportunities
for reducing idle power consumption.

Do constants like PGARCH_AUTOWAKE_INTERVAL always need to be set at
their current, conservative levels? Perhaps these sorts of values
could be collectively controlled with a single GUC that represents a
trade-off between CPU cycles used when idle and safety/reliability.
On the other hand, some processes already have GUCs that control
this individually, such as wal_writer_delay, and that suggestion
could well be a bit woolly. It might be an enum whose values represent
various levels of concern, defaulting to something like
'conservative' (i.e. the current values).
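To make the enum idea a little more concrete, here is a minimal sketch under loudly stated assumptions: the GUC itself, the level name 'relaxed' and its 300-second value are all hypothetical; only the 'conservative' name and its 60-second value (the current PGARCH_AUTOWAKE_INTERVAL) come from the discussion above. Each level maps to the intervals that processes would consult instead of compile-time constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical levels for a single idle-wakeup GUC. */
typedef enum
{
    IDLE_WAKEUP_CONSERVATIVE,   /* today's intervals; the default */
    IDLE_WAKEUP_RELAXED         /* hypothetical: fewer idle wakeups */
} IdleWakeupLevel;

/* Name/value table, loosely modelled on how enum GUCs map strings to values. */
typedef struct
{
    const char *name;
    IdleWakeupLevel level;
    int archiver_autowake_secs;
} IdleWakeupOption;

static const IdleWakeupOption idle_wakeup_options[] = {
    {"conservative", IDLE_WAKEUP_CONSERVATIVE, 60},  /* PGARCH_AUTOWAKE_INTERVAL */
    {"relaxed", IDLE_WAKEUP_RELAXED, 300},           /* made-up value */
};

#define N_IDLE_WAKEUP_OPTIONS \
    (sizeof(idle_wakeup_options) / sizeof(idle_wakeup_options[0]))

/* Parse a setting case-insensitively; returns false for unknown names. */
static bool parse_idle_wakeup(const char *value, IdleWakeupLevel *level)
{
    for (size_t i = 0; i < N_IDLE_WAKEUP_OPTIONS; i++)
    {
        if (strcasecmp(value, idle_wakeup_options[i].name) == 0)
        {
            *level = idle_wakeup_options[i].level;
            return true;
        }
    }
    return false;
}

/* The archiver's latch timeout, derived from the chosen level. */
static int archiver_autowake_secs(IdleWakeupLevel level)
{
    for (size_t i = 0; i < N_IDLE_WAKEUP_OPTIONS; i++)
        if (idle_wakeup_options[i].level == level)
            return idle_wakeup_options[i].archiver_autowake_secs;
    assert(!"unknown idle wakeup level");
    return 60;
}
```

Keeping the per-level values in one table means a new process only needs a new column, rather than another independent GUC.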

Thoughts?

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachment: archiver_power.patch (text/x-patch, 4.0 KB)
