Bug: walsender and high CPU usage

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Bug: walsender and high CPU usage
Date: 2012-03-09 11:40:20
Message-ID: CAHGQGwG=13nAEsVEO+2WpzyyuXyhB-cQE8BSTG6D7R_vMKDytA@mail.gmail.com
Lists: pgsql-hackers

Hi,

I found a bug that causes walsender to enter a busy loop
when the replication connection is terminated. Walsender consumes
a lot of CPU (%sys), and this situation lasts until it detects
the termination of the replication connection and exits.

The cause of this bug is that the walsender loop doesn't call
ResetLatch at all in the above case. Since the latch remains set,
the walsender loop cannot sleep on the latch, i.e., WaitLatch
always returns immediately.

We can fix this bug by adding a ResetLatch call at the top of the
walsender loop. Patch attached.
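
To illustrate the mechanism, here is a minimal toy model of the loop. It is
only a sketch: ToyLatch, the toy_* functions and count_immediate_wakeups are
hypothetical names made up for this example, and they are not PostgreSQL's
latch API nor the contents of the attached patch. The model assumes a single
event sets the latch once, and it counts how many waits return immediately.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy latch: a single flag, NOT PostgreSQL's Latch type. */
typedef struct
{
    bool is_set;
} ToyLatch;

static void toy_set_latch(ToyLatch *latch)   { latch->is_set = true; }
static void toy_reset_latch(ToyLatch *latch) { latch->is_set = false; }

/*
 * Toy stand-in for waiting on a latch.  Returns true if the wait would
 * return immediately because the latch is already set, and false if the
 * caller would actually sleep until the timeout.
 */
static bool toy_wait_latch(const ToyLatch *latch)
{
    return latch->is_set;
}

/*
 * Run a simplified sender loop and count the waits that return
 * immediately, i.e. the iterations that spin instead of sleeping.
 */
static int count_immediate_wakeups(int iterations, bool reset_at_top)
{
    ToyLatch latch = { false };
    int immediate = 0;

    assert(iterations >= 0);

    /* Assumption: one event sets the latch once, before the loop runs. */
    toy_set_latch(&latch);

    for (int i = 0; i < iterations; i++)
    {
        /* The proposed fix: clear the latch before doing any work. */
        if (reset_at_top)
            toy_reset_latch(&latch);

        /* ... the real loop would send WAL and check the connection here ... */

        if (toy_wait_latch(&latch))
            immediate++;
    }
    return immediate;
}
```

In this model, count_immediate_wakeups(10, false) returns 10 because the
latch is never cleared, so every wait returns at once. With the reset at the
top, count_immediate_wakeups(10, true) returns 0, so every wait would sleep.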

This bug exists in 9.1 but not in 9.2dev, where it has already been
fixed by commit cff75130b5f63e45423c2ed90d6f2e84c21ef840. That commit
refactors and refines the walsender loop logic in addition to
adding ResetLatch, so I'm tempted to backport it
(except the removal of wal_sender_delay) to 9.1 rather than
applying the attached patch. OTOH, the attached patch is quite simple,
and its impact on 9.1 would be very small, so it would be easy to backport.
Thoughts?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment: bugfix_v1.patch (text/x-diff, 514 bytes)
