Excessive PostmasterIsAlive calls slow down WAL redo

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Excessive PostmasterIsAlive calls slow down WAL redo
Date:	2018-04-05 07:23:43
Message-ID:	7261eb39-0369-f2f4-1bb5-62f3b6083b5e@iki.fi
Lists:	pgsql-hackers

I started looking at the "Improve compactify_tuples and
PageRepairFragmentation" patch, and set up a little performance test of
WAL replay. I ran pgbench, scale 5, to generate about 1 GB of WAL, and
timed how long it takes to replay that WAL. To focus purely on CPU
overhead, I kept the data directory in /dev/shm/.

Profiling that, without any patches applied, I noticed that a lot of
time was spent in read()s on the postmaster-death pipe, i.e. in
PostmasterIsAlive(). We call that between *every* WAL record.
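For readers unfamiliar with the mechanism, here is a minimal sketch of a pipe-based liveness check of the kind described above. It is an illustration, not PostgreSQL's actual code: set_nonblocking() and parent_is_alive() are hypothetical names, and error handling is simplified. The parent keeps the write end of a pipe open; the child does a non-blocking read() on the read end. While the write end is open, read() fails with EAGAIN; once every write end is closed, read() returns 0 (EOF). Each check therefore costs one system call.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: put the read end of the pipe into non-blocking mode. */
static int
set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical stand-in for the liveness check: one read() per call.
 * Returns true only while some process still holds the write end open.
 */
static bool
parent_is_alive(int watch_fd)
{
    char    c;
    ssize_t rc = read(watch_fd, &c, 1);

    /* Nothing to read, but the write end is still open: parent is alive. */
    if (rc < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return true;

    /*
     * rc == 0 means EOF: every write end has been closed. For simplicity
     * this sketch also treats unexpected data and other errors as "not
     * alive"; real code would report those cases separately.
     */
    return false;
}
```

Calling parent_is_alive() once per WAL record means one read() system call per record, which is where the profile above shows the time going.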

As a quick test to see how much that matters, I commented out the
PostmasterIsAlive() call from HandleStartupProcInterrupts(). On
unpatched master, replaying that 1 GB of WAL takes about 20 seconds on
my laptop. Without the PostmasterIsAlive() call, 17 seconds.

That seems like an utter waste of time. I'm almost inclined to call that
a performance bug. As a straightforward fix, I'd suggest that we call
HandleStartupProcInterrupts() in the WAL redo loop, not on every record,
but only e.g. every 32 records. That would make the main redo loop less
responsive to shutdown, SIGHUP, or postmaster death, but that seems OK.
There are also calls to HandleStartupProcInterrupts() in the various
other loops that wait for new WAL to arrive or for a recovery delay to
elapse, so this would only affect the case where we're actively
replaying records.
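The idea of checking only every Nth record can be sketched roughly as follows. This is not the attached patch, just an illustration under assumptions: handle_interrupts_stub() is a hypothetical stand-in for HandleStartupProcInterrupts(), replay_records() is a hypothetical stand-in for the redo loop, and the interval of 32 is taken from the "e.g. every 32 records" suggestion above.

```c
#include <stdint.h>

/* Assumed interval, from the "e.g. every 32 records" suggestion. */
#define INTERRUPT_CHECK_INTERVAL 32

/* Counts how often the (stubbed) interrupt handler runs. */
static uint64_t interrupt_checks = 0;

/* Hypothetical stand-in for HandleStartupProcInterrupts(). */
static void
handle_interrupts_stub(void)
{
    interrupt_checks++;
}

/*
 * Hypothetical stand-in for the redo loop: "replays" nrecords records and
 * returns how many times the interrupt handler was called.
 */
static uint64_t
replay_records(uint64_t nrecords)
{
    uint64_t since_check = 0;

    interrupt_checks = 0;
    for (uint64_t i = 0; i < nrecords; i++)
    {
        /* Check for interrupts only once per INTERRUPT_CHECK_INTERVAL records. */
        if (++since_check >= INTERRUPT_CHECK_INTERVAL)
        {
            handle_interrupts_stub();
            since_check = 0;
        }

        /* ... applying WAL record i would go here ... */
    }
    return interrupt_checks;
}
```

With this counter the handler runs once per 32 records replayed, so the per-record cost of the liveness check is divided by 32, and the redo loop reacts to a shutdown request or postmaster death at most 32 records late.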

- Heikki

Attachment	Content-Type	Size
0001-Call-HandleStartupProcInterrupts-less-frequently-in-.patch	text/x-patch	1.6 KB

Responses

Re: Excessive PostmasterIsAlive calls slow down WAL redo at 2018-04-05 13:42:37 from Alvaro Herrera
Re: Excessive PostmasterIsAlive calls slow down WAL redo at 2018-04-05 13:50:24 from Simon Riggs
Re: Excessive PostmasterIsAlive calls slow down WAL redo at 2018-04-05 18:27:58 from Andres Freund
