Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock)

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock)
Date: 2012-03-14 20:52:38
Message-ID: 4F610516.9040102@enterprisedb.com
Lists: pgsql-hackers

On 12.03.2012 21:33, I wrote:
> The slowdown with > 6 clients seems to be spinlock contention. I ran
> "perf record" for a short duration during one of the ramdrive tests, and
> saw the spinlock acquisition in ReserveXLogInsertLocation() consuming
> about 80% of all CPU time.
>
> I then hacked the patch a little bit, removing the check in XLogInsert
> for fullPageWrites and forcePageWrites, as well as the check for "did a
> checkpoint just happen" (see
> http://community.enterprisedb.com/xloginsert-scale-tests/disable-fpwcheck.patch).
> My hunch was that accessing those fields causes cache line stealing,
> making the cache line containing the spinlock even more busy. That hunch
> seems to be correct; when I reran the tests with that patch, the
> performance with high # of clients became much better. See the results
> with "xloginsert-scale-13.patch". With that change, the single-client
> case is still about 10% slower than current code, but the performance
> with > 8 clients is almost as good as with current code. Between 2-6
> clients, the patch is a win.
>
> The hack that restored the > 6 clients performance to current level is
> not safe, of course, so I'll have to figure out a safe way to get that
> effect.

I managed to do that in a safe way, and also found a couple of other
small changes that made a big difference to performance. I found out
that the number of cache misses taken while holding the spinlock
matters a lot, which in hindsight isn't surprising. I aligned the
XLogCtlInsert struct on a 64-byte boundary, so that the new spinlock
and the fields it protects all fit on the same cache line (on boxes
with a cache line size >= 64 bytes, anyway). I also changed the logic
of the insertion slots slightly, so that when a slot is reserved while
holding the spinlock, the slot itself doesn't need to be updated
immediately. That avoids one cache miss, as the cache line holding the
slot doesn't need to be touched while the spinlock is held. And to
reduce cache line contention when an insertion finishes and the
insertion slot is updated, I shuffled the slots so that logically
adjacent slots are spaced apart in memory.
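
To make that concrete, here is a minimal standalone sketch of the
layout tricks; the names, the sizes, and the interleaving scheme are
illustrative, not the actual patch code:

#include <stdint.h>

#define CACHE_LINE_SIZE 64		/* assumed, not detected at runtime */

/*
 * Keep the reservation spinlock and the fields it protects together,
 * padded and aligned so that nothing else shares their cache line.
 */
typedef union
{
	struct
	{
		volatile int lck;			/* stands in for slock_t */
		uint64_t	CurrBytePos;	/* next insert position (protected) */
		uint64_t	PrevBytePos;	/* prev record's start (protected) */
	}			s;
	char		pad[CACHE_LINE_SIZE];
} __attribute__((aligned(CACHE_LINE_SIZE))) InsertCtlSketch;

/*
 * Pad each insertion slot to a full cache line, and map logical slot
 * numbers to physical array indexes so that logically adjacent slots
 * are spaced apart in memory.  The slot's own cache line is not
 * touched while the spinlock is held; the backend that reserved the
 * slot updates it later, outside the lock.
 */
#define NUM_SLOTS 64
#define STRIDE 8				/* assumed interleaving factor */

typedef union
{
	struct
	{
		volatile int mutex;
		uint64_t	insertingAt;
	}			s;
	char		pad[CACHE_LINE_SIZE];
} __attribute__((aligned(CACHE_LINE_SIZE))) SlotSketch;

static SlotSketch slots[NUM_SLOTS];

/* logical slots 0,1,2,... land at physical 0,8,16,...,56,1,9,... */
static inline SlotSketch *
GetSlot(int logical)
{
	int			physical = (logical % STRIDE) * (NUM_SLOTS / STRIDE)
		+ (logical / STRIDE);

	return &slots[physical];
}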

When all those changes are put together, the patched version now beats
or matches the current code in the RAM drive tests, except that the
single-client case is still about 10% slower. I added the new test
results at http://community.enterprisedb.com/xloginsert-scale-tests/,
and a new version of the patch is attached.

If all of this sounds pessimistic, let me remind you that I've been
testing the cases where I'm seeing regressions, so that I can fix
them, not trying to demonstrate how good this is in the best case.
These tests have been with very small WAL records, with only 16 bytes
of payload; larger WAL records benefit more. I also ran one test with
larger, 100-byte WAL records, and put the results up on that site.

> Also, even when the performance is as good as current code, it's
> not good to spend all the CPU time spinning on the spinlock. I didn't
> measure the CPU usage with current code, but I would expect it to be
> sleeping, not spinning, when not doing useful work.

This is still an issue.
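
To illustrate the distinction in generic terms (this is a
self-contained example, not code from the patch): a spinlock keeps the
waiting CPU 100% busy, while a sleeping lock blocks in the kernel and
consumes no CPU while waiting:

#include <pthread.h>
#include <stdatomic.h>

static atomic_flag spinlock = ATOMIC_FLAG_INIT;

static void
spin_acquire(void)
{
	/* busy-waits: the CPU keeps spinning until the flag is released */
	while (atomic_flag_test_and_set(&spinlock))
		;
}

static pthread_mutex_t sleeplock = PTHREAD_MUTEX_INITIALIZER;

static void
sleep_acquire(void)
{
	/* blocks in the kernel: the waiting thread uses no CPU */
	pthread_mutex_lock(&sleeplock);
}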

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
xloginsert-scale-18.patch text/x-diff 100.1 KB
