Re: Moving more work outside WALInsertLock

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Moving more work outside WALInsertLock
Date: 2011-12-23 08:13:43
Message-ID: 4EF43837.8040306@enterprisedb.com
Lists: pgsql-hackers

On 16.12.2011 15:42, Heikki Linnakangas wrote:
> On 16.12.2011 15:03, Simon Riggs wrote:
>> On Fri, Dec 16, 2011 at 12:50 PM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> On 16.12.2011 14:37, Simon Riggs wrote:
>>>>
>>>> I already proposed a design for that using page-level share locks any
>>>> reason not to go with that?
>>>
>>> Sorry, I must've missed that. Got a link?
>>
>> From nearly 4 years ago.
>>
>> http://grokbase.com/t/postgresql.org/pgsql-hackers/2008/02/reworking-wal-locking/145qrhllcqeqlfzntvn7kjefijey
>>
>
> Ah, thanks. That is similar to what I'm experimenting, but a second
> lwlock is still fairly heavy-weight. I think with many backends, you
> will be beaten badly by contention on the spinlocks alone.
>
> I'll polish up and post what I've been experimenting with, so we can
> discuss that.

So, here's a WIP patch of what I've been working on. WAL insertion is
split into two stages:

1. Reserve the space from the WAL stream. This is done while holding a
spinlock. The page holding the reserved space doesn't necessarily need
to be in cache yet; the reservation can run ahead of the WAL buffer
cache. (Quick testing suggests that a lwlock is too heavy-weight for
this.)

2. Ensure the page is in the WAL buffer cache. If not, initialize it,
evicting old pages if needed. Then finish the CRC calculation of the
header and memcpy the record in place. (if the record spans multiple
pages, it operates on one page at a time, to avoid problems with running
out of WAL buffers)
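
To illustrate the two stages, here is a minimal sketch. It is not code
from the patch: WalSketch, wal_init, wal_reserve and wal_copy are
hypothetical names, the "WAL" is a single flat in-memory buffer, and a
C11 atomic_flag stands in for the spinlock. Page boundaries, buffer
eviction, the CRC and wraparound are all ignored. The point is only that
the lock covers nothing more than the pointer bump, while the memcpy
happens outside it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define WAL_SIZE 4096   /* hypothetical: one flat buffer instead of WAL pages */

typedef struct {
    atomic_flag lock;       /* stand-in for the spinlock */
    uint64_t    next_pos;   /* next unreserved byte in the WAL stream */
    char        buf[WAL_SIZE];
} WalSketch;

static void wal_init(WalSketch *w)
{
    atomic_flag_clear(&w->lock);
    w->next_pos = 0;
    memset(w->buf, 0, sizeof w->buf);
}

/* Stage 1: reserve len bytes. Only the pointer bump is done under the lock. */
static uint64_t wal_reserve(WalSketch *w, uint64_t len)
{
    while (atomic_flag_test_and_set_explicit(&w->lock, memory_order_acquire))
        ;   /* spin until the lock is free */
    uint64_t start = w->next_pos;
    w->next_pos += len;
    atomic_flag_clear_explicit(&w->lock, memory_order_release);
    return start;
}

/* Stage 2: copy the record into its reserved space, without holding the lock.
 * The caller must have reserved [start, start + len) within WAL_SIZE. */
static void wal_copy(WalSketch *w, uint64_t start, const void *rec, uint64_t len)
{
    memcpy(w->buf + start, rec, len);
}
```

Because each caller gets a disjoint range from wal_reserve(), the copies
in stage 2 can be done in any order and by different backends at the
same time.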

As long as wal_buffers is high enough, and the I/O can keep up, stage 2
can happen in parallel in many backends. The WAL writer process
pre-initializes new pages ahead of the insertions, so regular backends
rarely need to do that.

When a page is written out with XLogWrite(), you need to wait for any
in-progress insertions to the pages you're about to write out to finish.
For that, every backend has a slot with an XLogRecPtr in shared memory.
It's set to the position where that backend is currently inserting.
If there's no insertion in progress, it's invalid, but when it's valid
it acts like a barrier, so that no-one is allowed to XLogWrite() beyond
that position. That's very lightweight for the backends, but I'm using
busy-waiting to wait for an insertion to finish ATM. That should be
replaced with something smarter; that's the biggest missing part of the
patch.
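
The slot mechanism can be sketched roughly like this. Again, this is an
illustration under assumptions, not the patch itself: InsertSlots,
insert_begin, insert_end, safe_write_limit and wait_for_insertions are
hypothetical names, positions are plain 64-bit integers rather than
XLogRecPtrs, the number of backends is fixed, and UINT64_MAX is used as
the "invalid" marker. The ordering between reserving space and
publishing the slot, which a real implementation has to get right, is
not modelled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NUM_BACKENDS 4            /* hypothetical fixed number of backends */
#define INVALID_POS  UINT64_MAX   /* hypothetical "no insertion in progress" */

typedef struct {
    _Atomic uint64_t inserting_at[NUM_BACKENDS];   /* one slot per backend */
} InsertSlots;

static void slots_init(InsertSlots *s)
{
    for (int i = 0; i < NUM_BACKENDS; i++)
        atomic_init(&s->inserting_at[i], INVALID_POS);
}

/* Backend side: advertise where we are inserting, then clear when done. */
static void insert_begin(InsertSlots *s, int backend, uint64_t pos)
{
    atomic_store(&s->inserting_at[backend], pos);
}

static void insert_end(InsertSlots *s, int backend)
{
    atomic_store(&s->inserting_at[backend], INVALID_POS);
}

/* Writer side: the furthest position that may be written out right now,
 * i.e. the smaller of upto and the oldest in-progress insert position. */
static uint64_t safe_write_limit(InsertSlots *s, uint64_t upto)
{
    uint64_t limit = upto;
    for (int i = 0; i < NUM_BACKENDS; i++) {
        uint64_t pos = atomic_load(&s->inserting_at[i]);
        if (pos != INVALID_POS && pos < limit)
            limit = pos;
    }
    return limit;
}

/* Busy-wait until no insertion is in progress before upto. */
static void wait_for_insertions(InsertSlots *s, uint64_t upto)
{
    while (safe_write_limit(s, upto) < upto)
        ;   /* spin; this is the part that wants something smarter */
}
```

A backend only does two atomic stores per insertion, which is why this
is cheap on the insert side; all of the scanning and waiting cost falls
on whoever is writing pages out.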

One simple way to test the performance impact of this is:

psql -c "DROP TABLE IF EXISTS foo; CREATE TABLE foo (id int4);
CHECKPOINT" postgres
echo "BEGIN; INSERT INTO foo SELECT i FROM generate_series(1, 10000) i;
ROLLBACK" > parallel-insert-test.sql
pgbench -n -T 10 -c4 -f parallel-insert-test.sql postgres

On my dual-core laptop, this patch increases the tps on that from about
60 to 110.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
