Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock)

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock)
Date: 2012-03-21 11:52:57
Message-ID: 4F69C119.9050301@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 21.03.2012 13:14, Fujii Masao wrote:
> PANIC: space reserved for WAL record does not match what was
> written, CurrPos: C/0, EndPos: B/FF000000
>
> So I think that the patch would have a bug which handles WAL boundary wrongly.

Thanks for the testing! These WAL boundary issues are really tricky, you
found bugs in that area before, and I found and fixed one before posting
the last version, and apparently there's still at least one left.

Overall, what do you (and others) think about the state of this patch?
I'm starting to feel that this needs to be pushed to 9.3. That bug might
not be very hard to fix, but the fact that these bugs are still cropping
up at this late stage makes me uneasy. That boundary calculation in
particular is surprisingly tricky, and I think it could be made less
tricky with some refactoring of the WAL-logging code, replacing
XLogRecPtr with uint64s, like Peter (IIRC) suggested a while ago. And
that seems like 9.3 material. Also, there's still these two known issues:

1. It slows down the WAL insertion in a single backend by about 10%
2. With lots of backends inserting tiny records concurrently, you get
spinlock contention, which consumes a lot of CPU. Without the patch, you
get lwlock contention and bandwidth isn't any better, but you sleep
rather than spin.

I'm afraid those issues aren't easily fixable. I haven't been able to
identify the source of slowdown in the single-backend case, it seems to
be simply the distributed cost of the extra bookkeeping. That might be
acceptable, 10% slowdown of raw WAL insertion speed is not good, but WAL
insertion only accounts for a fraction of the total CPU usage for any
real workload, so I believe the slowdown of a real application would be
more like 1-3%, at worst. But I would feel more comfortable if we had
more time to test that.

The spinlock contention issue might be acceptable too. I think it would
be hard to run into it in a real application, and even then, the
benchmarks show that although you spend a lot of CPU time spinning, you
get at least the same overall bandwidth with the patch, which is what
really matters. And I think it could be alleviated by reducing the time
the spinlock is held, and I think that could be done by making the space
reservation calculations simpler. If we got rid of the limitation that
the WAL record header is never split across WAL pages, and always stored
the continuation record header on all WAL pages, the space reservation
calculation could be reduced to essentially "currentpos += size". But
that again seems 9.3 material.

So, although none of the issues alone is a show-stopper, but considering
all these things together, I'm starting to feel that this needs to be
pushed to 9.3. Thoughts?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-03-21 12:24:58 Re: Proposal: Create index on foreign table
Previous Message Fujii Masao 2012-03-21 11:14:28 Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock)