Re: Hot standby v5 patch assertion failure

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot standby v5 patch assertion failure
Date: 2008-11-06 17:58:42
Message-ID: 1225994322.27904.17.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Mon, 2008-11-03 at 12:16 +1300, Mark Kirkwood wrote:
> Trying out a few different scenarios I ran across this:
>
> 1/ Setup master and replica with replica using pg_standby
> 2/ Create a new database ("bench" in my case)
> 3/ Initialize pgbench schema size 100
> 4/ Run with 2 clients and 10000 transactions
> 5/ Replica gets assertion failure

I've been unable to reproduce this error in more than 2 days of bashing.
The bash test I use is a pgbench variant designed to cause write
contention, while at the same time running reads against those same
blocks on standby, plus running parallel installcheck.

I now suspect there was a problem in ProcArrayClearUnobservedXids(), so
I clear the array each time now, whether or not we are in assert mode,
i.e. better hygiene around reused data structures. So I *haven't*
reworked my earlier code, just checked it all again.
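
To illustrate the "clear every time" idea, here is a minimal sketch. It
is not the code from the patch: the struct, its capacity, and the helper
names (UnobservedXidArray, unobserved_xids_add, unobserved_xids_clear)
are hypothetical stand-ins. The only point it makes is that the slots
are wiped unconditionally rather than only in assert-enabled builds, so
a reused array can never expose stale xids.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t TransactionId;                 /* xids are 32-bit values */
#define InvalidTransactionId ((TransactionId) 0)
#define MAX_UNOBSERVED_XIDS 64                  /* hypothetical capacity */

/* Hypothetical stand-in for the reused array of unobserved xids. */
typedef struct UnobservedXidArray
{
    int           numXids;                      /* slots currently in use */
    TransactionId xids[MAX_UNOBSERVED_XIDS];
} UnobservedXidArray;

/* Append one xid; the caller must not exceed the fixed capacity. */
static void
unobserved_xids_add(UnobservedXidArray *arr, TransactionId xid)
{
    assert(arr->numXids < MAX_UNOBSERVED_XIDS);
    arr->xids[arr->numXids++] = xid;
}

/*
 * Reset the array for reuse. The memset is deliberately NOT wrapped in
 * an assert-only #ifdef: every slot goes back to InvalidTransactionId
 * (all-zero bits) in all builds, not just the counter.
 */
static void
unobserved_xids_clear(UnobservedXidArray *arr)
{
    memset(arr->xids, 0, sizeof(arr->xids));
    arr->numXids = 0;
}
```

Resetting only numXids would be cheaper, but any code path that reads
past the counter would then see leftover values from the previous use.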

So, new patch enclosed. This fixes everything reported so far, plus
another 2 bugs I found and fixed during re-test.

Outstanding items currently:
* btree VACUUM code - only partially coded so far
* complete prepared xacts support
* README file update with tech overview of the patch
* hash index handling (or any index type that doesn't write WAL)

I expect to roll out v6 with the first three of these items next week.

So this patch has 100% of what is intended for "infrastructure changes
for recovery".

I've also profiled the startup process. Those tests show that the Hot
Standby code adds around 4% CPU overhead relative to normal archive
recovery on a test of pgbench -c 4. None of the functions added for Hot
Standby is in the top 10 profiled. Given that recovery is I/O bound,
this probably means no decrease in performance, but other results are
welcome. [Profile results at bottom]

Startup process no longer performs restartpoints, so overall this will
be faster than 8.3, even before we consider Koichi's tuning patch.

Each sample counts as 0.01 seconds.
     %cumulative     self              self    total
  time   seconds  seconds    calls  ms/call  ms/call  name
 10.34      0.03     0.03   138436     0.00     0.00  hash_search_with_hash_value
 10.34      0.06     0.03    31559     0.00     0.00  PageAddItem
 10.34      0.09     0.03        1    30.00   229.58  StartupXLOG
  6.90      0.11     0.02    19466     0.00     0.01  heap_xlog_update
  6.90      0.13     0.02                             smgrwrite
  3.45      0.14     0.01   272400     0.00     0.00  transtime
  3.45      0.15     0.01   126284     0.00     0.00  hash_any
  3.45      0.16     0.01    84249     0.00     0.00  hash_search
  3.45      0.17     0.01    84039     0.00     0.00  smgropen
  3.45      0.18     0.01    42180     0.00     0.00  smgrnblocks
  3.45      0.19     0.01    42019     0.00     0.00  ReadBuffer_common
  3.45      0.20     0.01    42015     0.00     0.00  XLogReadBufferExtended
  3.45      0.21     0.01    37772     0.00     0.00  timesub
  3.45      0.22     0.01    11972     0.00     0.00  PageHeaderIsValid
  3.45      0.23     0.01    11972     0.00     0.00  mdread
  3.45      0.24     0.01    11626     0.00     0.00  Insert
  3.45      0.25     0.01     6480     0.00     0.00  RecordKnownAssignedTransactionIds (*)
  3.45      0.26     0.01     1121     0.01     0.01  tzload
  3.45      0.27     0.01      812     0.01     0.01  heap_page_prune_execute
  3.45      0.28     0.01        8     1.25     1.25  element_alloc
  1.72      0.29     0.01    46010     0.00     0.00  TransactionIdFollowsOrEquals
  1.72      0.29     0.01                             TransactionIdFollows
  0.00      0.29     0.00   410293     0.00     0.00  itemoffcompare
  0.00      0.29     0.00   200330     0.00     0.00  leaps_thru_end_of
  0.00      0.29     0.00   197907     0.00     0.00  detzcode
  0.00      0.29     0.00   177950     0.00     0.00  detzcode64
  0.00      0.29     0.00   140873     0.00     0.00  LWLockAcquire
  0.00      0.29     0.00   140873     0.00     0.00  LWLockRelease
  0.00      0.29     0.00   126255     0.00     0.00  tag_hash
  0.00      0.29     0.00   114891     0.00     0.00  swapfunc
  0.00      0.29     0.00   100394     0.00     0.00  increment_overflow

(*) newly added by this patch

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Attachment Content-Type Size
hot_standby.v5d.patch.bz2 application/x-bzip 65.9 KB
