
PageRepairFragmentation performance

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: PageRepairFragmentation performance
Date: 2014-11-18 18:03:10
Message-ID: 546B89DE.7030906@vmware.com
Lists: pgsql-hackers
When profiling replay of WAL generated by pgbench, I noticed that 
PageRepairFragmentation consumes a large fraction of the CPU time:

Per "perf report":

+   33.44%     6.79%  postmaster  postgres            [.] PageRepairFragmentation

The 33.44% figure includes all the functions called by 
PageRepairFragmentation. Looking at the profile more closely, most of 
that time seems to be spent sorting the item ids into physical order, so 
that the tuples can be memmoved into place:

+   83.86%     0.00%  postmaster  libc-2.19.so        [.] __libc_start_main
+   83.86%     0.00%  postmaster  postgres            [.] main
+   83.86%     0.00%  postmaster  postgres            [.] PostmasterMain
+   83.86%     0.00%  postmaster  postgres            [.] 0x000000000023208d
+   83.86%     0.00%  postmaster  postgres            [.] AuxiliaryProcessMain
+   83.86%     0.00%  postmaster  postgres            [.] StartupProcessMain
+   83.63%     1.86%  postmaster  postgres            [.] StartupXLOG
+   45.85%     0.10%  postmaster  postgres            [.] heap2_redo
+   33.44%     6.79%  postmaster  postgres            [.] PageRepairFragmentation
+   24.60%    16.63%  postmaster  postgres            [.] pg_qsort
+   18.04%     0.23%  postmaster  postgres            [.] heap_redo
+   17.07%     1.53%  postmaster  postgres            [.] XLogReadBufferExtended
+   16.20%     0.30%  postmaster  postgres            [.] XLogReadBufferForRedoEx
+   14.38%     0.31%  postmaster  postgres            [.] ReadRecord
+   13.90%     1.29%  postmaster  postgres            [.] XLogReadRecord
+   12.40%     1.54%  postmaster  postgres            [.] heap_xlog_update
+   12.08%    12.06%  postmaster  postgres            [.] ValidXLogRecord
+   11.73%     0.10%  postmaster  postgres            [.] XLogReadBufferForRedo
+   10.89%     0.27%  postmaster  postgres            [.] ReadBufferWithoutRelcac
+    8.49%     1.07%  postmaster  libc-2.19.so        [.] __GI___libc_read
+    7.61%     0.71%  postmaster  postgres            [.] ReadBuffer_common
+    5.64%     0.48%  postmaster  postgres            [.] smgropen
+    5.48%     5.47%  postmaster  postgres            [.] itemoffcompare
+    5.40%     5.38%  postmaster  postgres            [.] hash_search_with_hash_v
+    4.70%     4.69%  postmaster  libc-2.19.so        [.] __memmove_ssse3_back
+    4.30%     0.77%  postmaster  libc-2.19.so        [.] __GI___libc_lseek64
+    4.29%     0.20%  postmaster  postgres            [.] heap_xlog_insert
+    3.88%     3.87%  postmaster  postgres            [.] swapfunc
+    2.81%     0.09%  postmaster  postgres            [.] XLogRecordPageWithFreeS
+    2.76%     0.00%          cp  libc-2.19.so        [.] __GI___libc_write
+    2.68%     0.07%  postmaster  postgres            [.] BufTableLookup
+    2.58%     2.58%  postmaster  postgres            [.] LWLockAcquire
+    2.17%     0.14%  postmaster  postgres            [.] tag_hash

So there's clearly some room for improvement here. A couple of ideas:

1. Replace the qsort with something cheaper. The itemid arrays being 
sorted are small, a few hundred items at most, and usually much smaller. 
In the pgbench test case I used, the typical size is about 60. With a 
small array, a plain insertion sort is cheaper than the generic qsort(), 
because it avoids the function call overhead etc. involved in generic 
qsort. Or we could use something smarter, like a radix sort, exploiting 
the fact that we're sorting small integers. Or we could implement an 
inlineable version of qsort and use that.
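To illustrate the insertion sort idea, here's a minimal sketch. The struct and function names are made up for the example (not the types used in bufpage.c); it sorts entries by descending physical offset, which is the order compaction wants:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the (offset, length) entries being sorted;
 * not the actual PostgreSQL struct. */
typedef struct
{
	uint16_t	itemoff;	/* physical offset of the tuple on the page */
	uint16_t	itemlen;
} ItemEntry;

/*
 * Plain insertion sort by descending itemoff.  For the tiny arrays involved
 * here (tens of entries), this avoids qsort's indirect comparator calls and
 * swap machinery entirely.
 */
static void
insertion_sort_items(ItemEntry *a, int n)
{
	for (int i = 1; i < n; i++)
	{
		ItemEntry	key = a[i];
		int			j = i - 1;

		while (j >= 0 && a[j].itemoff < key.itemoff)
		{
			a[j + 1] = a[j];
			j--;
		}
		a[j + 1] = key;
	}
}
```

Insertion sort is O(n^2) in the worst case, but with n around 60 the constant factors dominate, and it degrades gracefully when the array is already mostly sorted.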

2. Instead of sorting the array and using memmove in-place, we could 
copy all the tuples to a temporary buffer in arbitrary order, and 
finally copy the temporary copy back to the buffer. That requires two 
memory copies per tuple, instead of one memmove, but memcpy() is pretty 
darn fast. It would be a loss when there are only a few large tuples on 
the page, so that avoiding the sort doesn't help, or when the tuples are 
mostly already in the correct places, so that most of the memmove()s are 
no-ops. But with a lot of small tuples, it would be a win, and it would 
be simple.
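A rough sketch of what the second approach could look like, with illustrative names and a fixed page size standing in for BLCKSZ:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 8192			/* stand-in for BLCKSZ */

/* Illustrative item entry; not PostgreSQL's line pointer type. */
typedef struct
{
	uint16_t	off;			/* current physical offset of the tuple */
	uint16_t	len;
} Item;

/*
 * Compact the tuples without sorting: copy each tuple into a scratch
 * buffer at its new packed location, in whatever order the item array
 * happens to be in, then copy the packed region back in one bulk memcpy.
 * Returns the new "upper" boundary (start of packed tuple space).
 */
static uint16_t
compact_via_scratch(char *page, Item *items, int nitems)
{
	char		scratch[PAGE_SIZE];
	uint16_t	upper = PAGE_SIZE;

	for (int i = 0; i < nitems; i++)
	{
		upper -= items[i].len;	/* allocate from the top of the page down */
		memcpy(scratch + upper, page + items[i].off, items[i].len);
		items[i].off = upper;	/* remember the tuple's new location */
	}
	/* one bulk copy of the packed region back into the real page */
	memcpy(page + upper, scratch + upper, PAGE_SIZE - upper);
	return upper;
}
```

Each tuple is copied twice (page to scratch, scratch back to page) instead of memmoved once, but both copies are straight-line memcpy()s and the O(n log n) sort disappears entirely, which is the trade-off described above.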

The second option would change behaviour slightly, as the tuples would 
be placed on the page in a different physical order than before. It 
wouldn't be visible to users, though.

I spent some time hacking on approach 1, and replaced the qsort() call 
with a bucket sort. I'm not sure if a bucket sort is optimal, or better 
than a specialized quicksort implementation, but it seemed simple.
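For readers who haven't opened the patch, a bucket sort along these lines is what's meant; this is a hand-written illustration, not the code from the attached 0002 patch. Offsets within an 8 kB page are small integers, so the page can be divided into fixed slices, one bucket per slice:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ		8192
#define NBUCKETS	256			/* one bucket per 32-byte slice of the page */
#define BUCKET_CAP	8			/* sketch only: assumes few items per slice,
								 * which holds when tuples are dozens of
								 * bytes apart */

/* Sort item offsets into descending order with a bucket sort. */
static void
bucket_sort_offsets(uint16_t *off, int n)
{
	uint16_t	buckets[NBUCKETS][BUCKET_CAP];
	int			counts[NBUCKETS] = {0};
	int			k = 0;

	for (int i = 0; i < n; i++)
	{
		int			b = off[i] / (PAGE_SZ / NBUCKETS);
		int			j = counts[b]++;

		/* keep each (tiny) bucket sorted descending as we insert */
		while (j > 0 && buckets[b][j - 1] < off[i])
		{
			buckets[b][j] = buckets[b][j - 1];
			j--;
		}
		buckets[b][j] = off[i];
	}

	/* emit buckets from the high end of the page down */
	for (int b = NBUCKETS - 1; b >= 0; b--)
		for (int j = 0; j < counts[b]; j++)
			off[k++] = buckets[b][j];
}
```

Distribution into buckets is a single linear pass, and because tuples on a page can't be packed much tighter than their header size, each bucket holds only a handful of entries, so the per-bucket insertion step stays cheap.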

With the testcase I've been using - replaying about 2GB of WAL generated 
by pgbench - this reduces the replay time from about 55 s to 45 s.

Thoughts? Attached is the patch I put together. It's actually two 
patches: the first is just refactoring, moving the code common to 
PageRepairFragmentation, PageIndexMultiDelete, and 
PageIndexDeleteNoCompact into a function. The second replaces the qsort().

- Heikki

Attachment: 0001-Refactor.patch
Description: text/x-diff (7.2 KB)
Attachment: 0002-Bucket-sort.patch
Description: text/x-diff (3.6 KB)

