Re: PageRepairFragmentation performance

From: José Luis Tallón <jltallon(at)adv-solutions(dot)net>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PageRepairFragmentation performance
Date: 2014-11-18 18:37:32
Message-ID: 546B91EC.7070500@adv-solutions.net
Lists: pgsql-hackers

On 11/18/2014 07:03 PM, Heikki Linnakangas wrote:
> When profiling replay the WAL generated by pgbench, I noticed the
> PageRepairFragmentation consumes a large fraction of the CPU time:
> [snip]
>
> 1. Replace the qsort with something cheaper. The itemid arrays being
> sorted are small, a few hundred items at most, usually even smaller. In
> this pgbench test case I used, the typical size is about 60. With a
> small array a plain insertion sort is cheaper than the generic
> qsort(), because it can avoid the function overhead etc. involved with
> generic qsort. Or we could use something smarter, like a radix sort,
> knowing that we're sorting small integers. Or we could implement an
> inlineable version of qsort and use that.
IIRC, the theoretical complexity of quicksort and radix sort should be
approximately the same for 256-1024 items: O(n*log(n)) vs O(d*n), where d
is ~log2(n), or just 16 in this case (the keys being 16-bit offsets).
However, a lexicographical ("bitstring-wise") ordering might not be what
we are aiming for here.

AFAIK, an inlined quicksort should be about the best-performing sort
available (most of the enhancement coming from staying within the I-cache).
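
For comparison, the plain insertion sort mentioned in point 1 would be as
trivial as this -- purely a sketch, not taken from the attached patch, and
the struct/field names are made up for illustration:

#include <stdint.h>

/* illustrative line-pointer descriptor; not the patch's actual struct */
typedef struct
{
    uint16_t offsetindex;   /* index into the page's line-pointer array */
    int16_t  itemoff;       /* current page offset of the item data */
    uint16_t alignedlen;    /* aligned length of the item data */
} itemIdSortData;

static void
sort_itemids(itemIdSortData *items, int nitems)
{
    /* sort by itemoff, descending, so tuples can be packed from the
     * end of the page; for ~60 entries the inlined loop avoids the
     * comparator-callback overhead of a generic qsort() */
    for (int i = 1; i < nitems; i++)
    {
        itemIdSortData key = items[i];
        int j = i - 1;

        while (j >= 0 && items[j].itemoff < key.itemoff)
        {
            items[j + 1] = items[j];
            j--;
        }
        items[j + 1] = key;
    }
}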

> 2. Instead of sorting the array and using memmove in-place, we could
> copy all the tuples to a temporary buffer in arbitrary order, and
> finally copy the temporary copy back to the buffer. That requires two
> memory copies per tuple, instead of one memmove, but memcpy() is
> pretty darn fast. It would be a loss when there are only a few large
> tuples on the page, so that avoiding the sort doesn't help, or when
> the tuples are mostly already in the correct places, so that most of
> the memmove()s are no-ops. But with a lot of small tuples, it would be
> a win, and it would be simple.

memmove() *should* be no slower than memcpy()... provided that both are
actually translated by the compiler into intrinsics rather than calls to
the library functions --- which seems to be the case here
(cf. __memmove_ssse3_back).
A simple "if" to eliminate the no-op memmoves might do the trick as well.
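
Something along these lines is what I have in mind -- again just a sketch
with made-up names; the real code would of course work against the page
header's upper/special pointers:

#include <string.h>
#include <stdint.h>

typedef struct
{
    uint16_t offsetindex;   /* index into the page's line-pointer array */
    int16_t  itemoff;       /* current page offset of the item data */
    uint16_t alignedlen;    /* aligned length of the item data */
} itemIdSortData;

/*
 * Compact items toward the end of a page, skipping memmove() calls that
 * would not move anything.  Assumes items[] is already sorted by itemoff,
 * descending; new_offsets[] receives the new offset of each line pointer.
 */
static void
compactify(char *page, size_t pagesize,
           itemIdSortData *items, int nitems, uint16_t *new_offsets)
{
    size_t upper = pagesize;

    for (int i = 0; i < nitems; i++)
    {
        upper -= items[i].alignedlen;

        /* the cheap "if": skip items that are already in place */
        if ((size_t) items[i].itemoff != upper)
            memmove(page + upper, page + items[i].itemoff,
                    items[i].alignedlen);

        new_offsets[items[i].offsetindex] = (uint16_t) upper;
    }
}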

Just my two (euro) cents, though

> The second option would change behaviour slightly, as the tuples would
> be placed on the page in different physical order than before. It
> wouldn't be visible to users, though.
>
> I spent some time hacking approach 1, and replaced the qsort() call
> with a bucket sort. I'm not sure if a bucket sort is optimal, or
> better than a specialized quicksort implementation, but it seemed simple.
>
> With the testcase I've been using - replaying about 2GB of WAL
> generated by pgbench - this reduces the replay time from about 55 s to
> 45 s.

Not bad at all... though I suspect most of the gain might come from staying
within the I-cache, as opposed to what happens with the regular qsort().
The smaller itemIdSortData structure surely helps a bit, too :)
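
For reference, a bucket sort over the line-pointer offsets could look
roughly like the following. This is not taken from the attached patch --
bucket width, struct and names are all made up, just to illustrate the
idea of scattering into coarse buckets and finishing each one with a tiny
insertion sort:

#include <stdint.h>
#include <string.h>

#define BLCKSZ        8192
#define BUCKET_SHIFT  8                        /* 256-byte-wide buckets */
#define N_BUCKETS     (BLCKSZ >> BUCKET_SHIFT) /* 32 buckets */
#define MAX_ITEMS     (BLCKSZ / 4)             /* one line pointer is 4 bytes */

typedef struct
{
    uint16_t offsetindex;
    int16_t  itemoff;
    uint16_t alignedlen;
} itemIdSortData;

static void
bucket_sort_itemids(itemIdSortData *items, int nitems)
{
    itemIdSortData tmp[MAX_ITEMS];
    int count[N_BUCKETS] = {0};
    int start[N_BUCKETS];
    int next[N_BUCKETS];

    /* histogram: bucket 0 holds the highest offsets */
    for (int i = 0; i < nitems; i++)
        count[(N_BUCKETS - 1) - (items[i].itemoff >> BUCKET_SHIFT)]++;

    start[0] = 0;
    for (int b = 1; b < N_BUCKETS; b++)
        start[b] = start[b - 1] + count[b - 1];
    memcpy(next, start, sizeof(next));

    /* scatter entries into their buckets */
    for (int i = 0; i < nitems; i++)
        tmp[next[(N_BUCKETS - 1) - (items[i].itemoff >> BUCKET_SHIFT)]++] = items[i];
    memcpy(items, tmp, nitems * sizeof(itemIdSortData));

    /* each bucket holds only a handful of entries: insertion sort them
     * by itemoff, descending */
    for (int b = 0; b < N_BUCKETS; b++)
    {
        for (int i = start[b] + 1; i < start[b] + count[b]; i++)
        {
            itemIdSortData key = items[i];
            int j = i - 1;

            while (j >= start[b] && items[j].itemoff < key.itemoff)
            {
                items[j + 1] = items[j];
                j--;
            }
            items[j + 1] = key;
        }
    }
}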

> Thoughts? Attached is the patch I put together. It's actually two
> patches: the first is just refactoring, putting the common code
> between PageRepairFragmentation, PageIndexMultiDelete, and
> PageIndexDeleteNoCompact into a function. The second replaces the qsort().

Definitely worthwhile, even if just for the refactoring. The speed-up
sounds very good, too.

Thanks,

/ J.L.
