Re: Small improvement to compactify_tuples

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Sokolov Yura <funny(dot)falcon(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Small improvement to compactify_tuples
Date: 2017-11-03 19:30:25
Message-ID: 11367.1509737425@sss.pgh.pa.us
Lists: pgsql-hackers

Claudio Freire <klaussfreire(at)gmail(dot)com> writes:
> On Thu, Nov 2, 2017 at 11:46 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> BTW, the originally given test case shows no measurable improvement
>> on my box.

> I did manage to reproduce the original test and got a consistent improvement.

It occurred to me that I could force the issue by hacking bufpage.c to
execute compactify_tuples multiple times each time it was called, as in
the first patch attached below. This has nothing directly to do with real
performance of course, but it's making use of the PG system to provide
realistic test data for microbenchmarking compactify_tuples. I was a bit
surprised to find that I had to set the repeat count to 1000 to make
compactify_tuples really dominate the runtime (while using the originally
posted test case ... maybe there's a better one?). But once I did get it
to dominate the runtime, perf gave me this for the CPU hotspots:

+  27.97%  27.88%  229040  postmaster  libc-2.12.so  [.] memmove
+  14.61%  14.57%  119704  postmaster  postgres      [.] compactify_tuples
+  12.40%  12.37%  101566  postmaster  libc-2.12.so  [.] _wordcopy_bwd_aligned
+  11.68%  11.65%   95685  postmaster  libc-2.12.so  [.] _wordcopy_fwd_aligned
+   7.67%   7.64%   62801  postmaster  postgres      [.] itemoffcompare
+   7.00%   6.98%   57303  postmaster  postgres      [.] compactify_tuples_loop
+   4.53%   4.52%   37111  postmaster  postgres      [.] pg_qsort
+   1.71%   1.70%   13992  postmaster  libc-2.12.so  [.] memcpy

which says that micro-optimizing the sort step is a complete, utter waste
of time, and what we need to be worried about is the data copying part.
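
For readers without the attachment, the repeat-count trick described above can be sketched roughly as follows. This is not the attached patch: the names bench_repeat, bench_dummy_work, bench_calls and BENCH_PAGE_SIZE are hypothetical, and the 8192-byte page size is an assumption. The idea is only to run the same work many times on the same realistic input, restoring the input from a saved copy before each repeat.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BENCH_PAGE_SIZE 8192	/* assumed page size for this sketch */

/* Hypothetical signature for "the work being microbenchmarked". */
typedef void (*bench_page_fn) (unsigned char *page);

static int	bench_calls = 0;	/* counts invocations of the dummy work */

/* Placeholder for the real work; it just modifies the page a little. */
static void
bench_dummy_work(unsigned char *page)
{
	page[0]++;
	bench_calls++;
}

/*
 * Run fn on the page repeat_count times.  Before every repeat after the
 * first, the page is restored from a saved copy so that each call sees the
 * same input.  On return the page holds the result of a single run.
 */
static void
bench_repeat(unsigned char *page, bench_page_fn fn, int repeat_count)
{
	unsigned char saved[BENCH_PAGE_SIZE];
	int			i;

	assert(repeat_count > 0);
	memcpy(saved, page, BENCH_PAGE_SIZE);	/* scaffolding copy of the input */

	for (i = 0; i < repeat_count; i++)
	{
		if (i > 0)
			memcpy(page, saved, BENCH_PAGE_SIZE);	/* restore the input */
		fn(page);
	}
}
```

Calling bench_repeat(page, bench_dummy_work, 1000) invokes the work 1000 times but leaves the page as if it had run once. Note that the restoring memcpy calls are themselves part of the measured runtime in a harness of this shape.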

The memcpy part of the above is presumably from the scaffolding memcpy's
in compactify_tuples_loop, which is interesting because that's moving as
much data as the memmove's are. So at least with RHEL6's version of
glibc, memmove is apparently a lot slower than memcpy.

This gave me the idea to memcpy the page into some workspace and then use
memcpy, not memmove, to put the tuples back into the caller's copy of the
page. That gave me about a 50% improvement in observed TPS, and a perf
profile like this:

+  38.50%  38.40%  299520  postmaster  postgres      [.] compactify_tuples
+  31.11%  31.02%  241975  postmaster  libc-2.12.so  [.] memcpy
+   8.74%   8.72%   68022  postmaster  postgres      [.] itemoffcompare
+   6.51%   6.49%   50625  postmaster  postgres      [.] compactify_tuples_loop
+   4.21%   4.19%   32719  postmaster  postgres      [.] pg_qsort
+   1.70%   1.69%   13213  postmaster  postgres      [.] memcpy@plt
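
The workspace idea described above (copy the page aside, then memcpy each tuple back into place) can be illustrated with a simplified model. This is a sketch under stated assumptions, not the attached patch: SketchItem, sketch_compact and SKETCH_PAGE_SIZE are hypothetical names, the page is treated as a bare byte array, and line-pointer handling and alignment padding are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 8192	/* assumed page size for this sketch */

/* Hypothetical, simplified stand-in for a line pointer. */
typedef struct
{
	size_t		off;			/* current offset of the tuple in the page */
	size_t		len;			/* tuple length in bytes */
} SketchItem;

/*
 * Pack the tuples described by items[] against the end of the page and
 * update their offsets.  Returns the new start of the tuple data.
 *
 * Every tuple is read from a private copy of the page, so source and
 * destination never overlap and plain memcpy is safe.
 */
static size_t
sketch_compact(unsigned char *page, SketchItem *items, int nitems)
{
	unsigned char scratch[SKETCH_PAGE_SIZE];
	size_t		upper = SKETCH_PAGE_SIZE;
	int			i;

	/* One bulk copy of the whole page into the workspace. */
	memcpy(scratch, page, SKETCH_PAGE_SIZE);

	for (i = 0; i < nitems; i++)
	{
		assert(items[i].off + items[i].len <= SKETCH_PAGE_SIZE);
		assert(items[i].len <= upper);

		upper -= items[i].len;
		memcpy(page + upper, scratch + items[i].off, items[i].len);
		items[i].off = upper;
	}

	return upper;
}
```

The trade-off in this shape is one extra page-sized copy per call in exchange for non-overlapping copies afterwards. Whether that is a net win depends on how the platform's memmove compares with its memcpy, which is the observation the profiles above are about.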

There still doesn't seem to be any point in replacing the qsort,
but it does seem like something like the second attached patch
might be worth doing.

So I'm now wondering why my results seem to be so much different
from those of other people who have tried this, both as to whether
compactify_tuples is worth working on at all and as to what needs
to be done to it if so. Thoughts?

regards, tom lane

Attachment Content-Type Size
repeat-compactify.patch text/x-diff 1.8 KB
use-memcpy-not-memmove.patch text/x-diff 1.5 KB
