Re: testing ProcArrayLock patches

From: Andres Freund <andres(at)anarazel(dot)de>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: testing ProcArrayLock patches
Date: 2011-11-18 22:12:02
Message-ID: 201111182312.02527.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Friday, November 18, 2011 09:16:01 PM Kevin Grittner wrote:
> Andres Freund <andres(at)anarazel(dot)de> wrote:
> > When doing line-level profiles I would suggest looking at the
> > instructions.
> What's the best way to do that?
I think opannotate -a -s produces output with instructions/code intermingled.

> > I don't think cache line contention is the most likely candidate
> > here. Simple cache-misses seem far more likely. In combination
> > with pipeline stalls...
> >
> > Newer cpus (nehalem+) can measure stalled cycles which can be
> > really useful when analyzing performance. I don't remember how to
> > do that with oprofile right now though as I use perf these days
> > (its -e stalled-cycles{frontend|backend} there}).
>
> When I run oprofile, I still always go back to this post by Tom:
> http://archives.postgresql.org/pgsql-performance/2009-06/msg00154.php
Hrm. I am on the train and for unknown reasons the only sensible working
protocols are smtp + pop.... Waiting.... Waiting....
Sorry, too slow/high latency atm. I wrote everything below and another mail
and the page still hasn't loaded.

oprofile can produces graphes as well (--callgraph). for both tools you need
-fno-omit-frame-pointers to get usable graphs.

> Can anyone provide such a "cheat sheet" for perf? I could give that
> a try if I knew how.
Unfortunately for sensible results the kernel needs to be rather new.
I would say > 2.6.28 or so (just guessed).

# to record activity
perf record [-g|--call-graph] program|-p pid

# to view a summation
perf report

graph:
# Overhead Command Shared Object Symbol
# ........ ........ ................. .........................................
#
4.09% postgres postgres [.] slab_alloc_dyn
|
--- slab_alloc_dyn
|
|--18.52%-- new_list
| |
| |--63.79%-- lappend
| | |
| | |--13.40%-- find_usable_indexes
| | | create_index_paths
| | | set_rel_pathlist
| | | make_one_rel

flat:

# Overhead Command Shared Object Symbol
# ........ ........ ................. .........................................
#
5.10% postgres [vdso] [.] 0x7ffff3d8d770
4.26% postgres postgres [.] base_yyparse
3.88% postgres postgres [.] slab_alloc_dyn
2.82% postgres postgres [.] core_yylex
2.37% postgres postgres [.] SearchCatCache
1.85% postgres libc-2.13.so [.] __memcpy_ssse3
1.66% postgres libc-2.13.so [.] __GI___strcmp_ssse3
1.23% postgres postgres [.] MemoryContextAlloc

# to view a line/source/instruction level view
perf annotate -l symbol

...
:
: /*
: * one-time startup overhead for each cache
: */
: if (cache->cc_tupdesc == NULL)
0.35 : 6e81fd: 48 83 7f 28 00 cmpq $0x0,0x28(%rdi)
/home/andres/src/postgresql/build/optimize/../../src/backend/utils/cache/catcache.c:1070
4.15 : 6e8202: 0f 84 54 04 00 00 je 6e865c <SearchCatCache+0x47c>
: #endif
:
: /*
: * initialize the search key information
: */
: memcpy(cur_skey, cache->cc_skey, sizeof(cur_skey));
0.00 : 6e8208: 48 8d bd a0 fe ff ff lea -0x160(%rbp),%rdi
0.17 : 6e820f: 49 8d 77 70 lea 0x70(%r15),%rsi
0.00 : 6e8213: b9 24 00 00 00 mov $0x24,%ecx
/home/andres/src/postgresql/build/optimize/../../src/backend/utils/cache/catcache.c:1080
33.22 : 6e8218: f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi)
: cur_skey[0].sk_argument = v1;
/home/andres/src/postgresql/build/optimize/../../src/backend/utils/cache/catcache.c:1081
1.56 : 6e821b: 48 89 9d e0 fe ff ff mov %rbx,-0x120(%rbp)
...

# get heaps of stats from something
perf stat -ddd someprogram|-p pid

1242.409965 task-clock # 0.824 CPUs utilized [100.00%]
14,572 context-switches # 0.012 M/sec [100.00%]
264 CPU-migrations # 0.000 M/sec [100.00%]
0 page-faults # 0.000 M/sec
2,854,775,135 cycles # 2.298 GHz [26.28%]
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
2,024,997,785 instructions # 0.71 insns per cycle [25.25%]
387,240,903 branches # 311.685 M/sec [26.51%]
21,756,886 branch-misses # 5.62% of all branches [26.26%]
753,912,137 L1-dcache-loads # 606.814 M/sec [13.24%]
52,733,007 L1-dcache-load-misses # 6.99% of all L1-dcache hits [14.72%]
35,006,406 LLC-loads # 28.176 M/sec [15.46%]
26,673 LLC-load-misses # 0.08% of all LL-cache hits [13.38%]
1,855,654,347 L1-icache-loads # 1493.593 M/sec [12.63%]
52,169,033 L1-icache-load-misses # 2.81% of all L1-icache hits [12.88%]
761,475,250 dTLB-loads # 612.902 M/sec [13.37%]
4,457,558 dTLB-load-misses # 0.59% of all dTLB cache hits [13.12%]
2,049,753,137 iTLB-loads # 1649.820 M/sec [20.09%]
4,139,394 iTLB-load-misses # 0.20% of all iTLB cache hits [19.31%]
3,705,429 L1-dcache-prefetches # 2.982 M/sec [19.64%]
<not supported> L1-dcache-prefetch-misses

1.507855345 seconds time elapsed

-r can repeat a command and gives you the standard derivation...

# show whats the system executing overall
perf top -az

# get help
perf help (record|report|annotate|stat|...)

In new versions many commands (those that produce pageable text) take --stdio
and --tui to select between two interfaces. I personally fnd --tui unusable.

I am not really sure how good the results are compared to oprofile I
just prefer the ui by far... Also the overhead seems to be measurably
smaller. Also its usable by every user, not just root...

Hope that suffices? I have no problem answering further questions, so ...

Andres

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2011-11-18 22:26:43 Should a materialized view be based on a view?
Previous Message Martijn van Oosterhout 2011-11-18 22:08:18 Re: Inlining comparators as a performance optimisation