Re: [HACKERS] Profile of current backend

From: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
To: matti(at)algonet(dot)se (Mattias Kregert)
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Profile of current backend
Date: 1998-04-24 15:03:25
Message-ID: 199804241503.LAA16668@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> Bruce Momjian wrote:
> >
> > Interesting. Nothing is jumping out at me. Looks like we could try to
> > clean up heapgettup() to see if there is anything in there that can be
> > speeded up.
> >
> > None of the calls looks like it should be inlined. Do you see any that
> > look good for inlining?
>
> ExecScan() seems to be the only func which calls SeqNext(), which in
> turn accounts for 60% of the calls to heap_getnext(), which does 80% of
> the calls to heapgettup().
>
> - Put SeqNext() into ExecScan() to lower function call overhead? [minimal optim.]
>
> - In heapgettup(), 50% is the func itself and 50% is called funcs.
> Top four CPU consumers:
> 0.04 0.14 9924/9924 RelationGetBufferWithBuffer [148]
> 0.03 0.15 5642/5702 ReleaseAndReadBuffer [145]
> 0.10 0.00 26276/42896 nocachegetattr [158]
> 0.01 0.08 7111/9607 HeapTupleSatisfiesVisibility [185]
>
> RelationGetBufferWithBuffer() seems to be called from here only. If so, inline.
>
> - Looking at RelationGetBufferWithBuffer():
> 0.00 0.10 4603/32354 ReadBuffer [55]
> ReadBuffer() is the biggest cpu consumer called by RelationGetBufferWithBuffer(). (55%)
>
> -> *** 97% of ReadBuffer() CPU time is in calling ReadBufferWithBufferLock()
>
> -> 85% of ReadBufferWithBufferLock() CPU time is in calling BufferAlloc().
> -> ReadBufferWithBufferLock() is the only func calling BufferAlloc().
> -> Conclusion: INLINE BufferAlloc().
>
> - Looking at BufferAlloc():
> 0.04 0.25 37974/37974 BufTableLookup [114]
> 0.10 0.00 32340/151781 SpinAcquire [81]
> 0.10 0.00 37470/40585 PinBuffer [209]
> 0.08 0.00 38478/43799 RelationGetLRelId [234]
> 0.04 0.00 37974/151781 SpinRelease [175]
>
> -> 40% of BufferAlloc() CPU time is in calling BufTableLookup().
> -> BufferAlloc() is the only func calling BufTableLookup().
> -> Conclusion: INLINE BufTableLookup().
>
> - Looking at BufTableLookup():
> 86% of CPU time is in calling hash_search(). The rest is own time.
>
> - Looking at hash_search():
> 0.13 0.41 179189/179189 call_hash [69]
> 0.00 0.00 6/6 bucket_alloc [1084]
> -> Conclusion: INLINE call_hash() [and bucket_alloc()] into hash_search().
>
> - Looking at call_hash():
> 0.37 0.00 171345/171345 tag_hash [94]
> 0.04 0.00 7844/7844 string_hash [348]
> -> Conclusion: INLINE tag_hash() [and string_hash()] into call_hash().
> -> Perhaps disk_hash() could be used in some way? It is currently #ifdef'd away.
> -> Could we use a lookup table instead of doing hash calculations? Would not that
> -> be much faster?
>
>
> It looks to me as if there are too many levels of function calls.
> Perhaps all functions which are called by only one other func should be inlined?
>
>
> Guesstimate:
> This would speed up heapgettup() by 10% ???
> Other functions would speed up too.

I have reviewed your findings, and have inlined 9 small functions that
are called for every row. Unfortunately, several of the ones you
mention above are called via function pointers, so it was not possible
to inline them.

In my testing, I created a table with 160k rows, and did a SELECT that
would do a sequential scan of the table, and looked at the functions
that were called the most.

---------------------------------------------------------------------------

% cumulative self self total
time seconds seconds calls ms/call ms/call name
54.3 2.65 2.65 mcount (472)
16.2 3.44 0.79 2999 0.26 0.26 _read [18]
3.5 3.61 0.17 160148 0.00 0.01 _heapgettup [15]
2.5 3.73 0.12 174007 0.00 0.00 _SpinAcquire [36]
2.3 3.84 0.11 981732 0.00 0.00 _BufferIsValid [37]
2.3 3.95 0.11 160148 0.00 0.01 _heap_getnext [14]
2.0 4.05 0.10 160096 0.00 0.00 _ExecEvalVar [38]
2.0 4.15 0.10 160096 0.00 0.00 _ExecMakeFunctionResult <cycle 3> [26]
1.4 4.22 0.07 174007 0.00 0.00 _SpinRelease [41]
1.2 4.28 0.06 18176 0.00 0.00 _hash_search [40]
1.0 4.33 0.05 480288 0.00 0.00 _ExecEvalExpr <cycle 3> [34]
1.0 4.38 0.05 167313 0.00 0.00 _RelationGetLRelId [47]
1.0 4.43 0.05 160096 0.00 0.00 _ExecEvalFuncArgs <cycle 3> [48]
0.8 4.47 0.04 160097 0.00 0.01 _SeqNext [13]
0.8 4.51 0.04 160096 0.00 0.00 _ExecQual [22]
0.6 4.54 0.03 160096 0.00 0.00 _ExecEvalOper <cycle 3> [51]
0.4 4.56 0.02 163015 0.00 0.00 _IncrBufferRefCount [61]
0.4 4.58 0.02 161342 0.00 0.00 _fmgr_c <cycle 5> [29]
0.4 4.60 0.02 160451 0.00 0.00 _TransactionIdEquals [62]
0.4 4.62 0.02 160148 0.00 0.00 _RelationGetBufferWithBuffer [28]
0.4 4.64 0.02 160098 0.00 0.00 _TupIsNull [63]
0.4 4.66 0.02 160097 0.00 0.00 _ExecIncrSlotBufferRefcnt [45]
0.4 4.68 0.02 160096 0.00 0.00 _ExecQualClause [23]
0.4 4.70 0.02 6161 0.00 0.00 _strcpy [64]
0.4 4.72 0.02 31 0.65 0.65 _open [65]
0.2 4.73 0.01 164288 0.00 0.00 _ReleaseBuffer [81]
0.2 4.74 0.01 164048 0.00 0.00 _PageGetMaxOffsetNumber [97]
0.2 4.75 0.01 160450 0.00 0.00 _HeapTupleSatisfiesVisibility [50]
0.2 4.76 0.01 160449 0.00 0.00 _HeapTupleSatisfiesNow [98]
0.2 4.77 0.01 160100 0.00 0.00 _ExecClearTuple [49]
0.2 4.78 0.01 160098 0.00 0.00 _ExecStoreTuple [46]
0.2 4.79 0.01 160096 0.00 0.00 _bpchareq [102]
0.2 4.80 0.01 15696 0.00 0.00 _call_hash [99]
0.2 4.81 0.01 12187 0.00 0.00 _malloc [96]
0.2 4.82 0.01 9182 0.00 0.00 _PortalHeapMemoryAlloc [73]
0.2 4.83 0.01 7257 0.00 0.00 _newNode [54]
0.2 4.84 0.01 2973 0.00 0.20 _ReleaseAndReadBuffer [25]
0.2 4.85 0.01 1241 0.01 0.02 _LockAcquire [58]
0.2 4.86 0.01 189 0.05 0.05 _IsBootstrapProcessingMode [103]
0.2 4.87 0.01 3 3.33 3.33 _shmdt [101]
0.2 4.88 0.01 1 10.00 2172.84 _ExecScan [11]
0.0 4.88 0.00 166214 0.00 0.00 _BufferGetBlock [72]
0.0 4.88 0.00 160449 0.00 0.00 _heapisoverride [477]
0.0 4.88 0.00 15303 0.00 0.00 _tag_hash [478]
0.0 4.88 0.00 11718 0.00 0.00 _AllocSetContains [479]
0.0 4.88 0.00 11718 0.00 0.00 _OrderedSetContains [480]
0.0 4.88 0.00 11366 0.00 0.00 _OrderedElemPush [481]
0.0 4.88 0.00 11366 0.00 0.00 _OrderedElemPushHead [482]
0.0 4.88 0.00 11366 0.00 0.00 _OrderedElemPushInto [483]
0.0 4.88 0.00 11361 0.00 0.00 _AllocSetAlloc [104]
0.0 4.88 0.00 11361 0.00 0.00 _MemoryContextAlloc [67]
0.0 4.88 0.00 11339 0.00 0.00 _palloc [68]
0.0 4.88 0.00 10515 0.00 0.00 _free [484]
0.0 4.88 0.00 10488 0.00 0.00 _AllocSetFree [485]
0.0 4.88 0.00 10488 0.00 0.00 _OrderedElemPop [486]
0.0 4.88 0.00 8176 0.00 0.00 _get_expr [487]
0.0 4.88 0.00 7953 0.00 0.00 _AllocSetGetFirst [488]
0.0 4.88 0.00 7953 0.00 0.00 _OrderedSetGetHead [489]
0.0 4.88 0.00 7952 0.00 0.00 _OrderedElemGetBase [490]
0.0 4.88 0.00 7424 0.00 0.00 _memmove [491]

---------------------------------------------------------------------------

Of these, I inlined:

#define TransactionIdEquals(id1, id2) \
#define TupIsNull(slot) \
#define ExecIncrSlotBufferRefcnt(slot) \
#define BufferIsValid(bufnum) \
#define IncrBufferRefCount(buffer) \
#define BufferGetBlock(buffer) \
#define PageGetMaxOffsetNumber(page) \
#define HeapTupleSatisfiesVisibility(tuple, seeself) \
#define heapisoverride() \

When I say linined, I mean I made macros of them. I did not inline the
code into other functions, because you loose the modularity of the
function name, and many are called multiple times.

These were all small functions, only a few lines of actual code.
BufferIsValid was called several times per row.

With that said, the execution times did not change at all. What I did
see is that the mcount row in gprof output:

54.3 2.65 2.65 mcount (472)

decreased from 2.65 to 1.79, which means that less time was spent in
function calls. Not sure why that time did not translate to better
overall performance, but the inlining is a clear win for these small
functions. mcount is artificially high anyway because it is doing the
profiling. The resulting gprof output shows a much smaller list of
functions called 160k times.

A patch will be posted to the patches list, for any of you who subscribe
to that.

--
Bruce Momjian | 830 Blythe Avenue
maillist(at)candle(dot)pha(dot)pa(dot)us | Drexel Hill, Pennsylvania 19026
+ If your life is a hard drive, | (610) 353-9879(w)
+ Christ can be your backup. | (610) 853-3000(h)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jose' Soares Da Silva 1998-04-24 15:04:55 Access'97 and ODBC
Previous Message Tom Lane 1998-04-24 14:50:51 Re: [HACKERS] Re: [INTERFACES] retrieving varchar size