Re: GiST index performance

From: Matthew Wakeling <matthew(at)flymine(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: GiST index performance
Date: 2009-06-10 11:46:28
Message-ID: alpine.DEB.2.00.0906101232020.4337@aragorn.flymine.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, 5 Jun 2009, Robert Haas wrote:
> On Thu, Jun 4, 2009 at 12:33 PM, Matthew Wakeling<matthew(at)flymine(dot)org> wrote:
>> Do you have a recommendation for how to go about profiling Postgres, what
>> profiler to use, etc? I'm running on Debian Linux x86_64.
>
> I mostly compile with --enable-profiling and use gprof. I know Tom
> Lane has had success with oprofile for quick and dirty measurements
> but I haven't quite figured out how to make all the bits work for that
> yet.

Okay, I recompiled Postgres 8.4 beta 2 with profiling, installed
btree_gist and bioseg, and did a large multi-column (btree_gist, bioseg)
index search.

EXPLAIN ANALYSE SELECT *
FROM location l1, location l2
WHERE l1.objectid = l2.objectid
AND bioseg_create(l1.intermine_start, l1.intermine_end) && bioseg_create(l2.intermine_start, l2.intermine_end);

QUERY PLAN
-----------------------------------------------------------------------
Nested Loop (cost=0.01..9292374.77 rows=19468831 width=130)
(actual time=0.337..24538315.569 rows=819811624 loops=1)
-> Seq Scan on location l1
(cost=0.00..90092.17 rows=4030117 width=65)
(actual time=0.033..2561.142 rows=4030122 loops=1)
-> Index Scan using location_object_bioseg on location l2
(cost=0.01..1.58 rows=35 width=65)
(actual time=0.467..5.990 rows=203 loops=4030122)
Index Cond: ((l2.objectid = l1.objectid) AND (bioseg_create(l1.intermine_start, l1.intermine_end) && bioseg_create(l2.intermine_start, l2.intermine_end)))
Total runtime: 24613918.894 ms
(5 rows)

Here is the top of the profiling result:

Flat profile:

Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ks/call Ks/call name
35.41 2087.04 2087.04 823841746 0.00 0.00 gistnext
15.36 2992.60 905.56 8560743382 0.00 0.00 fmgr_oldstyle
8.65 3502.37 509.77 3641723296 0.00 0.00 FunctionCall5
7.08 3919.87 417.50 3641723296 0.00 0.00 gistdentryinit
5.03 4216.59 296.72 6668 0.00 0.00 DirectFunctionCall1
3.84 4443.05 226.46 3641724371 0.00 0.00 FunctionCall1
2.32 4579.94 136.89 1362367466 0.00 0.00 hash_search_with_hash_value
1.89 4691.15 111.21 827892583 0.00 0.00 FunctionCall2
1.83 4799.27 108.12 FunctionCall6
1.77 4903.56 104.30 2799321398 0.00 0.00 LWLockAcquire
1.45 4989.24 85.68 1043922430 0.00 0.00 PinBuffer
1.37 5070.15 80.91 823844102 0.00 0.00 index_getnext
1.33 5148.29 78.15 1647683886 0.00 0.00 slot_deform_tuple
0.95 5204.36 56.07 738604164 0.00 0.00 heap_page_prune_opt

The final cumulative time is 5894.06 seconds, which doesn't seem to match
the query run time at all. Also, no calls to anything including "bioseg"
in the name are recorded, although they are definitely called as the GiST
support functions for that data type.

Could someone give me a hand decyphering this result? It seems from this
that the time is spent in the gistnext function (in
src/backend/access/gist/gistget.c) and not its children. However, it's
quite a large function containing several loops - is there a way to make
the profiling result more specific?

Matthew

--
If you let your happiness depend upon how somebody else feels about you,
now you have to control how somebody else feels about you. -- Abraham Hicks

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Matthew Wakeling 2009-06-10 13:12:07 Censorship
Previous Message Markus Wanner 2009-06-10 11:23:10 Re: Hosted servers with good DB disk performance?