Re: plan time of MASSIVE partitioning ...

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: plan time of MASSIVE partitioning ...
Date: 2010-10-29 16:53:30
Message-ID: 3644.1288371210@sss.pgh.pa.us
Lists: pgsql-hackers

Boszormenyi Zoltan <zb(at)cybertec(dot)at> writes:
> On the other hand, if I use a test case similar to my original one
> (i.e. the tables are much wider), then query planning takes
> 1.42 seconds in 9.1 with this patch instead of the roughly 4.7 seconds
> we observed using PostgreSQL 9.0.0. The beginning of the gprof
> output now looks like this:

>  %    cumulative   self               self     total
> time    seconds   seconds     calls  s/call   s/call  name
> 21.13      0.30      0.30    235091    0.00     0.00  SearchCatCache
>  7.04      0.40      0.10   1507206    0.00     0.00  hash_search_with_hash_value
>  3.52      0.45      0.05   2308219    0.00     0.00  AllocSetAlloc

Yeah, for me it looks even worse: oprofile shows about 77% of time in
SearchCatCache. I poked around a little and it seems that probably most
of the time is going into searches of the STATRELATTINH syscache, which
looks like this:

$13 = {id = 41, cc_next = 0x2b43a60,
  cc_relname = 0x7f6bc6ed2218 "pg_statistic", cc_reloid = 2619,
  cc_indexoid = 2696, cc_relisshared = 0 '\000', cc_tupdesc = 0x7f6bc6ed11d8,
  cc_ntup = 68922, cc_nbuckets = 1024, cc_nkeys = 3, cc_key = {1, 2, 3, 0},
  ...

Most of those entries are "negative" cache entries, since we don't have
any actual stats in this toy example.
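
For reference, the probes piling up here follow the pattern of
get_attavgwidth() in src/backend/utils/cache/lsyscache.c. A simplified
sketch of that pattern (not the exact planner code path; relid and
attnum stand in for whatever column is being examined):

    HeapTuple   tp;

    /* one probe per (relation, column, inh-flag) the planner examines */
    tp = SearchSysCache3(STATRELATTINH,
                         ObjectIdGetDatum(relid),
                         Int16GetDatum(attnum),
                         BoolGetDatum(false));
    if (HeapTupleIsValid(tp))
    {
        /* e.g. read ((Form_pg_statistic) GETSTRUCT(tp))->stawidth */
        ReleaseSysCache(tp);
    }

    /*
     * On a miss, tp is NULL, but SearchCatCache has stored a negative
     * entry for the key, so the next identical lookup fails quickly
     * without scanning the index --- yet each such entry still occupies
     * a slot in the cc_bucket chains shown above.
     */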

I think that we probably should be very circumspect about believing that
this example is still a good guide to what to optimize next; in
particular, in a real-world example with real stats, I'm not sure that
the hot spots will still be in the same places. I'd advise loading up
some real data and doing more profiling.

However, if the hot spot does stay in SearchCatCache, I can't help
noticing that those bucket chains are looking a bit overloaded --- with
cc_ntup = 68922 spread over cc_nbuckets = 1024, that's sixty-plus
entries per bucket, which ain't good. Maybe it's time to teach
catcache.c how to reorganize its hashtables once the load factor
exceeds a certain level. Or more drastically, maybe it should lose
its private hashtable logic and use dynahash.c; I'm not sure at the
moment if the private implementation has any important characteristics
dynahash hasn't got.
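
To make the reorganization idea concrete, here is a rough sketch --- not
working code, since catcache.c currently sizes each bucket array once at
cache creation. It assumes cc_bucket were changed from an in-struct
array into a separately allocated one, and that the insertion path
checked something like "if (cp->cc_ntup > cp->cc_nbuckets * 2)
RehashCatCache(cp);" after adding an entry (RehashCatCache and the
fill-factor-of-2 threshold are invented names here):

    static void
    RehashCatCache(CatCache *cp)
    {
        Dllist     *newbucket;
        int         newnbuckets = cp->cc_nbuckets * 2;
        int         i;

        /*
         * Allocate a larger bucket array; a zeroed Dllist is a valid
         * empty list, and doubling keeps nbuckets a power of two as
         * HASH_INDEX() requires.
         */
        newbucket = (Dllist *)
            MemoryContextAllocZero(CacheMemoryContext,
                                   newnbuckets * sizeof(Dllist));

        /* move every entry from its old chain to its new bucket */
        for (i = 0; i < cp->cc_nbuckets; i++)
        {
            Dlelem     *elt,
                       *nextelt;

            for (elt = DLGetHead(&cp->cc_bucket[i]); elt; elt = nextelt)
            {
                CatCTup    *ct = (CatCTup *) DLE_VAL(elt);
                int         newIndex = HASH_INDEX(ct->hash_value,
                                                  newnbuckets);

                nextelt = DLGetSucc(elt);
                DLRemove(elt);
                DLAddHead(&newbucket[newIndex], elt);
            }
        }

        /* switch to the new array */
        pfree(cp->cc_bucket);
        cp->cc_nbuckets = newnbuckets;
        cp->cc_bucket = newbucket;
    }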

regards, tom lane
