Re: WITHIN GROUP patch

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Subject: Re: WITHIN GROUP patch
Date: 2014-01-04 23:00:55
Message-ID: 26675.1388876455@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Tom> I've committed this after significant editorialization --- most
> Tom> notably, I pushed control of the sort step into the aggregate
> Tom> support functions.

> Initial tests suggest that your version is ~40% slower than ours on
> some workloads.

I poked at this a bit with perf and oprofile, and concluded that probably
the difference comes from ordered_set_startup() repeating lookups for each
group that could be done just once per query. I'm not sure I believe the
40% figure; on this particular test case, oprofile breaks down the costs
of ordered_set_startup() like this:

29 0.0756 postgres advance_transition_function
38307 99.9244 postgres ordered_set_transition
1445 1.0808 postgres ordered_set_startup
31418 79.4990 postgres tuplesort_begin_datum
4056 10.2632 postgres get_typlenbyvalalign
1445 3.6564 postgres ordered_set_startup [self]
734 1.8573 postgres MemoryContextAllocZero
510 1.2905 postgres RegisterExprContextCallback
283 0.7161 postgres exprType
247 0.6250 postgres get_sortgroupclause_tle
235 0.5946 postgres exprCollation
92 0.2328 postgres ReleaseSysCache
78 0.1974 postgres SearchSysCache
71 0.1797 postgres AggGetAggref
63 0.1594 postgres AggCheckCallContext
58 0.1468 postgres AllocSetAlloc
46 0.1164 postgres PrepareSortSupportFromOrderingOp
43 0.1088 postgres AggGetPerAggEContext
40 0.1012 postgres get_typlenbyval
39 0.0987 postgres palloc0
35 0.0886 postgres MemoryContextAlloc
17 0.0430 postgres get_sortgroupref_tle
10 0.0253 postgres tuplesort_begin_common

The tuplesort_begin_datum calls are needed regardless --- your version
was just doing them inside nodeAgg rather than externally. However,
get_typlenbyvalalign and some of the other calls here are to fetch
information that doesn't change across groups; probably we could arrange
to cache that info instead of recomputing it each time. Still, it doesn't
look like that could save more than 20% of ordered_set_startup, which
itself is still only a few percent of the total query time.

Looking at this example makes me wonder if it wouldn't be worthwhile to
provide a way to reset and re-use a tuplesort object, instead of redoing
all the lookup work involved. Or maybe just find a way to cache the
catalog lookups that are happening inside tuplesort_begin_datum, which are
about 50% of that function's cost it looks like. We're paying this same
kind of price for repeated tuplesort setup in the existing nodeAgg code,
if we have an aggregate with ORDER BY or DISTINCT in a grouped query with
many groups.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2014-01-05 06:04:14 Re: ERROR: missing chunk number 0 for toast value
Previous Message knizhnik 2014-01-04 20:27:13 Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL