Re: Performance gain from reduction of GROUP BY memory

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Performance gain from reduction of GROUP BY memory
Date: 2005-08-30 07:37:04
Message-ID: 1125387424.4010.398.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-announce pgsql-hackers

On Mon, 2005-08-29 at 20:25 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > I notice that Neil's patch regarding reducing the number of memory
> > allocations during aggregation operations isn't mentioned. It was
> > originally discussed in 8.0beta (2-3?) time.
>
> > What happened there?
> > - patch not committed in the end
> > - committed but not mentioned, as a dropped item
> > - committed but not mentioned, since part of a larger patch
>
> Are you speaking of these patches?

Yes, those look like the ones I mentioned.

Those seem to have a useful performance improvement?

At very least, the change in Aggregate function API should be mentioned,
no?

> 2005-04-06 19:56 neilc
>
> * src/backend/utils/adt/: float.c, numeric.c: Apply the "nodeAgg"
> optimization to more of the builtin transition functions. This
> patch optimizes int2_sum(), int4_sum(), float4_accum() and
> float8_accum() to avoid needing to copy the transition function's
> state for each input tuple of the aggregate. In an extreme case
> (e.g. SELECT sum(int2_col) FROM table where table has a single
> column), it improves performance by about 20%. For more complex
> queries or tables with wider rows, the relative performance
> improvement will not be as significant.
>
> 2005-04-04 19:50 neilc
>
> * src/backend/utils/adt/numeric.c: This patch changes
> int2_avg_accum() and int4_avg_accum() use the nodeAgg performance
> hack Tom introduced recently. This means we can avoid copying the
> transition array for each input tuple if these functions are
> invoked as aggregate transition functions.
>
> To test the performance improvement, I created a 1 million row
> table with a single int4 column. Without the patch, SELECT avg(col)
> FROM table took about 4.2 seconds (after the data was cached); with
> the patch, it took about 3.2 seconds. Naturally, the performance
> improvement for a less trivial query (or a table with wider rows)
> would be relatively smaller.
>
> 2005-03-12 15:25 tgl
>
> * contrib/intagg/int_aggregate.c,
> contrib/intagg/int_aggregate.sql.in, doc/src/sgml/xaggr.sgml,
> doc/src/sgml/xfunc.sgml, src/backend/executor/nodeAgg.c,
> src/backend/utils/adt/int8.c: Adjust the API for aggregate function
> calls so that a C-coded function can tell whether it is being used
> as an aggregate or not. This allows such a function to avoid
> re-pallocing a pass-by-reference transition value; normally it
> would be unsafe for a function to scribble on an input, but in the
> aggregate case it's safe to reuse the old transition value. Make
> int8inc() do this. This gets a useful improvement in the speed of
> COUNT(*), at least on narrow tables (it seems to be swamped by I/O
> when the table rows are wide). Per a discussion in early December
> with Neil Conway. I also fixed int_aggregate.c to check this,
> thereby turning it into something approaching a supportable
> technique instead of being a crude hack.

I'll search CVS directly next time. Thanks.

Best Regards, Simon Riggs

In response to

Responses

Browse pgsql-announce by date

  From Date Subject
Next Message EMS Software Development 2005-08-30 08:22:54 EMS PostgreSQL Manager 3.2 released
Previous Message Tom Lane 2005-08-30 00:25:51 Re: Performance gain from reduction of GROUP BY memory allocations

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2005-08-30 09:14:01 Re: Query Sampling
Previous Message Luke Lonergan 2005-08-30 06:35:26 Re: SHMMAX seems entirely broken in OS X 10.4.2