Re: Improve hash-agg performance

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Improve hash-agg performance
Date:	2016-11-04 13:18:49
Message-ID:	4735dcdf-6a25-c59d-c79c-977e95836005@iki.fi
Lists:	pgsql-hackers

On 11/03/2016 01:07 PM, Andres Freund wrote:
> Hi,
>
> There's two things I found while working on faster expression
> evaluation, slot deforming and batched execution. As those two issues
> often seemed quite dominant cost-wise it seemed worthwhile to evaluate
> them independently.
>
> 1) We atm do one ExecProject() to compute each aggregate's
> arguments. Turns out it's noticeably faster to compute the argument
> for all aggregates in one go. Both because it reduces the amount of
> function call / moves more things into a relatively tight loop, and
> because it allows to deform all the required columns at once, rather
> than one-by-one. For a single aggregate it'd be faster to avoid
> ExecProject altogether (i.e. directly evaluate the expression as we
> used to), but as soon as you have two the new approach is faster.

Makes sense. If we do your refactoring of ExecEvalExpr into an
intermediate opcode representation, I assume the performance difference
will go away anyway.
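
To make point 1 concrete, here is a toy model in Python rather than the executor's C. Everything in it (Slot, deform_upto, args_per_aggregate, args_batched) is a made-up name for illustration and none of it is PostgreSQL API. It assumes a slot that deforms columns lazily from left to right and only counts how often the deform routine is entered, which is a rough proxy for the function-call overhead mentioned above, not a measurement.

```python
class Slot:
    """Toy stand-in for a tuple slot that deforms columns lazily, left to right."""

    def __init__(self, raw):
        self.raw = raw           # the stored tuple
        self.values = []         # columns deformed so far
        self.deform_calls = 0    # how often the deform routine was entered

    def deform_upto(self, natts):
        """Make columns [0, natts) available; already-deformed columns are reused."""
        self.deform_calls += 1
        while len(self.values) < natts:
            self.values.append(self.raw[len(self.values)])


def args_per_aggregate(raw, agg_arg_cols):
    """One projection per aggregate: each asks the slot for its own columns."""
    slot = Slot(raw)
    out = []
    for cols in agg_arg_cols:
        slot.deform_upto(max(cols, default=-1) + 1)
        out.append([slot.values[c] for c in cols])
    return out, slot.deform_calls


def args_batched(raw, agg_arg_cols):
    """One projection for all aggregates: deform once, then read every argument."""
    slot = Slot(raw)
    last = max((c for cols in agg_arg_cols for c in cols), default=-1)
    slot.deform_upto(last + 1)
    out = [[slot.values[c] for c in cols] for cols in agg_arg_cols]
    return out, slot.deform_calls
```

With three aggregates over columns [1], [3] and [0, 2], both functions return the same argument lists, but the per-aggregate variant enters the deform routine three times and the batched one once.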

> 2) For hash-aggs we right now store the representative tuple using
> the input tuple's format, with unneeded columns set to NULL. That
> turns out to be expensive if the aggregated-on columns are not
> leading columns, because we have to skip over a potentially large
> number of NULLs. The fix here is to simply use a different tuple
> format for the hashtable. That doesn't cause overhead, because we
> already move columns in/out of the hashslot explicitly anyway.

Heh, I came to the same conclusion a couple of months ago when I was
profiling the aggregate code. I never got around to finishing up and
posting the patch I wrote back then, but here you go, for comparison.
It's pretty much the same as what you got here. So yeah, seems like a
good idea.
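
The layout change in point 2 can be sketched the same way. This is a toy model in Python, not code from either patch; the function names and the needed_cols parameter are hypothetical. It assumes that reaching a column means stepping over every column before it, NULL or not, and it models only which columns get stored and how many have to be walked.

```python
def store_input_format(row, needed_cols):
    """Old layout: same width as the input row, unneeded columns set to NULL (None)."""
    keep = set(needed_cols)
    return [v if i in keep else None for i, v in enumerate(row)]


def store_compact_format(row, needed_cols):
    """New layout: only the needed columns, stored in needed_cols order."""
    return [row[c] for c in needed_cols]


def columns_walked_input_format(needed_cols):
    """Reaching the last needed column means stepping over everything before it."""
    return max(needed_cols) + 1


def columns_walked_compact_format(needed_cols):
    """In the compact layout every stored column is one we actually want."""
    return len(needed_cols)
```

For a ten-column input where only columns 7 and 9 are needed, the input-format tuple holds eight NULLs and ten columns are walked to reach the last one; the compact tuple holds two columns and walks two.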

- Heikki

Attachment	Content-Type	Size
0001-Don-t-store-unused-columns-in-hash-table.patch	text/x-diff	7.6 KB

In response to

Improve hash-agg performance at 2016-11-03 11:07:21 from Andres Freund

Responses

Re: Improve hash-agg performance at 2016-11-04 13:35:21 from Andres Freund
