Improve hash-agg performance

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Improve hash-agg performance
Date: 2016-11-03 11:07:21
Message-ID: 20161103110721.h5i5t5saxfk5eeik@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

There are two things I found while working on faster expression
evaluation, slot deforming, and batched execution. As those two issues
often seemed quite dominant cost-wise, it seemed worthwhile to evaluate
them independently.

1) We currently do one ExecProject() to compute each aggregate's
arguments. It turns out to be noticeably faster to compute the arguments
for all aggregates in one go, both because it reduces the number of
function calls and moves more work into a relatively tight loop, and
because it allows deforming all the required columns at once, rather
than one by one. For a single aggregate it'd be faster to avoid
ExecProject() altogether (i.e. directly evaluate the expression as we
used to), but as soon as you have two the new approach is faster.
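
A toy sketch of the idea in 1), not the patch itself: it compares one
projection per aggregate against a single projection that deforms up to
the last needed column once. ToySlot, toy_getsomeattrs() and the two
toy_project_*() functions are hypothetical names invented for this
illustration, and the "cost" is simply the number of times the deform
loop is entered.

```c
#include <assert.h>

#define TOY_MAX_COLS 8

/* Hypothetical stand-in for a tuple slot; not the PostgreSQL struct. */
typedef struct ToySlot {
    const int *raw;               /* packed input row */
    int natts;                    /* number of columns in raw */
    int values[TOY_MAX_COLS];     /* columns deformed so far */
    int nvalid;                   /* how many leading columns are deformed */
    int deform_calls;             /* times the deform loop was entered */
} ToySlot;

/* Deform columns [nvalid, upto); already deformed columns are reused. */
static void toy_getsomeattrs(ToySlot *slot, int upto)
{
    assert(upto <= slot->natts && upto <= TOY_MAX_COLS);
    if (upto <= slot->nvalid)
        return;
    slot->deform_calls++;
    for (int i = slot->nvalid; i < upto; i++)
        slot->values[i] = slot->raw[i];
    slot->nvalid = upto;
}

/* One projection per aggregate: each fetches only its own argument column.
 * Returns the number of projections performed. */
static int toy_project_per_agg(ToySlot *slot, const int *argcols, int naggs,
                               int *args)
{
    int projections = 0;
    for (int a = 0; a < naggs; a++) {
        toy_getsomeattrs(slot, argcols[a] + 1);
        args[a] = slot->values[argcols[a]];
        projections++;
    }
    return projections;
}

/* One projection for all aggregates: deform once, up to the last needed
 * column, then pick out every argument in a tight loop. */
static int toy_project_combined(ToySlot *slot, const int *argcols, int naggs,
                                int *args)
{
    int last = 0;
    for (int a = 0; a < naggs; a++)
        if (argcols[a] + 1 > last)
            last = argcols[a] + 1;
    toy_getsomeattrs(slot, last);
    for (int a = 0; a < naggs; a++)
        args[a] = slot->values[argcols[a]];
    return 1;
}
```

With three aggregates whose arguments are columns 1, 2 and 3, the
per-aggregate variate enters the deform loop three times and the
combined variant once, while both produce the same argument values.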

2) For hash-aggs we currently store the representative tuple using
the input tuple's format, with unneeded columns set to NULL. That
turns out to be expensive if the aggregated-on columns are not
leading columns, because we have to skip over a potentially large
number of NULLs. The fix here is to simply use a different tuple
format for the hashtable. That doesn't cause overhead, because we
already move columns in/out of the hashslot explicitly anyway.
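
A toy sketch of the idea in 2), under simplifying assumptions: tuples
are modeled as fixed arrays with a NULL flag per column, and the cost of
reading a stored tuple is taken to be the number of leading columns that
must be stepped over to reach its last non-NULL column. ToyTuple and the
toy_*() functions are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_COLS 16

/* Hypothetical stand-in for a tuple; not a PostgreSQL structure. */
typedef struct ToyTuple {
    int  natts;
    int  values[TOY_MAX_COLS];
    bool isnull[TOY_MAX_COLS];
} ToyTuple;

/* Wide layout: same width as the input, non-grouping columns set to NULL. */
static void toy_store_wide(const ToyTuple *in, const int *grpcols, int ngrp,
                           ToyTuple *rep)
{
    assert(in->natts <= TOY_MAX_COLS);
    rep->natts = in->natts;
    for (int i = 0; i < in->natts; i++) {
        rep->values[i] = 0;
        rep->isnull[i] = true;
    }
    for (int g = 0; g < ngrp; g++) {
        int c = grpcols[g];
        assert(c >= 0 && c < in->natts);
        rep->values[c] = in->values[c];
        rep->isnull[c] = in->isnull[c];
    }
}

/* Narrow layout: stored column g holds input column grpcols[g]. */
static void toy_store_narrow(const ToyTuple *in, const int *grpcols, int ngrp,
                             ToyTuple *rep)
{
    assert(ngrp <= TOY_MAX_COLS);
    rep->natts = ngrp;
    for (int g = 0; g < ngrp; g++) {
        int c = grpcols[g];
        assert(c >= 0 && c < in->natts);
        rep->values[g] = in->values[c];
        rep->isnull[g] = in->isnull[c];
    }
}

/* Leading columns that must be walked to reach the last non-NULL column. */
static int toy_columns_walked(const ToyTuple *rep)
{
    int last = 0;
    for (int i = 0; i < rep->natts; i++)
        if (!rep->isnull[i])
            last = i + 1;
    return last;
}
```

For a 10-column input grouped on columns 7 and 9, the wide layout walks
all 10 columns to reach the grouping values, while the narrow layout
walks only 2. The explicit grpcols mapping is what already exists when
columns are moved in and out of the hash slot, which is why the narrower
format adds no extra copying step in this model.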

Comments?

Regards,

Andres Freund

Attachment                                                       Content-Type  Size
0001-Perform-one-only-projection-to-compute-agg-arguments.patch  text/x-patch  9.6 KB
0002-User-narrower-representative-tuples-in-the-hash-agg-.patch  text/x-patch  11.3 KB
