[PATCH] Lazy hashaggregate when no aggregation is needed

From: Ants Aasma <ants(at)cybertec(dot)at>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Francois Deliege <fdeliege(at)gmail(dot)com>
Subject: [PATCH] Lazy hashaggregate when no aggregation is needed
Date: 2012-03-28 02:37:25
Message-ID: CA+CSw_uE-RCyQd_bXJNe=usrXkq+keFrQrahkc+8ou+Ws4Y=Vw@mail.gmail.com
Lists: pgsql-hackers

A user complained on pgsql-performance that SELECT col FROM table
GROUP BY col LIMIT 2; performs a full table scan. ISTM that it's safe
to return tuples from hash aggregate as soon as they are found when no
aggregate functions are in use. Attached is a first shot at that. The
planner is modified so that, when the optimization applies, the hash
table size is checked against the limit and the startup cost comes
from the input. The executor is modified so that, when the hash table
is not yet filled and the optimization applies, tuples are returned
immediately.

Can somebody poke holes in this? The patch definitely needs some code
cleanup in nodeAgg.c, but otherwise it passes the regression tests and
seems to work as intended. It also optimizes the SELECT DISTINCT col
FROM table LIMIT 2; case, but not SELECT DISTINCT ON (col) col FROM
table LIMIT 2;, because that case is explicitly forced to use sorted
aggregation.

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

Attachment: lazy-hashaggregate.patch (text/x-patch, 8.2 KB)
