POC: GROUP BY optimization

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: POC: GROUP BY optimization
Date: 2018-05-29 23:07:50
Message-ID: 7c79e6a5-8597-74e8-0671-1c39d124c9d6@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

As far as I known, columns in GROUP BY could be reordered without loss of
correctness. But reorder could give a benefits in performance. Suggested patch
implements that in several ways:

1) Reorder GROUP BY columns by number of unique values in descent order. Simple
example shows a performance difference by two times (see
opt_group_by_demostrator.sql, on my notebook first two queries demonstrate
that). The idea is: if first column is unique then sort comparator will never
compute difference of following columns.

2) Reorder GROUP BY columns to match existing pathkeys which are result of index
scan or ORDER BY clause. It prevents to add sort node - suppose, it's a clear win.

3) Patch makes suggestion to use index by GROUP BY clause, but unlike to ORDER
BY or merge join case it doesn't pay attention to actual order of columns
because of 2)

Patch doesn't change any cost estimation computation, although 1) could take an
advantage of it.

Some issues/problems/notices:

1) I didn't play around GROUPING SETS at all. As I can see, current coding
prevents any optimization around it and, suppose, it should be addressed to
another patch

2) I'm not completely satisfied with counting of number of groups per column, it
looks ugly without refactoring estimate_num_groups(). At least, now it could be
called multiple times for each column. May be, this number should be added to
PathKey structure?

3) EquivalenceClass->ec_sortref. If planner faced with column first time in
WHERE clause it doesn't fill target reference field because it is unknown yet.
Patch looks for accordance of PathKey (group_pathkeys) and SortGroupClause
(groupClause) and fails in this case. So, get_eclass_for_sort_expr() now updates
ec_sortref field if it's not initialized yet.

4) Algorithms to reorder columns is proportional to N^2 where N is number of
columns, but I hope it isn't a problem because number of GROUP BY columns isn't
very big usually.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment Content-Type Size
opt_group_by-v5.patch text/x-patch 24.6 KB
opt_group_by_demostrator.sql application/sql 672 bytes

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2018-05-29 23:14:51 Re: found xmin from before relfrozenxid on pg_catalog.pg_authid
Previous Message Michael Paquier 2018-05-29 21:15:59 pg_config.h.win32 missing a set of flags from pg_config.h.in added in v11 development