Group by reordering optimization

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Group by reordering optimization
Date: 2020-09-01 11:15:31
Message-ID: CA+q6zcW_4o2NC0zutLkOJPsFt80megSpX_dVRo6GK9PC-Jx_Ag@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Better late than never, to follow up on the original thread [1] I would like to
continue the discussion with the another version of the patch for group by
reordering optimization. To remind, it's about reordering of group by clauses
to do sorting more efficiently. The patch is rebased and modified to address
(at least partially) the suggestions about making it consider new additional
paths instead of changing original ones. It is still pretty much
proof-of-concept version though with many blind spots, but I wanted to start
kicking it and post at least something, otherwise it will never happen. An
incremental approach so to say.

In many ways it still contains the original code from Teodor. Changes and notes:

* Instead of changing the order directly, now patch creates another patch with
modifier order of clauses. It does so for the normal sort as well as for
incremental sort. The whole thing is done in two steps: first it finds a
potentially better ordering taking into account number of groups, widths and
comparison costs; afterwards this information is used to produce a cost
estimation. This is implemented via a separate create_reordered_sort_path to
not introduce too many changes, I couldn't find any better place.

* Function get_func_cost was removed at some point, but unfortunately this
patch was implemented before that, so it's still present there.

* For simplicity I've removed support in create_partial_grouping_paths, since
they were not covered by the existing tests anyway.

* The costing part is pretty rudimentary and looks only at the first group.
It's mostly hand crafted to pass the existing tests.

The question about handling skewed data sets is not addressed yet.

[1]: https://www.postgresql.org/message-id/flat/7c79e6a5-8597-74e8-0671-1c39d124c9d6%40sigaev.ru

Attachment Content-Type Size
0001-Group-by-optimization.patch application/octet-stream 41.1 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ian Lawrence Barwick 2020-09-01 12:20:41 Re: Docs: inaccurate description about config settings
Previous Message Amul Sul 2020-09-01 11:13:10 Re: [Patch] ALTER SYSTEM READ ONLY