Re: POC: GROUP BY optimization

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: GROUP BY optimization
Date: 2019-04-04 15:11:09
Message-ID: CA+q6zcVfsx_YuETPmjkNq-1=kJCS5JuShrL2ZNzm_HRVZ-tiYg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Thu, Jan 31, 2019 at 12:24 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> As nothing has happened since, I'm marking this as returned with
> feedback.

This patch was on my radar for some time in the past and we've seen use cases
where it could be pretty useful (probably even without the incremental sort
patch). I would like to make some progress here and see if it's possible to
continue it's development. I've attached the rebased version with a small
changes, e.g. I've created a separate patch with group by reordering tests to
make it easy to see what changes were introduced, and after some experiments
removed part that seems to duplicate "group by" reordering to follow "order
by". Also looks like it's possible to make these patches independent by having
a base patch with the isolated group_keys_reorder_by_pathkeys (they're
connected via n_preordered), but I haven't done this yet.

I went through the thread to summarize the objections, that were mentioned so
far. Most of them are related to the third patch in the series, where
reordering based on "ndistincs" is implemented, and are about cost_sort (all
the possible problems that could happen without proper cost estimation due to
non uniform distribution, different comparison costs and so on) and figuring
out how to limit number of possible combinations of pathkeys to compare. I
haven't looked at the proposed backtracking approach, but taking into account
that suggested patch for cost_sort [1] is RWF, I wonder what would be the best
strategy to proceed?

[1]: https://commitfest.postgresql.org/21/1706/

Attachment Content-Type Size
v11-0001-Add-tests-for-group-by-optimization.patch application/octet-stream 9.0 KB
v11-0002-Reorder-to-match-ORDER-BY-or-index.patch application/octet-stream 15.5 KB
v11-0003-Reorder-by-values-distribution.patch application/octet-stream 13.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2019-04-04 15:15:21 Re: GiST VACUUM
Previous Message Stephen Frost 2019-04-04 15:00:54 Re: [PATCH v20] GSSAPI encryption support