Re: POC: GROUP BY optimization

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: GROUP BY optimization
Date: 2018-06-07 13:31:31
Message-ID: 78d94983-0fcd-78ea-f4c0-efd5022e1386@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> OK, I haven't looked at v7 yet, but if I understand correctly it tries
> to maintain the ordering as much as possible? Does that actually help? I
> mean, the incremental sort patch allows the sorting to happen by pieces,
> but here we still need to sort all the data, right?
>
> Can you give an example demonstrating the benefit?

>
> SELECT a, SUM(x) FROM
> (
> SELECT a, b, COUNT(c) AS x FROM t1 GROUP BY a, b
> UNION ALL
> SELECT a, b, COUNT(c) AS x FROM t2 GROUP BY a, b
> ) foo GROUP BY a;
>
> and indexes on (a,b) and (b,a) for both relations. The "deduplication"
> by pathkeys I suggested would mean we might keep only index (a,b) on t1
> and (b,a) on t2, which means the grouping by "a" can't leverage index
> scans on both relations. But if we keep paths for both indexes on each
> relation, we can.
yes, one of option

> Isn't "estimation of cost of comparing function/number of unique values
> in column could be not very accurate and so planner could make a wrong
> choice" is more an argument against relying on it when doing these
> optimizations?
>
> FWIW it's one of the arguments Tom made in the incremental sort patch,
> which relies on it too when computing cost of the incremental sort. I'm
> sure it's going to be an obstacle there too. >
>> I saw 2 times difference in real-world application. Again, improving
>> sort cost estimation is a separate task.
> Sure. But we also need to ask the other question, i.e. how many people
> would be negatively affected by the optimization. And I admit I don't
> know the answer to that, the next example is entirely made up.
Hm, seems, the best way here is a improving cost_sort estimation. Will try, but
I think that is separated patch

> FWIW I think it would be useful to have "development GUC" that would
> allow us to enable/disable these options during development, because
> that makes experiments much easier. But then remove them before commit.

Will do

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2018-06-07 13:35:12 Re: Remove mention in docs that foreign keys on partitioned tables are not supported
Previous Message Ashutosh Bapat 2018-06-07 13:01:32 Re: commitfest 2018-07