From: | Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
Cc: | PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>, Белялов Дамир Наилевич <d(dot)belyalov(at)postgrespro(dot)ru> |
Subject: | Re: POC: GROUP BY optimization |
Date: | 2023-12-28 08:22:47 |
Message-ID: | 75743fb8-42a1-4017-a108-adea6f027f96@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 27/12/2023 12:07, Tom Lane wrote:
> Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> writes:
>> To be clear. In [1], I mentioned we can perform micro-benchmarks and
>> structure costs of operators. At least for fixed-length operators, it is
>> relatively easy.
>
> I repeat what I said: this is a fool's errand. You will not get
> trustworthy results even for the cases you measured, let alone
> all the rest. I'd go as far as to say I would not believe your
> microbenchmarks, because they would only apply for one platform,
> compiler, backend build, phase of the moon, etc.
Thanks for the explanation.
I removed all cost-related codes. It still needs to be finished; I will
smooth the code further and rewrite regression tests - many of them
without cost-dependent reorderings look silly. Also, remember
Alexander's remarks, which must be implemented, too.
But already here, it works well. Look:
Preliminaries:
CREATE TABLE t(x int, y int, z text, w int);
INSERT INTO t SELECT gs%100,gs%100, 'abc' || gs%10, gs
FROM generate_series(1,10000) AS gs;
CREATE INDEX abc ON t(x,y);
ANALYZE t;
SET enable_hashagg = 'off';
This patch eliminates unneeded Sort operation:
explain SELECT x,y FROM t GROUP BY (x,y);
explain SELECT x,y FROM t GROUP BY (y,x);
Engages incremental sort:
explain SELECT x,y FROM t GROUP BY (x,y,z,w);
explain SELECT x,y FROM t GROUP BY (z,y,w,x);
explain SELECT x,y FROM t GROUP BY (w,z,x,y);
explain SELECT x,y FROM t GROUP BY (w,x,z,y);
Works with subqueries:
explain SELECT x,y
FROM (SELECT * FROM t ORDER BY x,y,w,z) AS q1
GROUP BY (w,x,z,y);
explain SELECT x,y
FROM (SELECT * FROM t ORDER BY x,y,w,z LIMIT 100) AS q1
GROUP BY (w,x,z,y);
But arrangement with an ORDER BY clause doesn't work:
DROP INDEX abc;
explain SELECT x,w,z FROM t GROUP BY (w,x,z) ORDER BY (x,z,w);
I think the reason is that the sort_pathkeys and group_pathkeys are
physically different structures, and we can't just compare pointers here.
--
regards,
Andrei Lepikhov
Postgres Professional
Attachment | Content-Type | Size |
---|---|---|
0001-Explore-alternative-orderings-of-group-by-pathkeys-d-20231228.patch | text/plain | 47.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | shveta malik | 2023-12-28 09:28:25 | Re: Track in pg_replication_slots the reason why slots conflict? |
Previous Message | Anita | 2023-12-28 08:12:06 | Pdadmin open on Macbook issue |