From: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Damir Belyalov <d(dot)belyalov(at)postgrespro(dot)ru>
Subject: Re: POC: GROUP BY optimization
Date: 2023-09-19 04:42:13
Message-ID: e3602ccb-e643-2e79-ed2c-1175a80533a1@postgrespro.ru
Lists: pgsql-hackers
On 20/7/2023 18:46, Tomas Vondra wrote:
> On 7/20/23 08:37, Andrey Lepikhov wrote:
>> On 3/10/2022 21:56, Tom Lane wrote:
>>> Revert "Optimize order of GROUP BY keys".
>>>
>>> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
>>> several follow-on fixes.
>>> ...
>>> Since we're hard up against the release deadline for v15, let's
>>> revert these changes for now. We can always try again later.
>>
>> It may be time to restart the project. As a first step, I rebased the
>> patch on the current master. It wasn't trivial because of some latest
>> optimizations (a29eab, 1349d27 and 8d83a5d).
>> Now, let's repeat the review and rewrite the current patch according to
>> the reasons given in the revert commit.
> 1) procost = 1.0 - I guess we could make this more realistic by doing
> some microbenchmarks and tuning the costs for the most expensive cases.
Ok, some thoughts on this part of the task. As I see it, we don't have
that many distinct operators: 26 over fixed-width types and 16 over
variable-width ones:
SELECT o.oid, oprcode, typname, typlen
FROM pg_operator o
JOIN pg_type t ON (oprleft = t.oid)
WHERE oprname = '<' AND oprleft = oprright AND typlen > 0
ORDER BY o.oid;

SELECT o.oid, oprcode, typname, typlen
FROM pg_operator o
JOIN pg_type t ON (oprleft = t.oid)
WHERE oprname = '<' AND oprleft = oprright AND typlen < 0
ORDER BY o.oid;
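Since the point under discussion is procost, a variation of the query
above could also show the current procost of each comparison function,
to make explicit which entries sit at the default 1.0 (untested sketch):

```sql
-- Hypothetical variant: join pg_proc to expose the current procost of
-- each '<' comparison function alongside the type information.
SELECT o.oid, oprcode, typname, typlen, p.procost
FROM pg_operator o
JOIN pg_type t ON (oprleft = t.oid)
JOIN pg_proc p ON (p.oid = o.oprcode)
WHERE oprname = '<' AND oprleft = oprright
ORDER BY o.oid;
```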
A benchmarking procedure for fixed-length types could look like the one
below:
CREATE OR REPLACE FUNCTION pass_sort(typ regtype) RETURNS TABLE (
  nrows integer,
  exec_time float
) AS $$
DECLARE
  data json;
  step integer;
BEGIN
  SET work_mem = '1GB';
  -- Sort 1, 100, 10^4 and 10^6 rows of the given type.
  FOR step IN 0..3 LOOP
    SELECT pow(100, step)::integer INTO nrows;
    DROP TABLE IF EXISTS test CASCADE;
    EXECUTE format('CREATE TABLE test AS SELECT gs::%s AS x
                    FROM generate_series(1,%s) AS gs;', typ, nrows);
    -- Pull the total 'Execution Time' out of the JSON EXPLAIN output.
    EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, FORMAT JSON)
      SELECT * FROM test ORDER BY (x) DESC INTO data;
    SELECT data->0->'Execution Time' INTO exec_time;
    RETURN NEXT;
  END LOOP;
END;
$$ LANGUAGE plpgsql VOLATILE;
Execution of SELECT * FROM pass_sort('integer'); shows fairly linear
growth of the execution time. So, using '2.0 * cpu_operator_cost' as the
cost for the integer type (as a baseline), we can derive costs for the
other operators. Variable-width types, I think, would require a more
complex technique to account for the dependency on value length.
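The per-type cost could then be read off the ratio against the integer
baseline, e.g. (a sketch assuming the pass_sort() function above;
'numeric' here is just an example type, not a measured result):

```sql
-- For each row count, express the type's sort time in units of the
-- integer baseline, scaled by the proposed 2.0 * cpu_operator_cost.
SELECT t.nrows,
       2.0 * t.exec_time / i.exec_time AS cost_in_operator_cost_units
FROM pass_sort('numeric') AS t
JOIN pass_sort('integer') AS i USING (nrows);
```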
Does this way look worthwhile?
--
regards,
Andrey Lepikhov
Postgres Professional