Re: Use virtual tuple slot for Unique node

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Денис Смирнов <darthunix(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use virtual tuple slot for Unique node
Date: 2023-09-22 12:36:44
Message-ID: 498cf1d0-ee06-16a7-6322-e2ff77b79bdd@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I did a little more perf testing with this. I'm seeing the same benefit
with the query you posted. But can we find a case where it's not
beneficial? If I understand correctly, when the input slot is a virtual
slot, it's cheaper to copy it to another virtual slot than to form a
minimal tuple. Like in your test case. What if the input is a minimial
tuple?

On master:

postgres=# set enable_hashagg=off;
SET
postgres=# explain analyze select distinct g::text, 'a', 'b', 'c','d',
'e','f','g','h' from generate_series(1, 5000000) g;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=2630852.42..2655852.42 rows=200 width=288) (actual
time=4525.212..6576.992 rows=5000000 loops=1)
-> Sort (cost=2630852.42..2643352.42 rows=5000000 width=288)
(actual time=4525.211..5960.967 rows=5000000 loops=1)
Sort Key: ((g)::text)
Sort Method: external merge Disk: 165296kB
-> Function Scan on generate_series g (cost=0.00..75000.00
rows=5000000 width=288) (actual time=518.914..1194.702 rows=5000000 loops=1)
Planning Time: 0.036 ms
JIT:
Functions: 5
Options: Inlining true, Optimization true, Expressions true,
Deforming true
Timing: Generation 0.242 ms (Deform 0.035 ms), Inlining 63.457 ms,
Optimization 29.764 ms, Emission 20.592 ms, Total 114.056 ms
Execution Time: 6766.399 ms
(11 rows)

With the patch:

postgres=# set enable_hashagg=off;
SET
postgres=# explain analyze select distinct g::text, 'a', 'b', 'c','d',
'e','f','g','h' from generate_series(1, 5000000) g;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=2630852.42..2655852.42 rows=200 width=288) (actual
time=4563.639..7362.467 rows=5000000 loops=1)
-> Sort (cost=2630852.42..2643352.42 rows=5000000 width=288)
(actual time=4563.637..6069.000 rows=5000000 loops=1)
Sort Key: ((g)::text)
Sort Method: external merge Disk: 165296kB
-> Function Scan on generate_series g (cost=0.00..75000.00
rows=5000000 width=288) (actual time=528.060..1191.483 rows=5000000 loops=1)
Planning Time: 0.720 ms
JIT:
Functions: 5
Options: Inlining true, Optimization true, Expressions true,
Deforming true
Timing: Generation 0.406 ms (Deform 0.065 ms), Inlining 68.385 ms,
Optimization 21.656 ms, Emission 21.033 ms, Total 111.480 ms
Execution Time: 7585.306 ms
(11 rows)

So not a win in this case. Could you peek at the outer slot type, and
use the same kind of slot for the Unique's result? Or some more
complicated logic, like use a virtual slot if all the values are
pass-by-val? I'd also like to keep this simple, though...

Would this kind of optimization make sense elsewhere?

--
Heikki Linnakangas
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2023-09-22 12:58:20 Re: bug fix and documentation improvement about vacuumdb
Previous Message Robert Haas 2023-09-22 12:36:41 Re: Questions about the new subscription parameter: password_required