Re: Generating code for query jumbling through gen_node_support.pl

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
Subject: Re: Generating code for query jumbling through gen_node_support.pl
Date: 2023-01-27 02:59:47
Message-ID: Y9M+C2obzgRnXPNd@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 26, 2023 at 09:37:13AM +0100, Peter Eisentraut wrote:
> Ok, the documentation make sense now. I wonder what the performance impact
> is. Probably, nobody cares about microoptimizing CREATE TABLE statements.
> But BEGIN/COMMIT could matter. However, whatever you do in between the
> BEGIN and COMMIT will already be jumbled, so you're already paying the
> overhead. Hopefully, jumbling such simple commands will have no noticeable
> overhead.
>
> In other words, we should test this and hopefully get rid of the 'string'
> method.

Yep. I have mentioned a few numbers upthread, and this deserves
discussion.

FYI, I have done more micro-benchmarking to compare both methods for
utility queries by hijacking JumbleQuery() to run the computation in a
tight loop run N times (could not come up with a better idea to avoid
the repeated palloc/pfree overhead), as the path to stress is
_jumbleNode(). See the attached, that should be able to apply on top
of the latest patch set (named as .txt to not feed it to the CF bot,
and need to recompile to switch the iteration).

Using that, I can compile the following results for various cases (-O2
and compute_query_id=on):
query | mode | iterations | avg_runtime_ns | avg_jumble_ns
-------------------------+--------+------------+----------------+---------------
begin | string | 50000000 | 4.53116 | 4.54
begin | jumble | 50000000 | 30.94578 | 30.94
commit | string | 50000000 | 4.76004 | 4.74
commit | jumble | 50000000 | 31.4791 | 31.48
create table 1 column | string | 50000000 | 7.22836 | 7.08
create table 1 column | jumble | 50000000 | 152.10852 | 151.96
create table 5 columns | string | 50000000 | 12.43412 | 12.28
create table 5 columns | jumble | 50000000 | 352.88976 | 349.1
create table 20 columns | string | 5000000 | 49.591 | 48.2
create table 20 columns | jumble | 5000000 | 2272.4066 | 2271
drop table 1 column | string | 50000000 | 6.70538 | 6.56
drop table 1 column | jumble | 50000000 | 50.38 | 50.24
drop table 5 columns | string | 50000000 | 6.88256 | 6.74
drop table 5 columns | jumble | 50000000 | 50.02898 | 49.9
SET work_mem | string | 50000000 | 7.28752 | 7.28
SET work_mem | jumble | 50000000 | 91.66588 | 91.64
(16 rows)

avg_runtime_ns is (query runtime / iterations) and avg_jumble_ns is
the same with the difference between the start/end logs in the txt
patch attached. The overhead to run the query does not matter much if
you compare both. The time it takes to run a jumble is correlated to
the number of nodes to go through for each query, and there is a
larger gap for more nodes to go through. Well, a simple "begin" or
"commit" query has its computation time increase from 4ns to 30ns in
average which would be unnoticeable. The gap is larger for larger
nodes, like SET, still we jump from 7ns to 90ns in this case. DDLs
take the most hit with this method, where a 20-column CREATE TABLE
jumps from 50ns to 2us (note that the iteration is 10 times lower
here).

At the end, that would be unnoticeable for the average user, I guess,
but here are the numbers I get on my laptop :)
--
Michael

Attachment Content-Type Size
v8-Add-benchmark-tweaks-to-stress-jumbling-code.txt text/plain 2.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2023-01-27 03:01:03 Re: New strategies for freezing, advancing relfrozenxid early
Previous Message Andres Freund 2023-01-27 02:37:43 Re: New strategies for freezing, advancing relfrozenxid early