Re: Generating code for query jumbling through gen_node_support.pl

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
Subject: Re: Generating code for query jumbling through gen_node_support.pl
Date: 2023-01-18 07:04:23
Message-ID: Y8eZ9/3STyhF57RV@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 17, 2023 at 04:52:28PM +0900, Michael Paquier wrote:
> On Tue, Jan 17, 2023 at 08:43:44AM +0100, Peter Eisentraut wrote:
>> Ok, I understand now, and I agree with this approach over the opposite. I
>> was confused because the snippet you showed above used "jumble_ignore", but
>> your patch is correct as it uses "jumble_location".
>
> Okay. I'll refresh the patch set so as we have only "jumble_ignore",
> then, like v1, with preparatory patches for what you mentioned and
> anything that comes into mind.

This is done as of the patch series v3 attached:
- 0001 reformats all the comments of the nodes.
- 0002 moves the current files for query jumble as of queryjumble.c ->
queryjumblefuncs.c and utils/queryjumble.h -> nodes/queryjumble.h.
- 0003 is the core feature, where I have done a second pass over the
nodes to make sure that things map with HEAD, incorporating the extra
docs coming from v2, adding a bit more.

>> That said, the term "jumble" is really weird, because in the sense that we
>> are using it here it means, approximately, "to mix together", "to unify".
>> So what we are doing with the Const nodes is really to *not* jumble the
>> location, but for all other node types we are jumbling the location. At
>> least that is my understanding.
>
> I am quite familiar with this term, FWIW. That's what we've inherited
> from the days where this has been introduced in pg_stat_statements.

I have renamed the node attributes to query_jumble_ignore and
no_query_jumble at the end, after considering Peter's point that only
"jumble" could be fuzzy here. The file names are changed in
consequence.

While doing all that, I have done some micro-benchmarking of
JumbleQuery(), making it loop 50M times on my laptop each time a query
ID is computed (hideous hack with a loop in queryjumble.c):
- For non-utility queries, aka queries that go through
JumbleQueryInternal(), I am measuring a repeatable ~10% improvement
with the generated code over HEAD, which is kind of nice. I have
tested a few DMLs and simple SELECTs, still it looks like a trend.
- For utility queries, the new code is competing against
hash_any_extended() with the query string, which is going to be hard
to beat.. I am measuring what looks like a 5 times slowdown, at
least, and more depending on the depth of the query tree. That's not
surprising. On a 50M loop, this comes down to compare a computation
of 100ns to 5ns for a 20-time slowdown for example, still this could
justify the addition of a GUC to control whether utility queries have
their query ID compiled depending on their nodes or their string
hash, as this could become noticeable in OLTP workloads with loads
of short statements and BEGIN/COMMIT queries?

Thoughts or comments?
--
Michael

Attachment Content-Type Size
v3-0001-Rework-format-of-comments-for-nodes.patch text/x-diff 44.6 KB
v3-0002-Move-query-jumble-code-to-src-backend-nodes.patch text/x-diff 7.1 KB
v3-0003-Support-for-automated-query-jumble-with-all-Nodes.patch text/x-diff 80.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2023-01-18 07:06:17 Re: Time delayed LR (WAS Re: logical replication restrictions)
Previous Message Nitin Jadhav 2023-01-18 07:01:35 Re: Fix GUC_NO_SHOW_ALL test scenario in 003_check_guc.pl