Re: Generating code for query jumbling through gen_node_support.pl

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
Subject: Re: Generating code for query jumbling through gen_node_support.pl
Date: 2023-01-24 06:57:56
Message-ID: Y8+BdCOjxykre5es@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 23, 2023 at 02:27:13PM +0100, Peter Eisentraut wrote:
> A couple of small fixes are attached.

Thanks.

> There is something weird in _jumbleNode(). There are two switch
> (nodeTag(expr)) statements. Maybe that's intentional, but then it should be
> commented better, because now it looks more like an editing mistake.

This one is intentional, so as it is possible to track correctly the
highest param ID found while browsing the nodes. IMO it would be
confusing to add that into gen_node_support.pl. Another thing that
could be done is to switch Param to have a custom implementation, like
RangeTblEntry, though this removes the automation around the creation
of _jumbleParam(). I have clarified the comments around that.

> The handling of T_RangeTblEntry could be improved. In other contexts we
> have for example a custom_copy attribute, which generates the switch entry
> but not the function. Something like this could be useful here too.

Hmm. Okay. Fine by me.

> Otherwise, this looks ok. I haven't checked whether it maintains the exact
> behavior from before. What is the test coverage situation for this?

0003 taken in isolation has some minimal coverage through
pg_stat_statements, though it turns around 15% with compute_query_id =
auto that would enforce the jumbling path only when pg_stat_statements
uses it. Still, my plan here is to enforce the loading of
pg_stat_statements with compute_query_id = regress and
utility_query_id = jumble (if needed) in a new buildfarm machine,
because that's the cheapest path. An extra possibility is to have
pg_regress kicked in a new TAP test with these settings, but that's
costly and we have already two of these :/ Another possibility is to
plug in that into 027_stream_regress or the pg_upgrade test suite with
new settings :/

Anyway, the regression tests of pg_stat_statements should be extended
a bit to cover more node types by default (Say COPY with DMLs for the
InsertStmt & co) to look at how these are showing up once normalized
using their parsed query, and we don't do much around that now.
Normalizing more DDLs should use this path, as well.

> For the 0004 patch, it should be documented why one would want one behavior
> or the other. That's totally unclear right now.

I am not 100% sure whether we should have this part at the end, but as
an exit path in case somebody complains about the extra load with the
automated jumbling compared to the hash of the query strings, that can
be used as a backup. Anyway, attached is what would be a
clarification.

> I think if we are going to accept 0004, then it might be better to combine
> it with 0003. Otherwise, 0004 is just undoing a lot of the code structure
> changes in JumbleQuery() that 0003 did.

Makes sense. That would be my intention if 0004 is the most
acceptable and splitting things makes things a bit easier to review.
--
Michael

Attachment Content-Type Size
v6-0003-Support-for-automated-query-jumble-with-all-Nodes.patch text/x-diff 68.5 KB
v6-0004-Add-GUC-utility_query_id.patch text/x-diff 12.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-01-24 06:57:58 Re: Time delayed LR (WAS Re: logical replication restrictions)
Previous Message Tom Lane 2023-01-24 06:38:09 Re: Mutable CHECK constraints?