Re: Generating code for query jumbling through gen_node_support.pl

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
Subject: Re: Generating code for query jumbling through gen_node_support.pl
Date: 2023-01-13 07:54:58
Message-ID: Y8EOUuViEL8Ewvf/@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jan 07, 2023 at 07:37:49AM +0100, Peter Eisentraut wrote:
> On 07.12.22 08:56, Michael Paquier wrote:
>> The location of the Nodes is quite invasive because we only care about
>> that for T_Const now in the query jumbling, and this could be
>> compensated with a third pg_node_attr() that tracks for the "int
>> location" of a Node whether it should participate in the jumbling or
>> not.
>
> The generation script already has a way to identify location fields, by ($t
> eq 'int' && $f =~ 'location$'), so you could use that as well.

I did not recall exactly everything here, but there are two parts to
the logic:
- gen_node_support.pl uses exactly this condition when scanning the
nodes to put the correct macro to mark a location to track, calling
down RecordConstLocation().
- Marking a bunch of nodes as jumble_ignore is actually necessary, or
we may finish by silencing parts of queries that should be
semantically unrelevant to the queries jumbled (ColumnRef is one).
Using a "jumble_ignore" flag is equally invasive to using an
"jumble_include" flag for each field, I am afraid, as the number of
fields in the nodes included in jumbles is pretty equivalent to the
number of fields ignored. I tend to prefer the approach of ignoring
things though, which is more consistent with the practive for node
read, write and copy.

Anyway, when it comes to the location, another thing that can be
considered here would be to require a node-level flag for the nodes on
which we want to track the location. This overlaps a bit with the
fields satisfying "($t eq 'int' && $f =~ 'location$')", but it removes
most of the code changes like this one as at the end we only care
about the location for Const nodes:
- int location; /* token location, or -1 if unknown */
+ int location pg_node_attr(jumble_ignore); /* token location, or -1
+ * if unknown */

I have taken this approach in v2 of the patch, shaving ~100 lines of
more code as there is no need to mark all these location fields with
"jumble_ignore" anymore, except for Const, of course.

>> There is also an argument where we would want to not include by
>> default new fields added to a Node, but that would require added more
>> pg_node_attr() than what's done here.
>
> I'm concerned about the large number of additional field annotations this
> adds. We have been careful so far to document the use of each attribute,
> e.g., *why* does a field not need to be copied etc. This patch adds dozens
> and dozens of annotations without any explanation at all. Now, the code
> this replaces also has no documentation, but maybe this is the time to add
> some.

That's fair, though it is not doing to buy us much to update all the
nodes with similar small comments, as well. As far as I know, there
are basiscally three things here: typmods, collation information, and
internal data of the nodes stored during parse-analyze. I have added
more documentation to track what looks like the most relevant areas.

I have begun running some performance tests with this stuff and HEAD
to see if this leads to any difference in the query ID compilation
(compute_query_id = on, on scissors roughly) with a simple set of
short commands (like BEGIN/COMMIT) or longer ones, and I am seeing a
speedup trend actually (?). I still need to think more about a set of
tests here, but I think that micro-benchmarking of JumbleQuery() is
the most adapted approach to minimize the noise, with a few nodes of
various sizes (Const, Query, ColumnRef, anything..).

Thoughts?
--
Michael

Attachment Content-Type Size
v2-0001-Support-for-automated-query-jumble-with-all-Nodes.patch text/x-diff 85.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Geier 2023-01-13 08:11:06 Re: Sampling-based timing for EXPLAIN ANALYZE
Previous Message Jeff Davis 2023-01-13 07:38:50 Re: Blocking execution of SECURITY INVOKER