Re: queryId constant squashing does not support prepared statements

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Sami Imseih <samimseih(at)gmail(dot)com>
Cc: Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: queryId constant squashing does not support prepared statements
Date: 2025-05-28 03:44:49
Message-ID: aDaGsZPYhmoapjwK@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 27, 2025 at 05:05:39PM -0500, Sami Imseih wrote:
> * 0001:
>
> This is a normalization issue discovered when adding new
> tests for squashing. This is also an issue that exists in
> v17 and likely earlier versions and should probably be
> backpatched.
>
> The crux of the problem is if a constant location is
> recorded multiple times, the values for $n don't take
> into account the duplicate constant locations and end up
> incorrectly incrementing the next value from $n.
>
> This does also feel like it should be backpatched.

Yes, this needs to be backpatched and it is actually a safe backpatch
because only the text representation is changed when adding a new
entry in the dshash; it only improves the reports without touching the
existing data. I'm OK to take care of this one by myself, even in the
context of this thread. It is an issue independent of what we're
discussing here for the list squashing. As there is only one sprintf() in
generate_normalized_query() in ~17, the fix of the back-branches is
slightly simpler.

You have mentioned the addition of tests, but v6-0001 includes nothing
of the kind. Am I missing something? How much coverage did you
intend to add here? These seem to be included in squashing.sql in
patch v6-0002, but IMO this should be moved somewhere else to work
with the back-branches and make the whole backpatch story more
consistent.

> * 0002:
>
> Added some more tests to the ones initially proposed
> by Dmitri in v3-0001 [0] including the "edge cases" which
> led to the findings for 0001.

Tests for CoerceViaIO with jsonb have been moved around. Not a big
deal, but that makes the diffs of the patch confusing to read.

+-- if there is only one level of relabeltype, the list will be squashable

RelabelType perhaps?

A lot of the tests introduced in v6-0002 are copy-pastes of the
previous ones for IN clauses introduced for the ARRAY cases, with
comments explaining the reasons why lists are squashed or not also
copy-pasted. Perhaps it would make sense to group the ARRAY and IN
clause cases together. For example, group each of the two CoerceViaIO
cases together in a single query on pg_stat_statements, with a single
pg_stat_statements_reset(). That would make more difficult to miss
the fact that we need to care about IN clauses *and* arrays when
adding more test patterns, if we add some of course.

The cases where IN clauses are rewritten as ArrayExpr are OK kept at
the end.

> * 0003:
>
> This fixes the normalization anomalies introduced by
> 62d712ec ( squashing feature ) mentioned here [1]
>
> This patch therefore implements the fixes to track
> the boundaries of an IN-list, Array expression.

Nice simplifications in the PGSS part in terms of

+ ListWithBoundary *n = $4;

I'd suggest to not use "n" for this one, but a different variable
name, leaving the internals for the SubLink cases minimally touched.

+typedef struct ListWithBoundary
+{
+ Node *expr;
+ ParseLoc start;
+ ParseLoc end;
+} ListWithBoundary;

Implementation-wise, I would choose a location with a query length
rather than start and end locations. That's what we do for the nested
queries in the DMLs, so on consistency grounds..

> * 0004: implements external parameter squashing.

+static void
+_jumbleParam(JumbleState *jstate, Node *node)
+{
[...]
+ if (expr->paramkind == PARAM_EXTERN)
+ {
+ RecordExpressionLocation(jstate, expr->location, -1, true);
+
+ if (expr->paramid > jstate->highest_extern_param_id)
+ jstate->highest_extern_param_id = expr->paramid;
+ }

Using a custom implementation for Param nodes means that we are going
to apply a location record for all external parameters, not only the
ones in the lists.. Not sure if this is a good idea. Something
smells a bit wrong with this approach. Sorry, I cannot push my finger
on what exactly when typing this paragraph.

> While I think we should get all patches in for v18, I definitely
> think we need to get the first 3 because they fix existing
> bugs.
>
> What do you think?

Patches 0002 and 0003 fix bugs in the squashing logic present only on
HEAD, nothing that impacts older branches already released, right?
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2025-05-28 03:49:49 Re: Speed up JSON escape processing with SIMD plus other optimisations
Previous Message David Rowley 2025-05-28 03:18:18 Re: Speed up JSON escape processing with SIMD plus other optimisations