Re: Refactor query normalization into core query jumbling

From: Sami Imseih <samimseih(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Refactor query normalization into core query jumbling
Date: 2025-12-22 23:40:10
Message-ID: CAA5RZ0tKhUXQcyqOqKaBXfmjMZnYVkx44=3DHneomRuBBsZ4bA@mail.gmail.com
Lists: pgsql-hackers

> > This way, any extension that wishes to return a normalized string from
> > the same JumbleState can invoke this callback and get consistent results.
> > pg_stat_statements and other extensions with a need to normalize a query
> > string based on the locations of a JumbleState do not need to care about the
> > internals of normalization, they simply invoke the callback and
> > receive the final string.
>
> Hmm. I did not wrap completely my head with your problem, but,
> assuming that what you are proposing goes in the right direction,

The first goal is to move all query-normalization-related infrastructure
that pg_stat_statements (and other extensions) rely on into core, so
extensions no longer need to copy or reimplement normalization logic and
can all depend on a single, shared implementation.

In addition, query normalization necessarily modifies JumbleState (to
record constant locations and lengths). This responsibility should not
fall to extensions and should instead be delegated to core. I would argue
that the current design, in which extensions handle this directly, is a
layering violation.

As a first step, we can move generate_normalized_query to core as a global
function, allowing extensions to simply call it.
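
To illustrate what such a shared function does, here is a minimal,
self-contained sketch. It is not the pg_stat_statements code or the proposed
patch: ToyLocationLen, ToyJumbleState and toy_generate_normalized_query() are
hypothetical names, and the sketch assumes the constant lengths are already
known, so it skips the step that records them in the state.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a recorded constant: byte offset and length. */
typedef struct ToyLocationLen
{
    int         location;
    int         length;
} ToyLocationLen;

/* Hypothetical stand-in for the parts of a jumble state used here. */
typedef struct ToyJumbleState
{
    const ToyLocationLen *clocations;   /* sorted by location, no overlap */
    int         clocations_count;
    int         highest_extern_param_id;    /* largest $n already present */
} ToyJumbleState;

/*
 * Return a malloc'd copy of "query" with each recorded constant replaced by
 * a $n placeholder.  The caller frees the result.
 */
char *
toy_generate_normalized_query(const ToyJumbleState *state, const char *query)
{
    size_t      query_len;
    size_t      out_size;
    char       *out;
    size_t      src = 0;
    size_t      dst = 0;

    assert(state != NULL && query != NULL);
    query_len = strlen(query);

    /* Each placeholder is "$" plus at most 10 digits of an int. */
    out_size = query_len + (size_t) state->clocations_count * 11 + 1;
    out = malloc(out_size);
    if (out == NULL)
        return NULL;

    for (int i = 0; i < state->clocations_count; i++)
    {
        size_t      loc = (size_t) state->clocations[i].location;
        size_t      len = (size_t) state->clocations[i].length;

        assert(loc >= src && loc + len <= query_len);

        /* Copy the text between the previous constant and this one. */
        memcpy(out + dst, query + src, loc - src);
        dst += loc - src;

        /* Emit the placeholder, numbered after any existing parameters. */
        dst += (size_t) snprintf(out + dst, out_size - dst, "$%d",
                                 state->highest_extern_param_id + i + 1);

        src = loc + len;
    }

    /* Copy whatever follows the last constant, including the terminator. */
    memcpy(out + dst, query + src, query_len - src + 1);
    return out;
}
```

With something of this shape in core, an extension only passes the state and
the query text, and never touches the location bookkeeping itself.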

> I am wondering if we should not expose a bit more the jumble query APIs so
> as the normal default callback can be reused by out-of-core rather
> than hide it entirely. This would mean exposing
> GenerateNormalizedQuery(), while also giving a way for callers of
> JumbleQuery() to pass down a custom callback? This would imply
> thinking harder about the initialization state we expect in the
> structure, but I think that we should try to design things so as
> extensions do not need to copy-paste more code from the core tree at
> the end, just less of it.

... and that would be the next step: providing callbacks and making more
jumbling utilities global. This will require more discussion, but I would
expect us to expose InitJumble(), which would do the bare minimum to
initialize a JumbleState, along with some fields for callbacks that can be
set after the fact. There would be a callback for a normalization function,
and a callback that lets the user implement jumbling functions for nodes
that are currently not included in queryjumblefuncs.switch.c, or perhaps
override the existing logic in this generated file.
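
As a rough sketch of the shape this could take, and nothing more than that:
every name below (DemoJumbleState, demo_init_jumble(), the two callback
fields, the node tags) is hypothetical and does not exist in the tree. The
state is initialized with no callbacks, and a caller may then set a per-node
jumbling callback that runs before the built-in logic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node tags: one the built-in code knows, one it does not. */
enum { DEMO_NODE_CONST = 1, DEMO_NODE_CUSTOM = 100 };

typedef struct DemoNode
{
    int         tag;
    int         value;
} DemoNode;

typedef struct DemoJumbleState DemoJumbleState;

/* Returns nonzero if the callback handled the node itself. */
typedef int (*demo_jumble_node_cb) (DemoJumbleState *state,
                                    const DemoNode *node);

/* Returns a normalized copy of the query; unused in this sketch. */
typedef char *(*demo_normalize_cb) (const DemoJumbleState *state,
                                    const char *query);

struct DemoJumbleState
{
    unsigned char jumble[64];   /* bytes collected for hashing */
    size_t      jumble_len;
    demo_jumble_node_cb jumble_node;    /* optional, set after init */
    demo_normalize_cb normalize;        /* optional, set after init */
};

/* Bare-minimum initialization: empty buffer, no callbacks. */
void
demo_init_jumble(DemoJumbleState *state)
{
    assert(state != NULL);
    state->jumble_len = 0;
    state->jumble_node = NULL;
    state->normalize = NULL;
}

/* Append raw bytes to the jumble buffer. */
void
demo_append_jumble(DemoJumbleState *state, const void *data, size_t len)
{
    assert(state->jumble_len + len <= sizeof(state->jumble));
    memcpy(state->jumble + state->jumble_len, data, len);
    state->jumble_len += len;
}

/* Jumble one node: try the callback first, then the built-in logic. */
void
demo_jumble_node(DemoJumbleState *state, const DemoNode *node)
{
    if (state->jumble_node != NULL && state->jumble_node(state, node))
        return;

    switch (node->tag)
    {
        case DEMO_NODE_CONST:
            /* Built-in rule: record the tag, ignore the constant's value. */
            demo_append_jumble(state, &node->tag, sizeof(node->tag));
            break;
        default:
            /* Nodes without built-in support are skipped. */
            break;
    }
}

/* Example callback: jumble a node type the built-in code skips. */
int
demo_jumble_custom_node(DemoJumbleState *state, const DemoNode *node)
{
    if (node->tag != DEMO_NODE_CUSTOM)
        return 0;
    demo_append_jumble(state, &node->tag, sizeof(node->tag));
    demo_append_jumble(state, &node->value, sizeof(node->value));
    return 1;
}
```

Whether a callback should only add node types or also be allowed to override
built-in ones is exactly the kind of question that needs more discussion.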

> Of course, this sentence is written with the same line of thoughts as
> previously mentioned in the other thread we have discussed: extensions
> should not be allowed to update a JumbleState after it's been set by
> the backend code, so as once the same JumbleState pointer is passed
> down across multiple extensions they don't get confused. If an
> extension wants to use their own policy within the JumbleState, they
> had better recreate a new independent one if they are unhappy about
> what has been generated previously.

Yes, correct. If we provide the interface to create an additional JumbleState,
they can create an independent state.

For this thread, I would like to focus on the first goal.

What do you think?

--
Sami Imseih
Amazon Web Services (AWS)
