Re: Refactor query normalization into core query jumbling

From: Lukas Fittl <lukas(at)fittl(dot)com>
To: Sami Imseih <samimseih(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, zengman <zengman(at)halodbtech(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>
Subject: Re: Refactor query normalization into core query jumbling
Date: 2025-12-30 02:34:18
Message-ID: CAP53PkxqGbPw5VzpacyJb2wTofYJadCoUmxV8s2o5tHzKznwbg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 24, 2025 at 9:33 AM Sami Imseih <samimseih(at)gmail(dot)com> wrote:

> > that would be pushed across more stacks of the
> > post-parse-analyze hook. Doing that would allow us to pull the plug
> > into making JumbleState and the original copies of clocations a set of
> > consts when given to extensions.
>
> Yes correct. The hook signature will turn into, so all extensions now
> just have a constant pointer to the jumblestate. They can of course
> work hard to cast away constants, but at that point, they are doing
> things the wrong way.
>
> ```
> typedef void (*post_parse_analyze_hook_type) (ParseState *pstate,
>
> Query *query,
>
> const JumbleState *jstate);
> ```
>
> This of course will be a big breaking change to all the extensions out
> there using this hook. This is not just about a signature change of
> the hook, but an author now has to figure out how to deal with a
> constant JumbleState in their code, which may not be trivial.
>

I wonder if there is a single extension out there that actually utilizes
JumbleState in that hook -
it was added in 5fd9dfa5f50e4906c35133a414ebec5b6d518493 presumably because
pg_stat_statements
needed it, but even Julien at the time was critical of adding it, mainly
considering it for pgss needs if I read
the archive correctly [0]. CCing Julien to correct me if I misinterpret the
historic record here.

Reading through the discussion in this thread here I do wonder a bit if we
shouldn't just take that out of the
hook again, and instead provide separate functions for getting the
normalized query string (generating it the
first time its called).

My proposal in v3 was a bit more softer, and keeps the hook
> backwards compatible. Basically, we keep JumbleState a non-constant,
> but provide core APIs for any operation, such as
> generate_normalized_query,
> that needs to modify the state. So, my approach was not about enforcing a
> read-only JumbleState, but about providing the API to dissuade an author
> from modifying a JumbleState.

Given the lack of public APIs to modify JumbleState today, I don't see how
an extension would do
modifications in a meaningful way, short of copying the code. I think we
should be a bit bolder here in
enforcing a convention, either clearly making it read-only or dropping the
argument again.

Thinking through a couple of theoretical use cases:

1) An extension that wants to display normalized query strings

This seems to be the biggest kind of what I can find with code search.
Extensions like pg_stat_monitor [1], that
want to do something like pg_stat_statements, and have to copy a bunch of
normalization code today that is 1:1 what
Postgres does. Such extensions don't need the JumbleState argument if they
can get the normalized text directly.

2) An extension that wants to capture parameter values

Some extensions may want to remember additional context for normalized
queries, like pg_tracing's logic for
optionally passing parameter values (post normalization) in the trace
context [2]. If we kept passing a read-only
JumbleState then such extensions could presumably still get this, but I
wonder if it wouldn't be better for core to
have a helper for this?

3) An extension that modifies the JumbleState to cause different
normalization to happen

I'm not aware of any extensions doing this, and I couldn't find any
either. And since such theoretical extensions
can directly modify query->queryId in the same hook, why not do that?

4) Extensions using the presence of JumbleState to determine whether query
IDs are available (?)

I noticed pg_hint_plan checks the JumbleState argument to determine whether
or not to run an early check
of the hint logic. I'm a little confused by this (and if this is about
query IDs being available, couldn't it just check the
Query having a non-zero queryID?), but FWIW: [3]

[0]:
https://www.postgresql.org/message-id/flat/CAOBaU_ZbeQrUESCuLGk3sRZWxXFGaBBO39CxSsFxLeZGicUrJw%40mail.gmail.com#9a9bc47aa1b4b25ee2e3de1388c4c346
[1]:
https://github.com/percona/pg_stat_monitor/blob/main/pg_stat_monitor.c#L476
[2]:
https://github.com/DataDog/pg_tracing/blob/main/src/pg_tracing_query_process.c#L428
[3]:
https://github.com/ossc-db/pg_hint_plan/blob/master/pg_hint_plan.c#L3111

Thanks,
Lukas

--
Lukas Fittl

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Henson Choi 2025-12-30 02:37:20 Re: RFC: PostgreSQL Storage I/O Transformation Hooks
Previous Message Tom Lane 2025-12-30 02:24:20 Re: [PATCH] Allow complex data for GUC extra.