| From: | Sami Imseih <samimseih(at)gmail(dot)com> |
|---|---|
| To: | zengman <zengman(at)halodbtech(dot)com> |
| Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Refactor query normalization into core query jumbling |
| Date: | 2025-12-23 16:35:18 |
| Message-ID: | CAA5RZ0uLS9RrpO2roX7p3EHE4-VJkBsGAB970jrbo1-GRDAi0g@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> Though this may be tangential to the current topic, I have long been wanting to revise the two instances of `Assert(len_to_wrt >= 0);`
> in the code to the implementation below. Would you kindly advise if this modification is worthwhile?
>
> ```
> if (len_to_wrt > 0)
> {
> memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
> n_quer_loc += len_to_wrt;
> }
> ```
I am not sure I like this idea. The len_to_wrt being less than 0
indicates a bug which will be hidden behind such a loop. I
think it's better to keep the Assert as-is.
> > > This way, any extension that wishes to return a normalized string from
> > > the same JumbleState can invoke this callback and get consistent results.
> > > pg_stat_statements and other extensions with a need to normalize a query
> > > string based on the locations of a JumbleState do not need to care about the
> > > internals of normalization, they simply invoke the callback and
> > > receive the final
> > > string.
> >
> > Hmm. I did not wrap completely my head with your problem, but,
> > assuming that what you are proposing goes in the right direction,
>
> The first goal is to move all query-normalization-related infrastructure
> that pg_stat_statements (and other extensions) rely on into core, so
> extensions no longer need to copy or reimplement normalization logic and
> can all depend on a single, shared implementation.
>
> In addition, query normalization necessarily modifies JumbleState (to
> record constant locations and lengths). This responsibility should not
> fall to extensions and should instead be delegated to core. I will argue
> that the current design, in which extensions handle this directly, is a
> layering violation.
>
> As a first step, we can move generate_normalized_query to core as a global
> function, allowing extensions to simply call it.
v3 implements this approach without a callback. This establishes a clear
boundary: core owns JumbleState modifications, extensions consume the
results through the API.
I kept the post_parse_analyze hook signature unchanged since
GenerateNormalizedQuery still needs to modify JumbleState
(to populate constant lengths). Making it const would be misleading.
For the second part of making more jumbling utilities global, this can
be taken up in another thread. I am now thinking we would be better
off actually taking all the generic jumbling routines and separating
them into their own C file. So we can have a new file like primjumble.c
which will have all the "primitive" jumbling utilities, and queryjumblefuncs.c
can worry about its core business of jumbling a Query tree. Anyhow,
these are just some high level thoughts on this part for now.
--
Sami Imseih
Amazon Web Services (AWS)
| Attachment | Content-Type | Size |
|---|---|---|
| v3-0001-pg_stat_statements-Move-query-normalization-to-co.patch | application/octet-stream | 20.8 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Paul A Jungwirth | 2025-12-23 18:08:19 | Re: domain for WITHOUT OVERLAPS |
| Previous Message | Marcos Pegoraro | 2025-12-23 15:58:47 | Re: Get rid of "Section.N.N.N" on DOCs |