| From: | Sami Imseih <samimseih(at)gmail(dot)com> |
|---|---|
| To: | Michael Paquier <michael(at)paquier(dot)xyz> |
| Cc: | zengman <zengman(at)halodbtech(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Refactor query normalization into core query jumbling |
| Date: | 2025-12-24 17:32:45 |
| Message-ID: | CAA5RZ0sbWmqdUBFo8JXMJe72pnwjxVY58htJ6pKbwnyQuRctQw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> > For the second part of making more jumbling utilities global, this can
> > be taken up in another thread. I am now thinking we would be better
> > off actually taking all the generic jumbling routines and separating
> > them into their own C file. So we can have a new file like primjumble.c
> > which will have all the "primitive" jumbling utilities, and queryjumblefuncs.c
> > can worry about its core business of jumbling a Query tree. Anyhow,
> > these are just some high level thoughts on this part for now.
>
> Hmm. Moving from pgss to the core backup the code that updates the
> clocations of an existing JumbleState does not solve the issue that
> we should not modify the internals of the JumbleState computed. So
> this does not move the ball at all.
This is for the point "I am wondering if we should not expose a bit more
the jumble query APIs" [0], which as I mention is a separate topic and is
not about extensions being able to modify JumbleState.
> The updates of clocations depend on GenerateNormalizedQuery(), which
> depends on the pre-work done in CleanQuerytext() to get a query string
> and its location for a multi-query string case. If we really want to
> make a JumbleState not touchable at all, we could either push more
> work of CleanQuerytext() and GenerateNormalizedQuery() into core.
When you say “push more work into core,” I understand that to mean this work
should occur during jumbling. If so, there are several complications.
1/ CleanQueryText() requires access to the query text, which we do not have
during core jumbling as of 2ecbb0a4935.
2/ GenerateNormalizedQuery() should be invoked on demand by the extension
that needs the normalized string, for example when pg_stat_statements needs
to store a query string.
Both operations are expensive, especially GenerateNormalizedQuery(). Doing
this work on every JumbleQuery() would introduce unnecessary overhead.
> Or a second thing that we can do, which would be actually much
> simpler, is to rework entirely fill_in_constant_lengths() so as we
> generate a *copy* of the location data PGSS wants rather than force it
> into a JumbleState
yes, that's an idea that did cross my mind. I have hesitation of copying
clocations, but that may not be such a big deal, since it will only be
occurring on demand when the extension requests a normalized string.
> that would be pushed across more stacks of the
> post-parse-analyze hook. Doing that would allow us to pull the plug
> into making JumbleState and the original copies of clocations a set of
> consts when given to extensions.
Yes correct. The hook signature will turn into, so all extensions now
just have a constant pointer to the jumblestate. They can of course
work hard to cast away constants, but at that point, they are doing
things the wrong way.
```
typedef void (*post_parse_analyze_hook_type) (ParseState *pstate,
Query *query,
const JumbleState *jstate);
```
This of course will be a big breaking change to all the extensions out
there using this hook. This is not just about a signature change of
the hook, but an author now has to figure out how to deal with a
constant JumbleState in their code, which may not be trivial.
My proposal in v3 was a bit more softer, and keeps the hook
backwards compatible. Basically, we keep JumbleState a non-constant,
but provide core APIs for any operation, such as generate_normalized_query,
that needs to modify the state. So, my approach was not about enforcing a
read-only JumbleState, but about providing the API to dissuade an author
from modifying a JumbleState.
If we need a stronger API contract, then we can go with the hook receving
JumbleState as a constant and implement the copy of clocations to return
back to extensions that need to normalize a query. We also have to keep
in mind that if there are future requirements where the state has to be
modified by an extension, we will need to implement more copy functions
for those field.
[0] https://www.postgresql.org/message-id/aUXeTEpSymo6n7lF%40paquier.xyz
--
Sami Imseih
Amazon Web Services (AWS)
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Bruce Momjian | 2025-12-24 17:55:29 | Re: should we have a fast-path planning for OLTP starjoins? |
| Previous Message | Robert Treat | 2025-12-24 16:50:48 | Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format |