| From: | Michael Paquier <michael(at)paquier(dot)xyz> |
|---|---|
| To: | Sami Imseih <samimseih(at)gmail(dot)com> |
| Cc: | zengman <zengman(at)halodbtech(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Refactor query normalization into core query jumbling |
| Date: | 2025-12-24 04:06:59 |
| Message-ID: | aUtm47pcAhZ1Ix0B@paquier.xyz |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Tue, Dec 23, 2025 at 10:35:18AM -0600, Sami Imseih wrote:
> > Though this may be tangential to the current topic, I have long been wanting to revise the two instances of `Assert(len_to_wrt >= 0);`
> > in the code to the implementation below. Would you kindly advise if this modification is worthwhile?
> >
> > ```
> > if (len_to_wrt > 0)
> > {
> > memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
> > n_quer_loc += len_to_wrt;
> > }
> > ```
>
> I am not sure I like this idea. The len_to_wrt being less than 0
> indicates a bug which will be hidden behind such a loop. I
> think it's better to keep the Assert as-is.
This set of asserts are critical to keep. An incorrect computation is
a synonym of an out-of-bound write in this case, which would classify
any problem as something that could be worth a CVE.
> v3 implements this approach without a callback. This establishes a clear
> boundary: core owns JumbleState modifications, extensions consume the
> results through the API.
>
> I kept the post_parse_analyze hook signature unchanged since
> GenerateNormalizedQuery still needs to modify JumbleState
> (to populate constant lengths). Making it const would be misleading.
>
> For the second part of making more jumbling utilities global, this can
> be taken up in another thread. I am now thinking we would be better
> off actually taking all the generic jumbling routines and separating
> them into their own C file. So we can have a new file like primjumble.c
> which will have all the "primitive" jumbling utilities, and queryjumblefuncs.c
> can worry about its core business of jumbling a Query tree. Anyhow,
> these are just some high level thoughts on this part for now.
Hmm. Moving from pgss to the core backup the code that updates the
clocations of an existing JumbleState does not solve the issue that
we should not modify the internals of the JumbleState computed. So
this does not move the ball at all.
The updates of clocations depend on GenerateNormalizedQuery(), which
depends on the pre-work done in CleanQuerytext() to get a query string
and its location for a multi-query string case. If we really want to
make a JumbleState not touchable at all, we could either push more
work of CleanQuerytext() and GenerateNormalizedQuery() into core.
Or a second thing that we can do, which would be actually much
simpler, is to rework entirely fill_in_constant_lengths() so as we
generate a *copy* of the location data PGSS wants rather than force it
into a JumbleState that would be pushed across more stacks of the
post-parse-analyze hook. Doing that would allow us to pull the plug
into making JumbleState and the original copies of clocations a set of
consts when given to extensions.
--
Michael
| From | Date | Subject | |
|---|---|---|---|
| Previous Message | Peter Smith | 2025-12-24 03:12:31 | Re: Skipping schema changes in publication |