Re: Refactor query normalization into core query jumbling

From: Sami Imseih <samimseih(at)gmail(dot)com>
To: Lukas Fittl <lukas(at)fittl(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, zengman <zengman(at)halodbtech(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>
Subject: Re: Refactor query normalization into core query jumbling
Date: 2026-03-26 03:31:28
Message-ID: CAA5RZ0tzLhGxR3cCQtPs1=HeGWh0WDDBC1KDTgOh9x9u2gvy1Q@mail.gmail.com
Lists: pgsql-hackers

Thanks for looking!

> I don't think this is needed anymore, as of
> 45762084545ec14dbbe66ace1d69d7e89f8978ac.

Correct. That showed up after my last rebase.

> > +/*
> > + * Callback to generate a normalized version of the query string that will be used to
> > + * represent all similar queries.
> > + *
>
> I don't think the term "Callback" makes sense here - I think you could
> just keep the original wording.

This was a remnant of my earlier experimentation. I removed it.

A few notes on the comments:

- * Note that the normalized representation may well vary depending on
- * just which "equivalent" query is used to create the hashtable entry.
- * We assume this is OK.

This was in the original generate_normalized_query, and since it mentions
the hashtable entry, I moved the comment (in spirit) to where
pg_stat_statements calls GenerateNormalizeQuery.

* If query_loc > 0, then "query" has been advanced by that much compared to
* the original string start, as is the case with multi-statement strings, so
* we need to translate the provided locations to compensate. (This lets us
* avoid re-scanning statements before the one of interest, so it's worth doing.)
*
This comment was originally duplicated, so I kept it as-is in
SetConstantLengths and added a shorter reference in
GenerateNormalizeQuery.
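
To make the query_loc translation described in that comment concrete, here is a minimal standalone sketch. It is not the code from the patch: ConstLoc, translate_location and advance_to_statement are hypothetical names used only for illustration, and the real structures in PostgreSQL may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a recorded constant location (assumption). */
typedef struct ConstLoc
{
    int location;   /* byte offset from the start of the original string */
    int length;     /* length in bytes, or -1 if not yet known */
} ConstLoc;

/*
 * Translate a location recorded against the whole source string into an
 * offset within the statement that begins at query_loc.
 */
static int
translate_location(int location, int query_loc)
{
    /* A constant of this statement cannot start before the statement. */
    assert(location >= query_loc);
    return location - query_loc;
}

/*
 * Return a pointer to the statement of interest, so that earlier
 * statements in a multi-statement string are never re-scanned.
 */
static const char *
advance_to_statement(const char *source, int query_loc)
{
    return source + query_loc;
}
```

For the string "SELECT 1; SELECT 42" with query_loc = 10, a constant recorded at offset 17 of the full string ends up at offset 7 of the advanced string.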

Also, the comment "It is the caller's job to ensure that the string
is a valid SQL statement..." makes more sense in GenerateNormalizeQuery
than in SetConstantLengths, since GenerateNormalizeQuery is the public
function.

> In 0002:
> You could use palloc_array for locs here.

done.
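
For readers unfamiliar with the idiom, a rough standalone sketch of what a typed array allocation in the style of palloc_array(type, count) looks like. This is an illustration only: alloc_array is a malloc-based stand-in defined here so the snippet compiles outside the backend, and LocEntry and make_locs are hypothetical names, not the patch's code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical element type, for illustration only (assumption). */
typedef struct LocEntry
{
    int location;
    int length;
} LocEntry;

/*
 * Stand-in with the same shape as palloc_array(type, count), built on
 * malloc. The element type is named once and the cast is generated by
 * the macro, so the size expression and pointer type cannot drift apart.
 */
#define alloc_array(type, count) ((type *) malloc(sizeof(type) * (count)))

/* Allocate nlocs entries and mark every length as "not yet known". */
static LocEntry *
make_locs(int nlocs)
{
    LocEntry *locs;

    assert(nlocs > 0);
    locs = alloc_array(LocEntry, nlocs);
    if (locs == NULL)
        return NULL;

    for (int i = 0; i < nlocs; i++)
    {
        locs[i].location = 0;
        locs[i].length = -1;
    }
    return locs;
}
```

Unlike this malloc-based stand-in, palloc reports an error on allocation failure instead of returning NULL, so backend code does not need the NULL check.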

> I think we should update the comment here to reflect the fact that
> we're no longer modifying JumbleState.

done.

> Otherwise these patches look good - it'd be nice to still get this
> into 19 so we have less code duplication across the different
> extensions working with normalized query text.

I agree!

v6 addresses the comments.

--
Sami Imseih
Amazon Web Services (AWS)

Attachment Content-Type Size
v6-0002-Make-JumbleState-const-in-post_parse_analyze-hook.patch application/octet-stream 8.6 KB
v6-0001-pg_stat_statements-Move-query-normalization-to-co.patch application/octet-stream 21.1 KB
