Re: pg_stat_statements and "IN" conditions

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, David Geier <geidav(dot)pg(at)gmail(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, vignesh C <vignesh21(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Pavel Trukhanov <pavel(dot)trukhanov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: pg_stat_statements and "IN" conditions
Date: 2023-10-27 15:02:44
Message-ID: 20231027150244.jmdvn23l4rpx6hha@erthalion.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Thu, Oct 26, 2023 at 09:08:42AM +0900, Michael Paquier wrote:
> typedef struct ArrayExpr
> {
> + pg_node_attr(custom_query_jumble)
> +
>
> Hmm. I am not sure that this is the best approach
> implementation-wise. Wouldn't it be better to invent a new
> pg_node_attr (these can include parameters as well!), say
> query_jumble_merge or query_jumble_agg_location that aggregates all
> the parameters of a list to be considered as a single element. To put
> it short, we could also apply the same property to other parts of a
> parsed tree, and not only an ArrayExpr's list.

Sounds like an interesting idea, something like:

typedef struct ArrayExpr
{
...
List *elements pg_node_attr(query_jumble_merge);

to replace simple JUMBLE_NODE(elements) with more elaborated logic.

> /* GUC parameters */
> extern PGDLLIMPORT int compute_query_id;
> -
> +extern PGDLLIMPORT bool query_id_const_merge;
>
> Not much a fan of this addition as well for an in-core GUC. I would
> suggest pushing the GUC layer to pg_stat_statements, maintaining the
> computation method to use as a field of JumbleState as I suspect that
> this is something we should not enforce system-wide, but at
> extension-level instead.

I also do not particularly like an extra GUC here, but as far as I can
tell to make it pg_stat_statements GUC only it has to be something
similar to EnableQueryId (e.g. EnableQueryConstMerging), that will be
called from pgss. Does this sound better?

> + /*
> + * Indicates the constant represents the beginning or the end of a merged
> + * constants interval.
> + */
> + bool merged;
>
> Not sure that this is the best thing to do either. Instead of this
> extra boolean flag, could it be simpler if we switch LocationLen so as
> we track the start position and the end position of a constant in a
> query string, so as we'd use one LocationLen for a whole set of Const
> nodes in an ArrayExpr? Perhaps this could just be a refactoring piece
> of its own?

Sounds interesting as well, but it seems to me there is a catch. I'll
try to elaborate, bear with me:

* if the start and the end positions of a constant means the first and the
last character representing it, we need the textual length of the
constant in the query to be able to construct such a LocationLen. The
lengths are calculated in pg_stat_statements later, not in JumbleQuery,
and it uses parser for that. Doing all of this in JumbleQuery doesn't
sound reasonable to me.

* if instead we talk about the start and the end positions in a
set of constants, that would mean locations of the first and the last
constants in the set, and everything seems fine. But for such
LocationLen to represent a single constant (not a set of constants), it
means that only the start position would be meaningful, the end position
will not be used.

The second approach is somewhat close to be simpler than the merge flag,
but assumes the ugliness for a single constant. What do you think about
this?

> + /*
> + * If the first expression is a constant, verify if the following elements
> + * are constants as well. If yes, the list is eligible for merging, and the
> + * order of magnitude need to be calculated.
> + */
> + if (IsA(firstExpr, Const))
> + {
> + foreach(temp, elements)
> + if (!IsA(lfirst(temp), Const))
> + return false;
>
> This path should be benchmarked, IMO.

I can do some benchmarking here, but of course it's going to be slower
than the baseline. The main idea behind the patch is to trade this
overhead for the benefits in the future while processing pgss records,
hoping that it's going to be worth it (and in those extreme cases I'm
trying to address it's definitely worth it).

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Drouvot, Bertrand 2023-10-27 15:13:43 Re: Synchronizing slots from primary to standby
Previous Message Victor Wagner 2023-10-27 15:00:51 Re: Enderbury Island disappeared from timezone database