Re: pg_stat_statements and "IN" conditions

From: Zhihong Yu <zyu(at)yugabyte(dot)com>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, Pavel Trukhanov <pavel(dot)trukhanov(at)gmail(dot)com>
Subject: Re: pg_stat_statements and "IN" conditions
Date: 2020-12-26 16:53:28
Message-ID: CALNJ-vQx7u-Nq_wVun72CVrHe4DcGfEurj_ikckJrRy2Pem=JA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,
A few comments.

+ "After this number of duplicating constants
start to merge them.",

duplicating -> duplicate

+ foreach(lc, (List *) expr)
+ {
+ Node * subExpr = (Node *) lfirst(lc);
+
+ if (!IsA(subExpr, Const))
+ {
+ allConst = false;
+ break;
+ }
+ }

It seems the above foreach loop (within foreach(temp, (List *) node)) can
be preceded with a check that allConst is true. Otherwise the loop can be
skipped.

+ if (currentExprIdx == pgss_merge_threshold - 1)
+ {
+ JumbleExpr(jstate, expr);
+
+ /*
+ * A const expr is already found, so JumbleExpr must
+ * record it. Mark it as merged, it will be the
first
+ * merged but still present in the statement query.
+ */
+ Assert(jstate->clocations_count > 0);
+ jstate->clocations[jstate->clocations_count -
1].merged = true;
+ currentExprIdx++;
+ }

The above snippet occurs a few times. Maybe extract into a helper method.

Cheers

On Sat, Dec 26, 2020 at 2:45 AM Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:

> > On Wed, Nov 18, 2020 at 05:04:32PM +0100, Dmitry Dolgov wrote:
> > > On Wed, Aug 12, 2020 at 06:19:02PM +0200, Dmitry Dolgov wrote:
> > >
> > > I would like to start another thread to follow up on [1], mostly to
> bump up the
> > > topic. Just to remind, it's about how pg_stat_statements jumbling
> ArrayExpr in
> > > queries like:
> > >
> > > SELECT something FROM table WHERE col IN (1, 2, 3, ...)
> > >
> > > The current implementation produces different jumble hash for every
> different
> > > number of arguments for essentially the same query. Unfortunately a
> lot of ORMs
> > > like to generate these types of queries, which in turn leads to
> > > pg_stat_statements pollution. Ideally we want to prevent this and have
> only one
> > > record for such a query.
> > >
> > > As the result of [1] I've identified two highlighted approaches to
> improve this
> > > situation:
> > >
> > > * Reduce the generated ArrayExpr to an array Const immediately, in
> cases where
> > > all the inputs are Consts.
> > >
> > > * Make repeating Const to contribute nothing to the resulting hash.
> > >
> > > I've tried to prototype both approaches to find out pros/cons and be
> more
> > > specific. Attached patches could not be considered a completed piece
> of work,
> > > but they seem to work, mostly pass the tests and demonstrate the
> point. I would
> > > like to get some high level input about them and ideally make it clear
> what is
> > > the preferred solution to continue with.
> >
> > I've implemented the second approach mentioned above, this version was
> > tested on our test clusters for some time without visible issues. Will
> > create a CF item and would appreciate any feedback.
>
> After more testing I found couple of things that could be improved,
> namely in the presence of non-reducible constants one part of the query
> was not copied into the normalized version, and this approach also could
> be extended for Params. These are incorporated in the attached patch.
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2020-12-26 17:18:18 Re: pgsql: Add pg_alterckey utility to change the cluster key
Previous Message Tom Lane 2020-12-26 16:45:41 Re: pgsql: Add pg_alterckey utility to change the cluster key