Re: pg_stat_statements and "IN" conditions

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, Pavel Trukhanov <pavel(dot)trukhanov(at)gmail(dot)com>
Subject: Re: pg_stat_statements and "IN" conditions
Date: 2020-12-26 10:46:35
Message-ID: 20201226104635.uvzkftzu72zg5qux@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Wed, Nov 18, 2020 at 05:04:32PM +0100, Dmitry Dolgov wrote:
> > On Wed, Aug 12, 2020 at 06:19:02PM +0200, Dmitry Dolgov wrote:
> >
> > I would like to start another thread to follow up on [1], mostly to bump up the
> > topic. Just to remind, it's about how pg_stat_statements jumbling ArrayExpr in
> > queries like:
> >
> > SELECT something FROM table WHERE col IN (1, 2, 3, ...)
> >
> > The current implementation produces different jumble hash for every different
> > number of arguments for essentially the same query. Unfortunately a lot of ORMs
> > like to generate these types of queries, which in turn leads to
> > pg_stat_statements pollution. Ideally we want to prevent this and have only one
> > record for such a query.
> >
> > As the result of [1] I've identified two highlighted approaches to improve this
> > situation:
> >
> > * Reduce the generated ArrayExpr to an array Const immediately, in cases where
> > all the inputs are Consts.
> >
> > * Make repeating Const to contribute nothing to the resulting hash.
> >
> > I've tried to prototype both approaches to find out pros/cons and be more
> > specific. Attached patches could not be considered a completed piece of work,
> > but they seem to work, mostly pass the tests and demonstrate the point. I would
> > like to get some high level input about them and ideally make it clear what is
> > the preferred solution to continue with.
>
> I've implemented the second approach mentioned above, this version was
> tested on our test clusters for some time without visible issues. Will
> create a CF item and would appreciate any feedback.

After more testing I found couple of things that could be improved,
namely in the presence of non-reducible constants one part of the query
was not copied into the normalized version, and this approach also could
be extended for Params. These are incorporated in the attached patch.

Attachment Content-Type Size
v2-0001-Prevent-jumbling-of-every-element-in-ArrayExpr.patch text/x-diff 37.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2020-12-26 14:25:14 Re: pgsql: Add pg_alterckey utility to change the cluster key
Previous Message Fabien COELHO 2020-12-26 10:03:00 Re: pgsql: Add pg_alterckey utility to change the cluster key