Re: pg_stat_statements and "IN" conditions

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, vignesh C <vignesh21(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>, Pavel Trukhanov <pavel(dot)trukhanov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Geier <geidav(dot)pg(at)gmail(dot)com>
Subject: Re: pg_stat_statements and "IN" conditions
Date: 2023-02-17 15:46:43
Message-ID: 20230217154643.2n3atmjnenchtgud@erthalion.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Thu, Feb 09, 2023 at 08:43:29PM +0100, Dmitry Dolgov wrote:
> > On Thu, Feb 09, 2023 at 06:26:51PM +0100, Alvaro Herrera wrote:
> > On 2023-Feb-09, Dmitry Dolgov wrote:
> >
> > > > On Thu, Feb 09, 2023 at 02:30:34PM +0100, Peter Eisentraut wrote:
> >
> > > > What is the point of making this a numeric setting? Either you want
> > > > to merge all values or you don't want to merge any values.
> > >
> > > At least in theory the definition of "too many constants" is different
> > > for different use cases and I see allowing to configure it as a way of
> > > reducing the level of surprise here.
> >
> > I was thinking about this a few days ago and I agree that we don't
> > necessarily want to make it just a boolean thing; we may want to make it
> > more complex. One trivial idea is to make it group entries in powers of
> > 10: for 0-9 elements, you get one entry, and 10-99 you get a different
> > one, and so on:
> >
> > # group everything in a single bucket
> > const_merge_threshold = true / yes / on
> >
> > # group 0-9, 10-99, 100-999, 1000-9999
> > const_merge_treshold = powers
> >
> > Ideally the value would be represented somehow in the query text. For
> > example
> >
> > query | calls
> > ----------------------------------------------------------+-------
> > select * from test where i in ({... 0-9 entries ...}) | 2
> > select * from test where i in ({... 10-99 entries ...}) | 1
> >
> > What do you think? The jumble would have to know how to reduce all
> > values within each power-of-ten group to one specific value, but I don't
> > think that should be particularly difficult.
>
> Yeah, it sounds appealing and conveniently addresses the question of
> losing the information about how many constants originally were there.
> Not sure if the example above would be the most natural way to represent
> it in the query text, but otherwise I'm going to try implementing this.
> Stay tuned.

It took me couple of evenings, here is what I've got:

* The representation is not that far away from your proposal, I've
settled on:

SELECT * FROM test_merge WHERE id IN (... [10-99 entries])

* To not reinvent the wheel, I've reused decimalLenght function from
numutils, hence one more patch to make it available to reuse.

* This approach resolves my concerns about letting people tuning
the behaviour of merging, as now it's possible to distinguish between
calls with different number of constants up to the power of 10. So
I've decided to simplify the configuration and make the guc boolean to
turn it off or on.

* To separate queries with constants falling into different ranges
(10-99, 100-999, etc), the order of magnitude is added into the jumble
hash.

* I've incorporated feedback from Sergei and David, as well as tried to
make comments and documentation more clear.

Any feedback is welcomed, thanks!

Attachment Content-Type Size
v13-0001-Reusable-decimalLength-functions.patch text/x-diff 4.8 KB
v13-0002-Prevent-jumbling-of-every-element-in-ArrayExpr.patch text/x-diff 28.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Karina Litskevich 2023-02-17 15:58:45 Re: Possible false valgrind error reports
Previous Message Dmitry Dolgov 2023-02-17 15:46:02 Re: pg_stat_statements and "IN" conditions