Re: Revisiting pg_stat_statements and IN() (Was: Re: pg_stat_statements fingerprinting logic and ArrayExpr)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Revisiting pg_stat_statements and IN() (Was: Re: pg_stat_statements fingerprinting logic and ArrayExpr)
Date: 2015-11-24 19:29:53
Message-ID: 27217.1448393393@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Peter Geoghegan <pg(at)heroku(dot)com> writes:
> On Tue, Dec 10, 2013 at 1:30 AM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>> pg_stat_statements' fingerprinting logic considers the following two
>> statements as distinct:
>>
>> select 1 in (1, 2, 3);
>> select 1 in (1, 2, 3, 4);
>>
>> This is because the ArrayExpr jumble case jumbles any ArrayExpr's list
>> of elements recursively. In this case it's a list of Const nodes, and
>> the fingerprinting logic jumbles those nodes indifferently.

I think this is a vastly oversimplified explanation of the problem.
In particular, because the planner will flatten an ArrayExpr containing
only Const nodes to an array constant (see eval_const_expressions),
I don't believe the case ever arises in exactly the form you posit here.

A portion of the problem is possibly due to the heuristics in
parse_expr.c's transformAExprIn():

* We try to generate a ScalarArrayOpExpr from IN/NOT IN, but this is only
* possible if there is a suitable array type available. If not, we fall
* back to a boolean condition tree with multiple copies of the lefthand
* expression. Also, any IN-list items that contain Vars are handled as
* separate boolean conditions, because that gives the planner more scope
* for optimization on such clauses.

If the original text actually involves a variable number of Vars, then you
will end up with a boolean expression with a varying number of OR arms,
even if the Vars later get flattened to constants. However, it's not
clear to me that anyone would expect such cases to be treated as
identical. Another possibility is a type clash, for example
"x IN (42, 44.1)" will end up as a boolean tree for lack of a common
type for the would-be array elements. That case might possibly be an
issue in practice.

But what seems more likely to be annoying people is cases in which the
original text contains a varying number of Param markers. Those might or
might not get folded to constants during planning depending on context,
so that they might or might not look different to pg_stat_statements.

So I suspect the real problem here is that we might want all of these
things to look identical to pg_stat_statements:

ARRAY[$1, $2, 42]
ARRAY[$1, $2, $3, 47]
'{1,2,3,47}'::int[]

Don't see a very clean way to do that ...

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2015-11-24 19:43:30 Re: New email address
Previous Message Larry Rosenman 2015-11-24 19:20:22 Re: New email address