Re: PoC/WIP: Extended statistics on expressions

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PoC/WIP: Extended statistics on expressions
Date: 2021-09-01 16:45:29
Message-ID: 6bc0ab0c-a807-2b7f-7df8-ac90c7839013@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 8/24/21 3:13 PM, Justin Pryzby wrote:
> On Mon, Aug 16, 2021 at 05:41:57PM +0200, Tomas Vondra wrote:
>>> This still seems odd:
>>>
>>> postgres=# CREATE STATISTICS asf ON i FROM t;
>>> ERROR: extended statistics require at least 2 columns
>>> postgres=# CREATE STATISTICS asf ON (i) FROM t;
>>> CREATE STATISTICS
>>>
>>> It seems wrong that the command works with added parens, but builds expression
>>> stats on a simple column (which is redundant with what analyze does without
>>> extended stats).
>>
>> Well, yeah. But I think this is a behavior that was discussed somewhere in
>> this thread, and the agreement was that it's not worth the complexity, as
>> this comment explains
>>
>> * XXX We do only the bare minimum to separate simple attribute and
>> * complex expressions - for example "(a)" will be treated as a complex
>> * expression. No matter how elaborate the check is, there'll always be
>> * a way around it, if the user is determined (consider e.g. "(a+0)"),
>> * so it's not worth protecting against it.
>>
>> Patch 0001 fixes the "double parens" issue discussed elsewhere in this
>> thread, and patch 0002 tweaks CREATE STATISTICS to treat "(a)" as a simple
>> column reference.
>
> 0002 refuses to create expressional stats on a simple column reference like
> (a), which I think is helps to avoid a user accidentally creating useless ext
> stats objects (which are redundant with the table's column stats).
>
> 0002 does not attempt to refuse cases like (a+0), which I think is fine:
> we don't try to reject useless cases if someone insists on it.
> See 240971675, 701fd0bbc.
>
> So I am +1 to apply both patches.
>
> I added this as an Opened Item for increased visibility.
>

I've pushed both fixes, so the open item should be resolved.

However while polishing the second patch, I realized we're allowing
statistics on expressions referencing system attributes. So this fails;

CREATE STATISTICS s ON ctid, x FROM t;

but this passes:

CREATE STATISTICS s ON (ctid::text), x FROM t;

IMO we should reject such expressions, just like we reject direct
references to system attributes - patch attached.

Furthermore, I wonder if we should reject expressions without any Vars?
This works now:

CREATE STATISTICS s ON (11:text) FROM t;

but it seems rather silly / useless, so maybe we should reject it.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
0001-Reject-extended-statistics-referencing-system-attrib.patch text/x-patch 1.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jaime Casanova 2021-09-01 16:51:42 Re: 2021-09 Commitfest
Previous Message Tom Lane 2021-09-01 16:41:20 Re: Is it safe to use the extended protocol with COPY?