Re: Using multiple extended statistics for estimates

From: Mark Dilger <hornschnorter(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Using multiple extended statistics for estimates
Date: 2019-11-14 18:23:44
Message-ID: e41167d8-53cd-0f18-9301-85fd9a70d8d8@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/14/19 7:55 AM, Tomas Vondra wrote:
> On Wed, Nov 13, 2019 at 10:04:36AM -0800, Mark Dilger wrote:
>>
>>
>> On 11/13/19 7:28 AM, Tomas Vondra wrote:
>>> Hi,
>>>
>>> here's an updated patch, with some minor tweaks based on the review and
>>> added tests (I ended up reworking those a bit, to make them more like
>>> the existing ones).
>>
>> Thanks, Tomas, for the new patch set!
>>
>> Attached are my review comments so far, in the form of a patch applied
>> on top of yours.
>>
>
> Thanks.
>
> 1) It's not clear to me why adding 'const' to the List parameters would
>   be useful? Can you explain?

When I first started reviewing the functions, I didn't know if those
lists were intended to be modified by the function. Adding 'const'
helps document that the function does not intend to change them.

> 2) I think you're right we can change find_strongest_dependency to do
>
>    /* also skip weaker dependencies when attribute count matches */
>    if (strongest->nattributes == dependency->nattributes &&
>        strongest->degree >= dependency->degree)
>        continue;
>
>   That'll skip some additional dependencies, which seems OK.
>
> 3) It's not clear to me what you mean by
>
>     * TODO: Improve this code comment.  Specifically, why would we
>     * ignore that no rows will match?  It seems that such a discovery
>     * would allow us to return an estimate of 0 rows, and that would
>     * be useful.
>
>   added to dependencies_clauselist_selectivity. Are you saying we
>   should also compute selectivity estimates for individual clauses and
>   use Min() as a limit? Maybe, but that seems unrelated to the patch.

I mean that the comment right above that TODO is hard to understand. You
seem to be saying that it is good and proper to only take the
selectivity estimate from the final clause in the list, but then go on
to say that other clauses might prove that no rows will match. So that
implies that by ignoring all but the last clause, we're ignoring such
other clauses that prove no rows can match. But why would we be
ignoring those?

I am not arguing that your code is wrong. I'm just critiquing the
hard-to-understand phrasing of that code comment.

--
Mark Dilger

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-11-14 18:37:44 Re: SKIP_LOCKED test causes random buildfarm failures
Previous Message Andres Freund 2019-11-14 18:07:35 Re: Proposal: Add more compile-time asserts to expose inconsistencies.