Re: Use extended statistics to estimate (Var op Var) clauses

From: Zhihong Yu <zyu(at)yugabyte(dot)com>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use extended statistics to estimate (Var op Var) clauses
Date: 2021-08-28 17:18:06
Message-ID: CALNJ-vReAWzWg7y2b6qK9Yu2fw6ba=tCRZnQmCB-E5C3O7YbBQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Aug 28, 2021 at 9:30 AM Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
wrote:

>
>
> > On Aug 28, 2021, at 6:52 AM, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
> wrote:
> >
> > Part 0003 fixes handling of those clauses so that we don't treat them as
> simple, but it does that by tweaking statext_is_compatible_clause(), as
> suggested by Dean.
>
> Function examine_opclause_args() doesn't set issimple to anything in the
> IsA(rightop, Const) case, but assigns *issimplep = issimple at the bottom.
> The compiler is not complaining about using a possibly uninitialized
> variable, but if I change the "return true" on the very next line to
> "return issimple", the compiler complains quite loudly.
>
>
> Some functions define bool *issimple, others bool *issimplep and bool
> issimple. You might want to standardize the naming.
>
> It's difficult to know what "simple" means in extended_stats.c. There is
> no file-global comment explaining the concept, and functions like
> compare_scalars_simple don't have correlates named compare_scalars_complex
> or such, so the reader cannot infer by comparison what the difference might
> be between a "simple" case and some non-"simple" case. The functions'
> issimple (or issimplep) argument are undocumented.
>
> There is a comment:
>
> /*
> * statext_mcv_clauselist_selectivity
> * Estimate clauses using the best multi-column statistics.
> ....
> *
> * - simple selectivity: Computed without extended statistics, i.e. as if
> the
> * columns/clauses were independent.
> *
> ....
> */
>
> but it takes a while to find if you search for "issimple".
>
>
> In both scalarineqsel_wrapper() and eqsel_internal(), the call to
> matching_restriction_variables() should usually return false, since
> comparing a variable to itself is an unusual case. The next call is to
> get_restriction_variable(), which repeats the work of examining the left
> and right variables. So in almost all cases, after throwing away the
> results of:
>
> examine_variable(root, left, varRelid, &ldata);
> examine_variable(root, right, varRelid, &rdata);
>
> performed in matching_restriction_variables(), we'll do exactly the same
> work again (with one variable named differently) in
> get_restriction_variable():
>
> examine_variable(root, left, varRelid, vardata);
> examine_variable(root, right, varRelid, &rdata);
>
> That'd be fine if example_variable() were a cheap function, but it appears
> not to be. Do you think you could save the results rather than recomputing
> them? It's a little messy, since these are the only two functions out of
> about ten which follow this pattern, so you'd have to pass NULLs into
> get_restriction_variable() from the other eight callers, but it still looks
> like that would be a win.
>
> —
> Mark Dilger
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> Hi,
I wonder if the queries with (a=a) or (a<a) clauses are worth this
additional complexity to address.
Has anyone seen such clause in production queries ?

I would think the randomly generated queries should be narrowed a bit to
exclude such cases where the result of the clause is known regardless of
the underlying data.

Cheers

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2021-08-28 17:40:08 Re: Use extended statistics to estimate (Var op Var) clauses
Previous Message Mark Dilger 2021-08-28 16:30:12 Re: Use extended statistics to estimate (Var op Var) clauses