Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Yuto Hayamizu <y(dot)hayamizu(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
Date: 2018-07-26 17:00:41
Message-ID: CAEZATCW4=er-KzHw4J8sPix45VoZ09vnKTwFXz25dfFz29ozzQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 26 July 2018 at 07:12, Ashutosh Bapat
<ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> In the patch clauselist_selectivity() gets called repeatedly for every
> new qual added to the clause list. Instead, if we try to combine the
> cost/row estimation with order_qual_clauses() or
> clauselist_selectivity(), we might be able to what the current patch
> does in O(n). clauselist_selectivity() calculates combined selectivity
> of all the given quals in O(n). We should do something similar to that
> in this case.

Duplicating the logic of clauselist_selectivity() seems like quite a
lot of effort to go to just for improved filter cost estimates. Note
also that clauselist_selectivity() might get a good deal more complex
with multivariate stats.

Perhaps a reasonable simplification would be to just treat the clauses
as independent when estimating the filter cost, and so use the
following as a "good enough" estimate for the filter cost:

cost(A) + sel(A) * cost(B) + sel(A) * sel(B) * cost(C) + ...

That would probably be an improvement over what we currently have. It
would be O(n) to compute, and it would probably use the cached
selectivity estimates for the clauses.

Note also that with this simplified expression for the filter cost, it
would be possible to improve order_qual_clauses() to take into account
the clause selectivities when sorting the clauses, and minimise the
cost of the filter by evaluating more selective clauses sooner, if
they're not too expensive. I believe that amounts to sorting by
cost/(1-sel) rather than just cost, except for clauses with sel==1,
which it makes sense to move to the end, and just sort by cost.

Regards,
Dean

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2018-07-26 17:03:39 Re: Online enabling of checksums
Previous Message Tom Lane 2018-07-26 16:47:46 Re: Speeding up INSERTs and UPDATEs to partitioned tables