Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Yuto Hayamizu <y(dot)hayamizu(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
Date: 2018-07-26 06:12:04
Message-ID: CAFjFpRdb6CAUOWQVMEPbF8YeDFPcG0VNVj2GevyQ_d=tQNfBTA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 29, 2018 at 10:42 AM, Yuto Hayamizu <y(dot)hayamizu(at)gmail(dot)com> wrote:
> On Fri, Jan 19, 2018 at 5:07 PM, Yuto Hayamizu <y(dot)hayamizu(at)gmail(dot)com> wrote:
>> My idea of improving this patch is that give a threshold N_limit,
>> and for q_1 ... q_N_limit, do the same weighted cost estimation in the
>> current version of this patch.
>> For q_{N_limit+1} ...., stop calling clauselist_selectivity for
>> calculating the weight
>> and reuse the result of clauselist_selectivity({q_1,q_2, ..., q_N_limit}).
>> For example, if N_limit=100, additional overhead is only
>> sub-milliseconds per each range table entry,
>> and cost estimation is surely better than the current postgres implementation.
>
> Attached patch implemented the improvement idea above.
> With this patch attached, performance degradation of the test query
> with many quals was <1%.
> Example test query is attached.
>
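
For clarity, here is my reading of the quoted N_limit scheme as a rough
C sketch (illustrative only; the helper name and header choices are
mine, not from the posted patch). The weight of qual q_k is the
clauselist_selectivity() of the quals ordered before it, and past
N_limit the selectivity of the first N_limit quals is simply reused,
which bounds the extra planning overhead:

    /*
     * Sketch only -- not the posted patch.  Headers follow the PG 11-era
     * tree; weighted_qual_cost_nlimit() is a made-up helper.
     */
    #include "postgres.h"
    #include "nodes/relation.h"
    #include "optimizer/cost.h"

    #define QUAL_WEIGHT_LIMIT 100       /* the proposed N_limit */

    static Cost
    weighted_qual_cost_nlimit(PlannerInfo *root, List *quals)
    {
        Cost        per_tuple = 0.0;
        Selectivity weight = 1.0;
        List       *prefix = NIL;
        int         k = 0;
        ListCell   *lc;

        foreach(lc, quals)
        {
            RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
            QualCost    qcost;

            if (k < QUAL_WEIGHT_LIMIT)
            {
                /* weight of q_k = selectivity of the quals ordered before it */
                weight = clauselist_selectivity(root, prefix, 0,
                                                JOIN_INNER, NULL);
                prefix = lappend(prefix, rinfo);
            }
            /* past N_limit, keep reusing the weight of the first N_limit quals */

            cost_qual_eval_node(&qcost, (Node *) rinfo->clause, root);
            per_tuple += weight * qcost.per_tuple;
            k++;
        }

        list_free(prefix);
        return per_tuple;
    }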

I looked at the patch and the idea sounds reasonable to me. But I
still have doubts about the usefulness of the patch. What this patch
does is produce a more accurate cost for the scan of a single
relation. That's good, but it does so for all the paths created for
that relation, so the more accurate estimate doesn't help us choose
one method of scanning the relation over another. It also does not
affect the costs of different join paths, sort paths etc. IOW, I
cannot come up with a query which, with the patch applied, would be
planned differently than on upstream with the new plan being more
efficient than the previous one. I might be missing something,
though. Can you please provide such a query?

In the patch, clauselist_selectivity() gets called repeatedly, once
for every new qual added to the clause list. Instead, if we combine
the cost/row estimation with order_qual_clauses() or
clauselist_selectivity(), we might be able to do what the current
patch does in O(n). clauselist_selectivity() calculates the combined
selectivity of all the given quals in O(n); we should do something
similar here.
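
Something along these lines (again just a sketch with made-up names,
using the same headers as the sketch above; a naive per-clause product
ignores the cross-clause corrections that clauselist_selectivity()
applies, which is why the real logic probably belongs inside
clauselist_selectivity() or order_qual_clauses() proper):

    /*
     * Single O(n) pass over quals already ordered by order_qual_clauses().
     * Each qual's per-tuple cost is weighted by the fraction of tuples
     * expected to survive the quals evaluated before it.
     */
    static Cost
    weighted_qual_cost_onepass(PlannerInfo *root, List *ordered_quals)
    {
        Cost        per_tuple = 0.0;
        Selectivity sel_so_far = 1.0;
        ListCell   *lc;

        foreach(lc, ordered_quals)
        {
            RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
            QualCost    qcost;

            cost_qual_eval_node(&qcost, (Node *) rinfo->clause, root);

            /* only the tuples surviving the earlier quals evaluate this one */
            per_tuple += sel_so_far * qcost.per_tuple;

            sel_so_far *= clause_selectivity(root, (Node *) rinfo->clause,
                                             0, JOIN_INNER, NULL);
        }

        return per_tuple;
    }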

As Tom suggested upthread, it would be more useful to have this
feature work not just for scans but for joins as well.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
