From: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
---|---|
To: | Emre Hasegeli <emre(at)hasegeli(dot)com> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: BRIN cost estimate |
Date: | 2017-04-02 12:41:21 |
Message-ID: | CAKJS1f8BiUSex=oqoDqAGJgRxKC2VGPc-mfX582=A1p7H4cs9g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 27 March 2017 at 00:44, Emre Hasegeli <emre(at)hasegeli(dot)com> wrote:
> > If we want to have a variable which stores the number of ranges, then
> > I think numRanges is better than numBlocks. I can't imagine many
> > people would disagree there.
>
> I renamed it with other two.
>
> > At the very least please write a comment to explain this in the code.
> > Right now it looks broken. If I noticed this then one day in the
> > future someone else will. If you write a comment then person of the
> > future will likely read it, and then not raise any questions about the
> > otherwise questionable code.
>
> I added a sentence about it.
>
> > I do strongly agree that the estimates need improved here. I've
> > personally had issues with bad brin estimates before, and I'd like to
> > see it improved. I think the patch is not quite complete without it
> > also considering stats on expression indexes. If you have time to go
> > do that I'd suggest you go ahead with that.
>
> I copy-pasted expression index support from btcostestimate() as well,
> and extended the regression test.
>
> I think this function can use more polishing before committing, but I
> don't know where to begin. There are single functions for every
> access method in here. I don't like them being in there to begin
> with. selfuncs.s is quite long with all kinds of dependencies and
> dependents. I think it would be better to have the access method
> selectivity estimation functions under src/access. Maybe we should
> start doing so by moving this to src/access/brin/brin_selfuncs.c.
> What do you think?
>
Looking over this again.
- cond := format('%I %s %L', r.colname, r.oper, r.value);
+ cond := format('%s %s %L', r.colname, r.oper, r.value);
Why did you change this? It seems unrelated.
+ indexRel = index_open(index->indexoid, AccessShareLock);
+ pagesPerRange = Min(BrinGetPagesPerRange(indexRel), baserel->pages);
+ Assert(baserel->pages > 0);
+ Assert(pagesPerRange > 0);
+ rangeProportion = (double) pagesPerRange / baserel->pages;
+ numRanges = 1.0 + (double) baserel->pages / pagesPerRange;
+ index_close(indexRel, AccessShareLock);
Why do you add 1.0 to numRanges? This gives one too many ranges.
I also don't really like the way you're setting pagesPerRange. I'd suggest
keeping that as the raw value from the index metadata, and doing:
If you want the actual number of ranges then I think this is best expressed
as:
numRanges = Max(ceil(baserel->pages / pagesPerRange), 1.0);
The rangeProportion variable could be calculated based on that
too rangeProportion = 1.0 / numRanges;
That way you don't have to rely on relpages being > 0. It seems like a bad
assumption to make. I see there's some hack in estimate_rel_size() making
that true, but I think it's best not to rely on that hack.
- qual_op_cost = cpu_operator_cost *
- (list_length(indexQuals) + list_length(indexOrderBys));
-
*indexStartupCost += qual_arg_cost;
*indexTotalCost += qual_arg_cost;
- *indexTotalCost += (numTuples * *indexSelectivity) *
(cpu_index_tuple_cost + qual_op_cost);
*indexPages = index->pages;
- /* XXX what about pages_per_range? */
+ /*
+ * Add index qual op costs. Unlike other indexes, we are not processing
+ * tuples but ranges.
+ */
+ qual_op_cost = cpu_operator_cost * list_length(indexQuals);
+ *indexTotalCost += numRanges * qual_op_cost;
What's the reason for removing the + list_length(indexOrderBys) here? I
don't think it's the business of this patch to touch that. BRIN may one day
support that by partition sorting non-overlapping ranges, so that could
well be why it was there in the first place.
- *indexTotalCost += (numTuples * *indexSelectivity) *
(cpu_index_tuple_cost + qual_op_cost);
+ *indexTotalCost += selec * numTuples * cpu_index_tuple_cost;
Why is this not being charged qual_op_cost anymore?
+ * Our starting point is that BRIN selectivity has to be less than the
+ * selectivity of the btree. We are using a product of logical and
Can you explain why this is the case?
+ * class "minmax", and makes a little sense for the other one "inclusion".
"and" I think should be "but"
I think it would also be good to write some regression tests for this. All
the changes I see that you did make to brin.sql seem unrelated to this
patch.
I've also attached a spreadsheet that can be used to play around with the
estimates your patch is giving.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment | Content-Type | Size |
---|---|---|
BRIN_estimates2.ods | application/vnd.oasis.opendocument.spreadsheet | 18.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Fabien COELHO | 2017-04-02 12:53:08 | Re: Variable substitution in psql backtick expansion |
Previous Message | David Rowley | 2017-04-02 11:23:29 | Re: Performance improvement for joins where outer side is unique |