Re: BRIN cost estimate

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Emre Hasegeli <emre(at)hasegeli(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BRIN cost estimate
Date: 2017-03-21 12:17:45
Message-ID: CAKJS1f9=-dv8wGHZ8KU1KnV1K0vFuy8JOcH_s774W61TJvbOfA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 17 March 2017 at 23:50, Emre Hasegeli <emre(at)hasegeli(dot)com> wrote:
>> 1.
>>
>> + Assert(nnumbers == 1);
>>
>> I think its a bad idea to Assert() this. The stat tuple can come from
>> a plugin which could do anything. Seems like if we need to be certain
>> of that then it should be an elog(ERROR), maybe mention that we
>> expected a 1 element array, but got <nnumbers> elements.
>
> But it is not coming from a plugin which can do anything. It is the
> plugin we know. Assert() is exact same coding with btcostestimate().

Not sure what you mean here. I'm not speaking of the brin index am, I
mean the get_index_stats_hook call which you've added. An extension
can be loaded which sets this hook and provides its own tuple, which
may cause the Assert to fail. Asserts are normally used to verify in
assert enabled builds that would cause some following code to crash or
not do what it's meant to. Normally they're used to help track down
bugs in the software, they're not meant to track bugs in some software
we have no control over.

>> 2.
>>
>> + Assert(varCorrelation >= 0 && varCorrelation <= 1);
>>
>> same goes for that. I don't think we want to Assert() that either.
>> elog(ERROR) seems better, or perhaps we should fall back on the old
>> estimation method in this case?
>
> Again the correlation value is expected to be in this range. I don't
> think we even need to bother with Assert(). Garbage in garbage out.

That's fine. Let's just not garbage crash.

>> The Asserted range also seems to contradict the range mentioned in
>> pg_statistic.h:
>
> We are using Abs() of the value.

I missed that.

>> 3.
>>
>> + numBlocks = 1 + baserel->pages / BrinGetPagesPerRange(indexRel);
>>
>> should numBlocks be named numRanges? After all we call the option
>> "pages_per_range".
>
> It was named this way before.

hmm, before what exactly? before your patch it didn't exist. You
introduced it into brincostestimate().

Here's the line of code:

+ double numBlocks;

If we want to have a variable which stores the number of ranges, then
I think numRanges is better than numBlocks. I can't imagine many
people would disagree there.

>> 6.
>>
>> Might want to consider not applying this estimate when the statistics
>> don't contain any STATISTIC_KIND_CORRELATION array.
>
> I am not sure what would be better than using the value as 0 in this case.

At the very least please write a comment to explain this in the code.
Right now it looks broken. If I noticed this then one day in the
future someone else will. If you write a comment then person of the
future will likely read it, and then not raise any questions about the
otherwise questionable code.

> I can look into supporting expression indexes, if you think something
> very much incomplete like this has a chance to get committed.

I do strongly agree that the estimates need improved here. I've
personally had issues with bad brin estimates before, and I'd like to
see it improved. I think the patch is not quite complete without it
also considering stats on expression indexes. If you have time to go
do that I'd suggest you go ahead with that.

I've not looked at the new patch yet, but thanks for making some of
those changes.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2017-03-21 12:21:50 Re: Problem in Parallel Bitmap Heap Scan?
Previous Message Michael Banck 2017-03-21 12:17:24 Re: Create replication slot in pg_basebackup if requested and not yet present