Re: BRIN cost estimate

From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BRIN cost estimate
Date: 2017-04-06 08:01:01
Message-ID: CAE2gYzz4FhRV17090-mbFXx8XekPnCHk1qD71q5swC_hX6ePZw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Good point. That's wrong, but I'm confused at why you kept the:
>
> + *indexTotalCost += selec * numTuples * cpu_index_tuple_cost;
>
> at all if that's the case. All the BRIN scan is going to do is build a
> bitmap of the matching ranges found.

My mind was not clear when I was working on it a year ago.

I've ended up with:
>
> + /*
> + * Charge a small amount per range tuple which we expect to match to. This
> + * is meant to reflect the costs of manipulating the bitmap. The BRIN scan
> + * will set a bit for each page in the range when we find a matching
> + * range, so we must multiply the charge by the number of pages in the
> + * range.
> + */
> + *indexTotalCost += 0.1 * cpu_operator_cost * estimatedRanges *
> + pagesPerRange;

Isn't BRIN executes the operator only once per range? I think we can just
multiply cpu_operator_cost and estimatedRanges.

I also noticed that you were doing:
>
> + if (get_index_stats_hook &&
> + (*get_index_stats_hook) (root, index->indexoid, 1, &vardata))
>
> and
>
> + vardata.statsTuple = SearchSysCache3(STATRELATTINH,
> + ObjectIdGetDatum(index->indexoid),
> + Int16GetDatum(1),
>
> I wondered why you picked to hardcode that as 1, and I thought that's
> surely a bug. But then looking deeper it seems to be copied from
> btcostestimate(), which also seems to be a long-standing bug
> introduced in a536ed53bca40cb0d199824e358a86fcfd5db7f2. I'll go write
> a patch for that one now.

Yes, I copy-pasted from btcostestimate().

I do still have concerns about how the correlation value is used, and
> the fact that we use the highest value, but then the whole use of this
> value is far from perfect, and is just a rough indication of how
> things are.

I agree. I tried to document how incomplete it is on the comments I wrote
to help future developers improve the situation.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2017-04-06 08:02:14 Re: Interval for launching the table sync worker
Previous Message Kyotaro HORIGUCHI 2017-04-06 07:50:56 Re: [COMMITTERS] pgsql: Collect and use multi-column dependency stats