# Re: BRIN cost estimate

From: David Rowley Emre Hasegeli Alvaro Herrera , PostgreSQL Hackers Re: BRIN cost estimate 2017-04-05 04:33:28 CAKJS1f9n-Wapop5Xz1dtGdpdqmzeGqQK4sV2MK-zZugfC14Xtw@mail.gmail.com Raw Message | Whole Thread | Download mbox 2015-11-16 13:52:39 from Alvaro Herrera 📎  2015-12-24 16:39:01 from Emre Hasegeli   2015-12-24 21:17:45 from Emre Hasegeli 📎    2017-03-01 10:45:51 from Alvaro Herrera     2017-03-01 15:40:49 from Alvaro Herrera 📎      2017-03-13 01:19:05 from David Rowley       2017-03-13 08:58:17 from Emre Hasegeli       2017-03-17 10:50:26 from Emre Hasegeli 📎        2017-03-21 12:17:45 from David Rowley         2017-03-21 12:39:22 from Emre Hasegeli         2017-03-26 11:44:42 from Emre Hasegeli 📎          2017-03-31 14:37:25 from David Steele          2017-04-02 12:41:21 from David Rowley 📎           2017-04-02 15:05:09 from Emre Hasegeli            2017-04-05 04:33:28 from David Rowley 📎             2017-04-05 05:34:29 from Emre Hasegeli              2017-04-06 05:40:45 from David Rowley 📎               2017-04-06 08:01:01 from Emre Hasegeli                2017-04-06 12:13:58 from David Rowley                 2017-04-06 15:47:55 from Tom Lane                  2017-04-06 21:47:48 from Alvaro Herrera pgsql-hackers

On 3 April 2017 at 03:05, Emre Hasegeli <emre(at)hasegeli(dot)com> wrote:
> Unfortunately, I am on vacation for two weeks without my computer. I can
> post another version after 18th. I know we are under time pressure for
> release. I wouldn't mind if you or Alvaro would change it anyway you like.

I've made some changes. Actually, I completely changed how the
estimates work. I find this method more self-explanatory.

Basically, we work out the total index ranges, then work out how many
of those we'd touch in a perfectly correlated scenario. We then work
out how many ranges we'll probably visit based on the correlation
estimates from the stats, and assume the selectivity is probableRanges
/ totalIndexRanges.

I've attached a spreadsheet that compares Emre's method to mine. Mine
seems to favour the BRIN index less when the table is small. I think
this is pretty natural since if there is only 1 range, and we narrow
the result to one of them, then we might as well have performed a
seqscan.

My method seems favour BRIN a bit longer when the correlation is
between about 1% and 100%. But penalises BRIN much more when
correlation is less than around 1%. This may be better my way is
certainly smarter than the unpatched version, but still holds on a bit
longer, which may be more favourable if a BRIN index actually exists.
It might be more annoying for a user to have added a BRIN index and
never have the planner choose it.

My method also never suffers from estimating > 100% of the table.

I was a bit worried that Emre's method would penalise BRIN too much
when the correlation is not so high.

Interested to hear comments on this.

Please feel free to play with the spreadsheet by changing rows 1-3 in column B.

--
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
brin-correlation-drowley_v1.patch application/octet-stream 9.1 KB