Quick Links

Re: BRIN index which is much faster never chosen by planner

From:	Jeremy Finzel <finzelj(at)gmail(dot)com>
To:	Michael Lewis <mlewis(at)entrata(dot)com>
Cc:	PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: BRIN index which is much faster never chosen by planner
Date:	2019-10-11 14:19:33
Message-ID:	CAMa1XUgnHL9760p9ZqSQNG4u=WSbnedgzx453e_QfpXFYDDY3g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Dear Michael,

On Thu, Oct 10, 2019 at 5:20 PM Michael Lewis <mlewis(at)entrata(dot)com> wrote:

> Since the optimizer is choosing a seq scan over index scan when it seems
> like it has good row estimates in both cases, to me that may mean costs of
> scanning index are expected to be high. Is this workload on SSD? Has the
> random_page_cost config been decreased from default 4 (compared with cost
> of 1 unit for sequential scan)?
>

It's 1.5

> Your buffer hits aren't great. What is shared_buffers set to? How much ram
> on this cluster?
>

shared_buffers is 4GB. It has 500G of RAM, but server has several clusters
on it.

>
> With this table being insert only, one assumes correlation is very high on
> the data in this column as shown in pg_stats, but have your confirmed?
>

Yes, but the issue isn't with the BRIN index performing badly or being
fragmented. It's that it performs great (7x faster than the seq scan) but
postgres doesn't pick using it. I have seen this same issue also in other
attempts I have made to use BRIN.

> To me, distinct ON is often a bad code smell and probably can be
> re-written to be much more efficient with GROUP BY, lateral & order by, or
> some other tool. Same with the window function. It is a powerful tool, but
> sometimes not the right one.
>

I don't really agree, but it's beside the point because the issue is not in
aggregation. It's pre-aggregation. Indeed if I run my query as a simple
select (as I tried) it's the exact same planning issue. (In my experience,
distinct on for given example is the fastest. Same with window functions
which prevent inefficient self-joins)

> Is "source" a function that is called on field1? What is it doing/how is
> it defined?
>

I can't see how that matters either, but the "source" function is a mask
for a built-in pg function that is trivial. This whole query is masked so
as not to expose our actual prod query, but I hope it's still
understandable enough :).

My question is not how to make this query faster in general. It's that I
want to use BRIN indexes very much, but I'm not sure I can trust they will
scale with the right query plan like I know BTREE will.

Thanks!
Jeremy

In response to

Re: BRIN index which is much faster never chosen by planner at 2019-10-10 22:19:36 from Michael Lewis

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Jeremy Finzel	2019-10-11 14:31:08	Re: BRIN index which is much faster never chosen by planner
Previous Message	Tom Lane	2019-10-11 14:08:37	Re: Issues with PAM : log that it failed, whether it actually failed or not