Re: BRIN index which is much faster never chosen by planner

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Michael Lewis <mlewis(at)entrata(dot)com>, Jeremy Finzel <finzelj(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: BRIN index which is much faster never chosen by planner
Date: 2019-10-15 22:40:47
Message-ID: 20191015224047.GV3599@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

This reminds me of an issue I reported several years ago where Btree index
scans were chosen over seq scan of a large, INSERT-only table due to very high
correlation, but performed poorly. I concluded that use of the the high "large
scale" correlation on a large 50+GB table caused the planner to fail to account
for a larger number of pages being read nonsequentially (the opposite of your
issue). I think that's because we were INSERTing data which was at least
approximately sorted on record END time, and the index was on record START
time. For a large table with a week's data, the correlation of "start time"
was still be very high (0.99995). But scanning the index ends up reading pages
nonsequentially, and also multiple visits per page.

I eeked out a patch which made "correlation" a per-index statistic rather than
a per-column one. That means the planner could distinguish between a
freshly-built btree index and a fragmented one. (At the time, there was a
hypothesis that our issue was partially due to repeated values of the index
columns.) It didn't occur to me at the time, but that would also allow
creating numerous, partial BRIN indices, each of which would have separate
correlation computed over just their "restricted range", which *might* also
handle your case, depending on how packed your data is.

https://www.postgresql.org/message-id/flat/20170707234119.GN17566%40telsasoft.com#fdcbebc342b8fb9ad0ff293913f54d11

On Tue, Oct 15, 2019 at 11:05:13AM -0500, Jeremy Finzel wrote:
> I do believe that the only use case that will work really well for BRIN is
> either a truly insert-only table which is never pruned ... or a table which
> is routinely CLUSTERed!

Or partitioned table, which for large data sets I highly recommend instead of
DELETE.

Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2019-10-15 22:43:52 Re: BRIN index which is much faster never chosen by planner
Previous Message Justin Pryzby 2019-10-15 22:26:59 Re: v12.0 ERROR: trying to store a heap tuple into wrong type of slot