Re: WIP: BRIN multi-range indexes

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: WIP: BRIN multi-range indexes
Date: 2021-01-12 17:42:10
Message-ID: 8a40b7e4-357e-b0c2-e8b7-72001993fc32@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/12/21 6:28 PM, John Naylor wrote:
> On Sat, Dec 19, 2020 at 8:15 PM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com <mailto:tomas(dot)vondra(at)enterprisedb(dot)com>>
> wrote:
> > [12-20 version]
>
> Hi Tomas,
>
> The measurements look good. In case it fell through the cracks, my
> earlier review comments for Bloom BRIN indexes regarding minor details
> don't seem to have been addressed in this version. I'll point to earlier
> discussion for convenience:
>
> https://www.postgresql.org/message-id/CACPNZCt%3Dx-fOL0CUJbjR3BFXKgcd9HMPaRUVY9cwRe58hmd8Xg%40mail.gmail.com
> <https://www.postgresql.org/message-id/CACPNZCt%3Dx-fOL0CUJbjR3BFXKgcd9HMPaRUVY9cwRe58hmd8Xg%40mail.gmail.com>
>
> https://www.postgresql.org/message-id/CACPNZCuqpkCGt8%3DcywAk1kPu0OoV_TjPXeV-J639ABQWyViyug%40mail.gmail.com
> <https://www.postgresql.org/message-id/CACPNZCuqpkCGt8%3DcywAk1kPu0OoV_TjPXeV-J639ABQWyViyug%40mail.gmail.com>
>

Whooops :-( I'll go through those again, thanks for reminding me.

> > The improvements are fairly minor:
> >
> > 1) Rejecting bloom filters that are clearly too large (larger than page)
> > early. This is imperfect, as it works for individual index keys, not the
> > whole row. But per discussion it seems useful.
>
> I think this is good enough.
>
> > So based on this I'm tempted to just use the version with two hashes, as
> > implemented in 0005. It's much simpler than the partitioning scheme,
> > does not need any of the logic to generate primes etc.
>
> Sounds like the best engineering decision.
>
> Circling back to multi-minmax build times, I ran a couple quick tests on
> bigger hardware, and found that not only is multi-minmax slower than
> minmax, which is to be expected, but also slower than btree. (unlogged
> table ~12GB in size, maintenance_work_mem = 1GB, median of three runs)
>
> btree          38.3s
> minmax         26.2s
> multi-minmax  110s
>
> Since btree indexes are much larger, I imagine something algorithmic is
> involved. Is it worth digging further to see if some code path is taking
> more time than we would expect?
>

I suspect it'd due to minmax having to decide which "ranges" to merge,
which requires repeated sorting, etc. I certainly don't dare to claim
the current algorithm is perfect. I wouldn't have expected such big
difference, though - so definitely worth investigating.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2021-01-12 17:53:56 Re: pg_upgrade test for binary compatibility of core data types
Previous Message Tom Lane 2021-01-12 17:40:42 pg_dump PublicationRelInfo objects need to be marked with owner