Quick Links

Re: WIP: BRIN multi-range indexes

From:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To:	John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc:	Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: WIP: BRIN multi-range indexes
Date:	2020-12-20 00:16:05
Message-ID:	061b24c0-87eb-51a7-17d6-e1ba8c59fd23@enterprisedb.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

Attached is an updated version of the patch series, rebased on current
master, and results for benchmark comparing the various bloom variants.

The improvements are fairly minor:

1) Rejecting bloom filters that are clearly too large (larger than page)
early. This is imperfect, as it works for individual index keys, not the
whole row. But per discussion it seems useful.

2) I've added sort_mode opclass parameter, allowing disabling the sorted
mode the bloom indexes start in by default. I'm not convinced we should
commit this, I've needed this for the benchmarking.

The benchmarking compares the three parts with different Bloom variants:

0004 - single hash with mod by (nbits) and (nbits-1)
0005 - two independent hashes (two random seeds)
0006 - partitioned approach, proposed by John Naylor

I'm attaching the shell script used to run the benchmark, and a summary
of the results. The 0004 is used as a baseline, and the comparisons show
speedups for 0005 and 0006 relative to that (if you scroll to the
right). Essentially, green means "faster than 0004" while red means slower.

I don't think any of those approaches comes as a clearly superior. The
results for most queries are within 2%, which is mostly just noise.
There are cases where the differences are more significant (~10%), but
it's in either direction and if you compare duration of the whole
benchmark (by summing per-query averages) it's within 1% again.

For the "mismatch" case (i.e. looking for values not contained in the
table) the differences are larger, but that's mostly due to luck and
hitting false positives for that particular query - on average the
differences are negligible, just like for the "match" case.

So based on this I'm tempted to just use the version with two hashes, as
implemented in 0005. It's much simpler than the partitioning scheme,
does not need any of the logic to generate primes etc.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment	Content-Type	Size
0008-Ignore-correlation-for-new-BRIN-opclasses-20201220.patch	text/x-patch	4.2 KB
0007-BRIN-minmax-multi-indexes-20201220.patch	text/x-patch	217.3 KB
0006-use-one-hash-bloom-variant-20201220.patch	text/x-patch	9.0 KB
0005-use-two-independent-hashes-20201220.patch	text/x-patch	2.9 KB
0004-BRIN-bloom-indexes-20201220.patch	text/x-patch	139.8 KB
0003-Optimize-allocations-in-bringetbitmap-20201220.patch	text/x-patch	4.4 KB
0002-Move-IS-NOT-NULL-handling-from-BRIN-support-20201220.patch	text/x-patch	23.3 KB
0001-Pass-all-scan-keys-to-BRIN-consistent-funct-20201220.patch	text/x-patch	23.5 KB
results.ods	application/vnd.oasis.opendocument.spreadsheet	78.8 KB
brin-bloom-test.sh	application/x-shellscript	7.7 KB

In response to

Re: WIP: BRIN multi-range indexes at 2020-11-09 19:20:31 from John Naylor

Responses

Re: WIP: BRIN multi-range indexes at 2021-01-02 06:42:53 from Thomas Munro
Re: WIP: BRIN multi-range indexes at 2021-01-12 17:28:52 from John Naylor

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Thomas Munro	2020-12-20 03:26:42	Re: pg_preadv() and pg_pwritev()
Previous Message	Peter Geoghegan	2020-12-19 23:59:59	Re: Deleting older versions in unique indexes to avoid page splits