Re: WIP: BRIN multi-range indexes

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: WIP: BRIN multi-range indexes
Date: 2020-12-20 00:16:05
Message-ID: 061b24c0-87eb-51a7-17d6-e1ba8c59fd23@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Attached is an updated version of the patch series, rebased on current
master, and results for benchmark comparing the various bloom variants.

The improvements are fairly minor:

1) Rejecting bloom filters that are clearly too large (larger than page)
early. This is imperfect, as it works for individual index keys, not the
whole row. But per discussion it seems useful.

2) I've added sort_mode opclass parameter, allowing disabling the sorted
mode the bloom indexes start in by default. I'm not convinced we should
commit this, I've needed this for the benchmarking.

The benchmarking compares the three parts with different Bloom variants:

0004 - single hash with mod by (nbits) and (nbits-1)
0005 - two independent hashes (two random seeds)
0006 - partitioned approach, proposed by John Naylor

I'm attaching the shell script used to run the benchmark, and a summary
of the results. The 0004 is used as a baseline, and the comparisons show
speedups for 0005 and 0006 relative to that (if you scroll to the
right). Essentially, green means "faster than 0004" while red means slower.

I don't think any of those approaches comes as a clearly superior. The
results for most queries are within 2%, which is mostly just noise.
There are cases where the differences are more significant (~10%), but
it's in either direction and if you compare duration of the whole
benchmark (by summing per-query averages) it's within 1% again.

For the "mismatch" case (i.e. looking for values not contained in the
table) the differences are larger, but that's mostly due to luck and
hitting false positives for that particular query - on average the
differences are negligible, just like for the "match" case.

So based on this I'm tempted to just use the version with two hashes, as
implemented in 0005. It's much simpler than the partitioning scheme,
does not need any of the logic to generate primes etc.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
0008-Ignore-correlation-for-new-BRIN-opclasses-20201220.patch text/x-patch 4.2 KB
0007-BRIN-minmax-multi-indexes-20201220.patch text/x-patch 217.3 KB
0006-use-one-hash-bloom-variant-20201220.patch text/x-patch 9.0 KB
0005-use-two-independent-hashes-20201220.patch text/x-patch 2.9 KB
0004-BRIN-bloom-indexes-20201220.patch text/x-patch 139.8 KB
0003-Optimize-allocations-in-bringetbitmap-20201220.patch text/x-patch 4.4 KB
0002-Move-IS-NOT-NULL-handling-from-BRIN-support-20201220.patch text/x-patch 23.3 KB
0001-Pass-all-scan-keys-to-BRIN-consistent-funct-20201220.patch text/x-patch 23.5 KB
results.ods application/vnd.oasis.opendocument.spreadsheet 78.8 KB
brin-bloom-test.sh application/x-shellscript 7.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2020-12-20 03:26:42 Re: pg_preadv() and pg_pwritev()
Previous Message Peter Geoghegan 2020-12-19 23:59:59 Re: Deleting older versions in unique indexes to avoid page splits