Re: Bloom Indexes - bit array length and the total number of bits (or hash functions ?? ) !

From: Avinash Kumar <avinash(dot)vallarapu(at)gmail(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Bloom Indexes - bit array length and the total number of bits (or hash functions ?? ) !
Date: 2019-06-09 16:54:05
Message-ID: CAN0Tujf-aLrAtr0Hf6jXGeQucRZG56+EtgYZ4zJSGiByss3XJw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks Fabien,

But the 2 direct questions i have are :

1. What is the structure of the Bloom Index ? Can you please let me know
what are the fields of a Bloom Index ? Is it just the Item Pointer
and BloomSignatureWord ?
When i describe my bloom index it looks like following.

postgres=# \d+ foo.idx_bloom_bar
Index "foo.idx_bloom_bar"
Column | Type | Key? | Definition | Storage | Stats target
---------+---------+------+------------+---------+--------------
id | integer | yes | id | plain |
dept | integer | yes | dept | plain |
id2 | integer | yes | id2 | plain |
id3 | integer | yes | id3 | plain |
id4 | integer | yes | id4 | plain |
id5 | integer | yes | id5 | plain |
id6 | integer | yes | id6 | plain |
zipcode | integer | yes | zipcode | plain |
bloom, for table "foo.bar"
Options: length=64, col1=4, col2=4, col3=4, col4=4, col5=4, col6=4, col7=4,
col8=4

2. If we are storing just one signature word per row, how is this
aggregated for all column values of that row into one signature in high
level ?
For example, if length = 64, does it mean that a bit array of 64 bits is
generated per each row ?
If col1=4, does it mean the value of col1 is passed to 4 hash functions
that generate 4 positions that can be set to 1 in the bit array ?

On Sat, Jun 8, 2019 at 3:11 AM Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:

>
> Hello Avinash,
>
> > I was testing bloom indexes today. I understand bloom indexes uses bloom
> > filters. [...]
> >
> > So the question here is -
> > I assume - number of bits = k. Where k is the total number of hash
> > functions used on top of the data that needs to validated. Is that
> correct
> > ? If yes, why do we see the Index 1 performing better than Index 2 ?
> > Because, the data has to go through more hash functions (4 vs 2) in
> Index 1
> > than Index 2. So, with Index 1 it should take more time.
> > Also, both the indexes have ZERO false positives.
> > Please let me know if there is anything simple that i am missing here.
>
> You may have a look at the blog entry about these parameters I redacted a
> few year ago:
>
> http://blog.coelho.net/database/2016/12/11/postgresql-bloom-index.html
>
> --
> Fabien.
>
>
>

--
9000799060

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Siarhei Siniak 2019-06-09 18:05:20 GiST limits on contrib/cube with dimension > 100?
Previous Message Andres Freund 2019-06-09 15:40:00 Re: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY