Special bloom index of INT, BIGINT, BIT, VARBIT for bitwise operation

From: Takao Magoori <mago(at)magomago(dot)jp>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Special bloom index of INT, BIGINT, BIT, VARBIT for bitwise operation
Date: 2018-07-11 06:02:59
Message-ID: CAEgEiV51AXRZ=PnDxUR-EvJjk=7bT70K-5RQASK+r8uxAP6NnA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello,

I have a table which has billions of rows and I want to select it by
bitwise operation like,

=# CREATE TABLE IF NOT EXISTS t_bitwise (
id INTEGER NOT NULL
,status_bigint BITINT NOT NULL
,status_bit BIT(32) NOT NULL
);

=# INSERT INTO t_bitwise (id, status_bigint, status_bit) SELECT
id
,(random() * 4294967295)::BIGINT
,(random() * 4294967295)::BIGINT::BIT(32)
FROM generate_series(1, 3000000000) as t(id);

=# SELECT * FROM t_bitwise WHERE
status_bigint & 170 = 170
OR status_bigint & 256 = 256;

=# SELECT * FROM t_bitwise WHERE
status_bit & b'00000000000000000000000010101010'::BIT(32) =
b'00000000000000000000000010101010'::BIT(32)
OR status_bit & b'00000000000000000000000100000000'::BIT(32) =
b'00000000000000000000000100000000'::BIT(32);

Yes, these SELECT statements scan all rows. I guess possible index types are

- Expression indexes ?
- Partial Indexes ?
- GIN ?
- GIST ?
- bloom index ?

I googled but I feel there is no good solution and it would be good if
I hava "bloom index specific for bitwise operation".

In case of normal bloom index, a value is hashed into a few bits which
is mapped to a signature (default 80 bits).
This is a lossy representation of the original value, and as such is
prone to reporting false positives which requires "Recheck" process at
SELECT. The more rows or the more complex condition, the more
execution time.

My idea is that, in case of index for bitwise operation, each bit
should be mapped to exactly same bit on a signature (One to one
mapping). No false positives. No "Recheck" process is required. If the
target coulmn is BIT(32), just 32 bits signature lengh is enough.

Is there any index module like this ? Since I am not familiar with C
and Postgresql, I can not write my own module.

Any help would be great for me.

Thanks,
Takao

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message 2018-07-11 14:16:09 High concurrency but simple updating causes deadlock
Previous Message Justin Pryzby 2018-07-10 18:38:28 Re: performance statistics monitoring without spamming logs