Re: maximum columns for brin bloom indexes

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: maximum columns for brin bloom indexes
Date: 2021-04-08 14:57:59
Message-ID: 8cb8898a-d72d-72e4-eb90-799bca712246@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 4/8/21 4:49 PM, Jaime Casanova wrote:
> On Thu, Apr 08, 2021 at 12:18:36PM +0200, Tomas Vondra wrote:
>> On 4/8/21 9:08 AM, Jaime Casanova wrote:
>>> Hi everyone,
>>>
>>> When testing brin bloom indexes I noted that we need to reduce the
>>> PAGES_PER_RANGE parameter of the index to allow more columns on it.
>>>
>>> Sadly, this could be a problem if you create the index before the table
>>> grows, once it reaches some number of rows (i see the error as early as
>>> 1000 rows) it starts error out.
>>>
>>> create table t1(i int, j int);
>>>
>>> -- uses default PAGES_PER_RANGE=128
>>> create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops ) ;
>>>
>>> insert into t1
>>> select random()*1000, random()*1000 from generate_series(1, 1000);
>>> ERROR: index row size 8968 exceeds maximum 8152 for index "t1_i_j_idx"
>>>
>>> if instead you create the index with a minor PAGES_PER_RANGE it goes
>>> fine, in this case it works once you reduce it to at least 116
>>>
>>> create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops )
>>> with (pages_per_range=116);
>>>
>>>
>>> so, for having:
>>> two int columns - PAGES_PER_RANGE should be max 116
>>> three int columns - PAGES_PER_RANGE should be max 77
>>> one int and one timestamp - PAGES_PER_RANGE should be max 121
>>>
>>> and so on
>>>
>>
>> No, because this very much depends on the number if distinct values in
>> the page page range, which determines how well the bloom filter
>> compresses. You used 1000, but that's just an arbitrary value and the
>> actual data might have any other value. And it's unlikely that all three
>> columns will have the same number of distinct values.
>>
>
> Ok, that makes sense. Still I see a few odd things:
>
> """
> drop table if exists t1;
> create table t1(i int, j int);
> create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops ) ;
>
> -- This one will succeed, I guess because it has less different
> -- values
> insert into t1
> select random()*20, random()*100 from generate_series(1, 1000);
>
> -- succeed
> insert into t1
> select random()*20, random()*100 from generate_series(1, 100000);
>
> -- succeed
> insert into t1
> select random()*200, random()*1000 from generate_series(1, 1000);
>
> -- succeed
> insert into t1
> select random()*200, random()*1000 from generate_series(1, 1000);
>
> -- succeed? This is the case it has been causing problems before
> insert into t1
> select random()*1000, random()*1000 from generate_series(1, 1000);
> """
>
> Maybe this makes sense, but it looks random to me. If it makes sense
> this is something we should document better.
>

Presumably it's about where exactly are the new rows added, and when we
summarize the page range.

> Let's try another combination:
>
> """
> drop table if exists t1;
> create table t1(i int, j int);
> create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops ) ;
>
> -- this fails again
> insert into t1
> select random()*1000, random()*1000 from generate_series(1, 1000);
>
> -- and this starts to fail now, but this worked before
> insert into t1
> select random()*20, random()*100 from generate_series(1, 1000);
> """
>
>> Of course, this also depends on the false positive rate.
>>
>
> How the false positive rate work?
>

The lower the false positive rate, the more "accurate" the index is,
because fewer page ranges not containing the value will be added to the
bitmap. The bloom filter however has to be larger.

>> FWIW I doubt people are using multi-column BRIN indexes very often.
>>
>
> true.
>
> Another question, should we allow to create a brin multi column index
> that uses different opclasses?
>
> CREATE INDEX ON t1 USING brin (i int4_bloom_ops, j int4_minmax_ops);
>

Why not? Without that you couldn't create index on (int, bigint) because
those are in principle different opclasses too. I don't see what would
this restriction give us.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2021-04-08 15:04:28 Re: Performing partition pruning using row value
Previous Message David Steele 2021-04-08 14:55:17 Re: [BUG] orphaned function