Re: bloom filter indexes?

From: "Mason Hale" <masonhale(at)gmail(dot)com>
To: "Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: bloom filter indexes?
Date: 2008-06-03 18:06:15
Message-ID: 8bca3aa10806031106j67af071bse6397f783fd6991f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Jun 3, 2008 at 12:04 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Tue, 2008-06-03 at 09:43 -0500, Mason Hale wrote:
>> I've been working on partitioning a rather large dataset into multiple
>> tables. One limitation I've run into the lack of cross-partition-table
>> unique indexes. In my case I need to guarantee the uniqueness of a
>> two-column pair across all partitions -- and this value is not used to
>> partition the tables. The table is partitioned based on a insert date
>> timestamp.
>
> You're looking for a constraint across tables.
>

Yes, for this particular case. But I'm also interested in speeding up
cross-partition queries whether it is for a uniqueness check or not.
This uniqueness check is just one (important) instance of a
cross-partition query.

>> To check the uniqueness of this value I've added an insert/update
>> trigger to search for matches in the other partitions. This trigger is
>> adding significant overhead to inserts and updates.
>
> Do you lock all of the tables before doing the check? If not, then you
> have a race condition.
>

Yes, I was concerned about that.

> It's possible this index strategy will be better for your case.
> However, I think what you really want is some kind of multi-table
> primary key. Have you considered storing the key in its own two-column
> table with a UNIQUE index and having the partitions reference it?

Thanks for the suggestion -- I'll explore maintaining the compound key
in its own non-partitioned table. I was trying to avoid any
application-layer code changes. I guess I can still accomplish that by
updating this table via an insert/update trigger.

But to reiterate, having bloom filter-based index would allow constant
time determination of whether a given partition *may* contain the
data. This would be very useful for large partitioned data-sets,
especially in (very common) cases where performance is critical.

This feature would also be useful for applications where data is
partitioned (aka 'federated') across multiple servers.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Joshua D. Drake 2008-06-03 18:12:25 Re: bloom filter indexes?
Previous Message Jeff Davis 2008-06-03 17:04:18 Re: bloom filter indexes?