Re: TODO list updates

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TODO list updates
Date: 2015-10-27 06:59:00
Message-ID: E8B8AAF7-EB42-46A8-876C-31F34FF33C4E@aalto.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 16 October 2015 18:20:59 EEST, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>I think on-disk bitmap indexes would only beat GIN indexes in a
>read-only database on low-cardinality columns. For example, if you had
>a purchase_log table and wanted to know all the "blue" and "large"
>items
>sold at a specific store, I can see on-disk bitmap indexes doing well
>there. If you added the product number, or needed read/write, I think
>GIN would win. I just don't think we have enough deployments who need
>what on-disk bitmap are best at.

My take on this is that we effectively already have bitmap indexes: it's called GIN. We could make the posting list compression even better, currently a TID is compressed at best to a single byte, while in a bitmap index it could go down to one bit, or even less. But that's just a matter of improving the compression algorithm, making it more bitmapy, and could be done as a fairly isolated change in GIN code.

Besides being more dense, there are some other tricks often associated with bitmap indexes. Instead of storing a bitmap/posting list per each unique value, you could store one for a range of values. That's useful e.g. for storing floats, where you don't have many exact duplicate values, but you could get a dense index by treating all values in range 0.0-10.0 as one entry, all values in 10.0-50.0 as another, and so forth. Yet another trick is to have one bitmap for all values > 0.0, another for all values >10.0, and so forth. With that, you can satisfy any BETWEEN query by scanning just two bitmaps/posting lists: the one for the lower bound and the one for the upper bound. The matching tuples are the ones that are present in the first, but not the latter posting list. But that's also not a whole new index type. GIN could do all that, if someone just wrote an opclass for it.

- Heikki

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2015-10-27 07:20:24 Re: Re: [BUGS] BUG #13611: test_postmaster_connection failed (Windows, listen_addresses = '0.0.0.0' or '::')
Previous Message Kouhei Kaigai 2015-10-27 05:46:15 Re: [DESIGN] ParallelAppend