Re: Minmax indexes

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minmax indexes
Date: 2014-06-15 02:34:04
Message-ID: 20140615023404.GY18688@eldon.alvh.no-ip.org
Lists: pgsql-hackers

Robert Haas wrote:
> On Wed, Sep 25, 2013 at 4:34 PM, Alvaro Herrera
> <alvherre(at)2ndquadrant(dot)com> wrote:
> > Here's an updated version of this patch, with fixes to all the bugs
> > reported so far. Thanks to Thom Brown, Jaime Casanova, Erik Rijkers and
> > Amit Kapila for the reports.
>
> I'm not very happy with the use of a separate relation fork for
> storing this data.

Here's a new version of this patch. Now the revmap is not stored in a
separate fork, but together with all the regular data, as explained
elsewhere in the thread.

I added a few pageinspect functions that let one explore the data in the
index. With these you can start by reading the metapage and from there
obtain the block numbers of the revmap array pages; then explore the revmap
array pages to find the regular revmap pages, which contain the TIDs of the
index entries. None of these pageinspect functions has any documentation
yet, but using them is as easy as

with idxname as (select 'ti'::text as idxname)
select *
  from idxname,
       generate_series(0, pg_relation_size(idxname) / 8192 - 1) i,
       minmax_page_type(get_raw_page(idxname, i::int));

select * -- data in metapage
from minmax_metapage_info(get_raw_page('ti', 0));

select * -- data in revmap array pages
from minmax_revmap_array_data(get_raw_page('ti', 6));

select logblk, unnest(pages) -- data in regular revmap pages
from minmax_revmap_data(get_raw_page('ti', 15));

select * -- data in regular index pages
from minmax_page_items(get_raw_page('ti', 2), 'ti'::regclass);

Note that in this last case you need to give it the OID of the index as
the second parameter, so that it can construct a tupledesc for decoding
the min/max data.
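
As a rough illustration of the navigation path described above (metapage, then revmap array pages, then regular revmap pages, then the TIDs of index entries), here is a toy model in Python. It is not the patch's code: the dictionary layout, the field names, the min/max values, and the assumption that revmap entries are ordered by page range number are all made up for the example. Only the block numbers (0, 6, 15, 2) mirror the ones used in the queries above.

```python
# Hypothetical toy layout; field names and contents are invented for illustration.
pages = {
    0: {"type": "meta", "revmap_array_blocks": [6]},
    6: {"type": "revmap array", "revmap_blocks": [15]},
    # Assumption: one TID (block, offset) per page range, in range order.
    15: {"type": "revmap", "tids": [(2, 1), (2, 2), (2, 3)]},
    # Regular index page: offset -> (min, max) summary for one page range.
    2: {"type": "regular", "items": {1: (10, 42), 2: (43, 97), 3: (98, 150)}},
}


def summary_for_range(pages, range_no):
    """Walk metapage -> revmap array page -> revmap page -> index tuple."""
    meta = pages[0]
    for array_blk in meta["revmap_array_blocks"]:
        for revmap_blk in pages[array_blk]["revmap_blocks"]:
            tids = pages[revmap_blk]["tids"]
            if range_no < len(tids):
                blk, off = tids[range_no]
                return pages[blk]["items"][off]
            # Not on this revmap page; skip past its entries.
            range_no -= len(tids)
    return None  # no revmap entry for this range


second_range = summary_for_range(pages, 1)
missing_range = summary_for_range(pages, 3)
```

The pageinspect queries above let you follow the same chain by hand, one page at a time.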

I have followed the suggestion by Amit to overwrite the index tuple when
a new heap tuple is inserted, instead of creating a separate index
tuple. This avoids a lot of index bloat. It required a new entry
point in bufpage.c, PageOverwriteItemData(). bufpage.c also has a new
function, PageIndexDeleteNoCompact, which is similar in spirit to
PageIndexMultiDelete except that item pointers do not change. This is
necessary because the revmap stores item pointers, and such references
would break if we were to renumber items in index pages.
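
To make the item pointer problem concrete, here is a toy Python model of the two deletion styles. It is only a sketch: the real functions in bufpage.c operate on line pointers inside a page, and the helper names, the list-based "page", and the dictionary "revmap" below are hypothetical stand-ins.

```python
# Toy model of an index page: a list of items addressed by 1-based offset.
# Illustrative only; none of these names exist in the patch.

def delete_with_compaction(items, offsets):
    """Remove the given 1-based offsets and close the gaps (survivors are renumbered)."""
    doomed = set(offsets)
    return [it for off, it in enumerate(items, start=1) if off not in doomed]


def delete_no_compact(items, offsets):
    """Remove the given 1-based offsets but leave holes, so survivors keep their offsets."""
    doomed = set(offsets)
    return [None if off in doomed else it
            for off, it in enumerate(items, start=1)]


page = ["range A", "range B", "range C"]
revmap = {"range C": 3}  # the revmap remembers that "range C" lives at offset 3

compacted = delete_with_compaction(page, [2])
holey = delete_no_compact(page, [2])

# After compaction the page has only two items, so offset 3 dangles.
dangling = revmap["range C"] > len(compacted)
# Without compaction, offset 3 still holds the same item.
still_valid = holey[revmap["range C"] - 1] == "range C"
```

In the compacting variant the surviving item moves from offset 3 to offset 2, which is exactly the renumbering that would invalidate a stored item pointer.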

I have also added a reloption for the size of each page range, so you
can do

create index ti on t using minmax (a) with (pages_per_range = 2);

The default is 128 pages per range, and I have an arbitrary maximum of
131072 pages (the number of 8kB pages in a default 1GB segment). There
doesn't seem to be much
point in having larger page ranges; intuitively I think page ranges
should be more or less the size of kernel readahead, but I haven't
tested this.
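
The arithmetic behind these numbers can be sketched in a few lines of Python. This assumes that page ranges are consecutive, fixed-size runs of heap pages starting at block 0 and that the block size is the default 8192 bytes (the same value the queries above hard-code); the function names are hypothetical.

```python
BLCKSZ = 8192  # default PostgreSQL block size in bytes (assumed)


def range_number(heap_block, pages_per_range=128):
    """Page range that a given heap block falls into (0-based)."""
    return heap_block // pages_per_range


def range_count(heap_blocks, pages_per_range=128):
    """Number of page ranges needed to cover a heap of the given size."""
    return -(-heap_blocks // pages_per_range)  # ceiling division


# The maximum pages_per_range, expressed in bytes: 131072 pages of 8kB each.
max_range_bytes = 131072 * BLCKSZ
```

With the default of 128, heap blocks 0 through 127 share one summary entry and block 128 starts the next range; with pages_per_range = 2, as in the example above, block 5 falls into range 2.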

I didn't want to rebase past 0ef0b6784 in a hurry. I only know this
applies cleanly on top of fe7337f2dc, so please use that if you want to
play with it. I will post a rebased version shortly.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment: minmax-8.patch (text/x-diff, 165.4 KB)
