Re: [RFC] Minmax indexes

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] Minmax indexes
Date: 2013-06-15 15:15:06
Message-ID: CA+U5nMKL2h6-fXHTJix_YEktFKjDOXOTnD5=UtDF8qSoVpqmzQ@mail.gmail.com
Lists: pgsql-hackers

On 15 June 2013 00:01, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> Alvaro,
>
> This sounds really interesting, and I can see the possibilities.
> However ...
>
>> Value changes in columns that are part of a minmax index, and tuple insertion
>> in summarized pages, would invalidate the stored min/max values. To support
>> this, each minmax index has a validity map; a range can only be considered in a
>> scan if it hasn't been invalidated by such changes (A range "not considered" in
>> the scan needs to be returned in whole regardless of the stored min/max values,
>> that is, it cannot be pruned per query quals). The validity map is very
>> similar to the visibility map in terms of performance characteristics: quick
>> enough that it's not contentious, allowing updates and insertions to proceed
>> even when data values violate the minmax index conditions. An invalidated
>> range can be made valid by re-summarization (see below).
>
> This begins to sound like these indexes are only useful on append-only
> tables. Not that there aren't plenty of those, but ...

The index is basically using the "index only scan" mechanism. The
"only useful on append-only tables" comment would apply equally to
index-only scans, so I can't see a reason to raise it specifically for
this index type.
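
To make the quoted mechanism concrete, here is a minimal Python sketch of how a scan might combine per-range min/max summaries with a validity map. It is a toy model under stated assumptions, not the patch's implementation: the names RangeSummary and ranges_to_scan are hypothetical, values are plain integers, and the query is a simple closed interval.

```python
from dataclasses import dataclass


@dataclass
class RangeSummary:
    lo: int             # smallest value seen when the range was summarized
    hi: int             # largest value seen when the range was summarized
    valid: bool = True  # validity-map bit; cleared by an insert or update


def ranges_to_scan(summaries, qlo, qhi):
    """Return indexes of page ranges that must be read for qlo <= x <= qhi."""
    result = []
    for i, s in enumerate(summaries):
        if not s.valid:
            # Invalidated range: the stored min/max cannot be trusted,
            # so the range is returned whole and cannot be pruned.
            result.append(i)
        elif s.hi >= qlo and s.lo <= qhi:
            # Valid range whose [lo, hi] overlaps the query interval.
            result.append(i)
    return result


summaries = [RangeSummary(1, 100), RangeSummary(101, 200), RangeSummary(201, 300)]
pruned = ranges_to_scan(summaries, 120, 150)

# An insert or update touches the third range and clears its validity bit.
summaries[2].valid = False
after_invalidation = ranges_to_scan(summaries, 120, 150)
```

In this sketch an invalidated range only costs extra reads; it never produces wrong answers, which is the same trade-off the visibility map makes for index-only scans.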

>> Re-summarization is relatively expensive, because the complete page range has
>> to be scanned.
>
> Why? Why can't we just update the affected pages in the index?

Again, same thing as index-only scans. For IOS, we reset the
visibility info at vacuum. The route proposed here follows exactly the
same timing, same mechanism. I can't see a reason for any difference
between the two.
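
As a rough illustration of that timing, the Python sketch below re-summarizes only the invalidated ranges during a vacuum-like pass. It is a toy model, not the proposed code: resummarize and vacuum_pass are hypothetical names, and each page is just a list of integers. It also shows one reason a full range scan can be needed in this model: once the old minimum has been updated away, the new minimum is only known after reading every page in the range.

```python
def resummarize(pages):
    """Recompute (min, max) for one page range by reading every page in it."""
    values = [v for page in pages for v in page]
    if not values:
        return None  # an empty range has no summary in this toy model
    return min(values), max(values)


def vacuum_pass(table_ranges, summaries, valid):
    """Re-summarize only the invalidated ranges, then mark them valid again."""
    for i, pages in enumerate(table_ranges):
        if not valid[i]:
            summaries[i] = resummarize(pages)
            valid[i] = True


# Two page ranges of two pages each; the second was summarized as (10, 40).
table_ranges = [[[1, 2], [3, 4]], [[10, 20], [30, 40]]]
summaries = [(1, 4), (10, 40)]
valid = [True, True]

# An UPDATE changes 10 to 25 in the second range and invalidates it.
table_ranges[1][0][0] = 25
valid[1] = False

vacuum_pass(table_ranges, summaries, valid)
```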

>> To avoid this, a table having a minmax index would be
>> configured so that inserts only go to the page(s) at the end of the table; this
>> avoids frequent invalidation of ranges in the middle of the table. We provide
>> a table reloption that tweaks the FSM behavior, so that summarized pages are
>> not candidates for insertion.
>
> We haven't had an index type which modifies table insertion behavior
> before, and I'm not keen to start now; imagine having two indexes on the
> same table each with their own, conflicting, requirements. This is
> sounding a lot more like a candidate for our prospective pluggable
> storage manager. Also, the above doesn't help us at all with UPDATEs.
>
> If we're going to start adding reloptions for specific table behavior,
> I'd rather think of all of the optimizations we might have for a
> prospective "append-only table" and bundle those, rather than tying it
> to whether a certain index exists or not.

I agree that the FSM behaviour shouldn't be linked to index existence.

IMHO that should be a separate table parameter, WITH (fsm_mode = append)

Index only scans would also benefit from that.
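
For illustration only, here is a Python sketch of what such a setting might mean for choosing an insertion target. The fsm_mode parameter is just the suggestion above, not an existing reloption, and choose_page and its list-of-free-bytes model are hypothetical simplifications of the free space map.

```python
def choose_page(free_space, needed, fsm_mode="normal"):
    """Pick a page for a new tuple.

    free_space: free bytes per existing page, in table order.
    Returns a page index, or len(free_space) to mean "extend the table".
    """
    if fsm_mode == "append":
        # Only the last page is a candidate, so earlier (summarized or
        # all-visible) pages are never dirtied by inserts.
        last = len(free_space) - 1
        if last >= 0 and free_space[last] >= needed:
            return last
        return len(free_space)
    # Normal mode: reuse the first page with enough room.
    for i, free in enumerate(free_space):
        if free >= needed:
            return i
    return len(free_space)


free = [500, 0, 100]
normal_target = choose_page(free, 200)            # reuses page 0
append_target = choose_page(free, 200, "append")  # extends the table
append_fits = choose_page(free, 50, "append")     # last page has room
```

The cost of the append behaviour in this model is that free space in earlier pages is never reused by inserts, so it trades table size for stable summaries.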

> Also, I hate the name ... if this feature goes ahead, I'm going to be
> lobbying to change it. But that's pretty minor compared to the update
> issues.

This feature has already had 3 different names. I don't think the name
is crucial, but it makes sense to give it a name up front. So if you
want to lobby for that then you'd need to come up with a name soon, so
poor Alvaro can cope with name #4.

(There's no consistency in naming across other implementations either.)

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
