Re: Minmax indexes

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minmax indexes
Date: 2013-09-25 07:47:35
Message-ID: CAA4eK1J3eMAoRTk2CVoJ_csHB8=4xcM823ptJ6LQ9bj_uhEojA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Sep 15, 2013 at 5:44 AM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> Hi,
>
> Here's a reviewable version of what I've dubbed Minmax indexes. Some
> people said they would like to use some other name for this feature, but
> I have yet to hear usable ideas, so for now I will keep calling them
> this way. I'm open to proposals, but if you pick something that cannot
> be abbreviated "mm" I might have you prepare a rebased version which
> renames the files and structs.
>
> The implementation here has been simplified from what I originally
> proposed at 20130614222805(dot)GZ5491(at)eldon(dot)alvh(dot)no-ip(dot)org -- in particular,
> I noticed that there's no need to involve aggregate functions at all; we
> can just use inequality operators. So the pg_amproc entries are gone;
> only the pg_amop entries are necessary.
>
> I've somewhat punted on the question of doing resummarization separately
> from vacuuming. Right now, resummarization (as well as other necessary
> index cleanup) takes place in amvacuumcleanup. This is not optimal; I
> have stated elsewhere that I'd like to create separate maintenance
> actions that can be carried out by autovacuum. That would be useful
> both for Minmax indexes and GIN indexes (pending insertion list); maybe
> others. That's not part of this patch, however.
>
> The design of this stuff is in the file "minmax-proposal" at the top of
> the tree. That file is up to date, though it still contains some open
> questions that were present in the original proposal. (I have not fixed
> some bogosities pointed out by Noah, for instance. I will do that
> shortly.) In a final version, that file would be applied as
> src/backend/access/minmax/README, most likely.
>
> One area on which I needed to modify core code is IndexBuildHeapScan. I
> needed a version that was able to scan only a certain range of pages,
> not the entire table, so I introduced a new IndexBuildHeapRangeScan, and
> added a quick "heap_scansetlimits" function. I haven't tested that this
> works outside of the HeapRangeScan thingy, so it's probably completely
> bogus; I'm open to suggestions if people think this should be
> implemented differently. In any case, keeping that implementation
> together with vanilla IndexBuildHeapScan makes a lot of sense.
>
> One thing still to tackle is when to mark ranges as unsummarized. Right
> now, any new tuple on a page range would cause a new index entry to be
> created and a new revmap update. This would cause huge index bloat if,
> say, a page is emptied and vacuumed and filled with new tuples with
> increasing values outside the original range; each new tuple would
> create a new index tuple. I have two ideas about this (1. mark range as
> unsummarized if 3rd time we touch the same page range;

Why only at 3rd time?
Doesn't it need to be precise, like if someone inserts a row having
value greater than max value of corresponding index tuple,
then that index tuple's corresponding max value needs to be updated
and I think its updated with the help of validity map.

For example:
considering we need to store below info for each index tuple:
In each index tuple (corresponding to one page range), we store:
- first block this tuple applies to
- last block this tuple applies to
- for each indexed column:
* min() value across all tuples in the range
* max() value across all tuples in the range

Assume first and last block for index tuple is same (assume block
no. 'x') and min value is 5 and max is 10.
Now user insert/update value in block 'x' such that max value of
index col. is 11, if we don't update corresponding
index tuple or at least invalidate it, won't it lead to wrong results?

> 2. vacuum the
> affected index page if it's full, so we can maintain the index always up
> to date without causing unduly bloat), but I haven't implemented
> anything yet.
>
> The "amcostestimate" routine is completely bogus; right now it returns
> constant 0, meaning the index is always chosen if it exists.

I think for first version, you might want to keep things simple, but
there should be some way for optimizer to select this index.
So rather than choose if it is present, we can make optimizer choose
when some-one says set enable_minmax index to true.

How about keeping this up-to-date during foreground operations.
Vacuum/Maintainer task maintaining things usually have problems of
bloat and
then we need optimize/workaround issues.
Lot of people have raised this or similar point previously and what
I read you are of opinion that it seems to be slow.
I really don't think that it can be so slow that adding so much
handling to get it up-to-date by some maintainer task is useful.
Currently there are
systems like Oracle where index clean-up is mainly done during
foreground operation, so this alone cannot be reason for slowness.

Comparing the logic with IOS is also not completely right as for
IOS, we need to know each tuple's visibility, which is not the case
here.

Now it can so happen that min and max values are sometimes not right
because later the operation is rolled back, but I think such cases
will
be less and we can find some way to handle such cases may be
maintainer task only, but the handling will be quite simpler.

On Windows, patch gives below compilation errors:
src\backend\access\minmax\mmtuple.c(96): error C2057: expected
constant expression
src\backend\access\minmax\mmtuple.c(96): error C2466: cannot
allocate an array of constant size 0
src\backend\access\minmax\mmtuple.c(96): error C2133: 'values' : unknown size
src\backend\access\minmax\mmtuple.c(97): error C2057: expected
constant expression
src\backend\access\minmax\mmtuple.c(97): error C2466: cannot
allocate an array of constant size 0
src\backend\access\minmax\mmtuple.c(97): error C2133: 'nulls' : unknown size
src\backend\access\minmax\mmtuple.c(102): error C2057: expected
constant expression
src\backend\access\minmax\mmtuple.c(102): error C2466: cannot
allocate an array of constant size 0
src\backend\access\minmax\mmtuple.c(102): error C2133:
'phony_nullbitmap' : unknown size
src\backend\access\minmax\mmtuple.c(110): warning C4034: sizeof returns 0
src\backend\access\minmax\mmtuple.c(246): error C2057: expected
constant expression
src\backend\access\minmax\mmtuple.c(246): error C2466: cannot
allocate an array of constant size 0
src\backend\access\minmax\mmtuple.c(246): error C2133: 'values' : unknown size
src\backend\access\minmax\mmtuple.c(247): error C2057: expected
constant expression
src\backend\access\minmax\mmtuple.c(247): error C2466: cannot
allocate an array of constant size 0
src\backend\access\minmax\mmtuple.c(247): error C2133: 'allnulls' :
unknown size
src\backend\access\minmax\mmtuple.c(248): error C2057: expected
constant expression
src\backend\access\minmax\mmtuple.c(248): error C2466: cannot
allocate an array of constant size 0
src\backend\access\minmax\mmtuple.c(248): error C2133: 'hasnulls' :
unknown size

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Erik Rijkers 2013-09-25 07:54:11 Re: invalid regexp crashes the server on windows or 9.3
Previous Message Marc Mamin 2013-09-25 07:33:02 invalid regexp crashes the server on windows or 9.3