Re: GIT patch

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIT patch
Date: 2007-08-02 07:52:51
Message-ID: 46B18D53.5010700@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Alvaro Herrera wrote:
> Hmm, do say, doesn't it seem like the lack of feedback and the failed
> bitmap patch played against final development of this patch?

Yes :(. That's a one reason why I tried to help with the review of that
patch.

> At this
> point I feel like the patch still needs some work and reshuffling before
> it is in an acceptable state. The fact that there are some API changes
> for which the patch needs to be adjusted makes me feel like we should
> put this patch on hold for 8.4. So we would first get the API changes
> discussed and done and then adapt this patch to them.

I hate to say it but I agree. Getting the API changes discussed and
committed was my plan in February/March. Unfortunately it didn't happen
back then.

There's a few capabilities we need from the new API:

1. Support for candidate matches. Because a clustered index doesn't
contain the key for every heap tuple, when you search for a value we
don't know exactly which ones match. Instead, you get a bunch of
candidate matches, which need to be rechecked after fetching the heap
tuple. Oleg & Teodor pointed out that GiST could use the capability as
well. I also proposed a while ago to change the hash index
implementation so that it doesn't store the index key in the index, but
just the hash of it. That again would need the support for candidate
matches. And there's range-encoded bitmap indexes, if we implement them
in a more distant future.

2. Support to sort the heap tuples represented by one index tuple, in
normal index scans, if we go with alternative 1. Or support to do binary
searches over them, if we go with alternative 2 or 3. As the patch
stands, the sorting is done within b-tree, but that's quite ugly.

> Of the three proposals you suggest, I think the first one
>
>> 1. A grouped index tuple contains a bitmap of offsetnumbers,
>> representing a bunch of heap tuples stored on the same heap page, that
>> all have a key between the key stored on the index tuple and the next
>> index tuple. We don't keep track of the ordering of the heap tuples
>> represented by one group index tuple. When doing a normal, non-bitmap,
>> index scan, they need to be sorted. This is what the patch currently
>> implements.
>
> makes the most sense -- the index is keep simple and fast, and doing the
> sorting during an indexscan seems a perfectly acceptable compromise,
> knowing that the amount of tuples possible returned for sort is limited
> by the heap blocksize.

The overhead is shown in the CPU test results, particularly in the
select_range* tests, I put up on the git web site:
http://community.enterprisedb.com/git/.

The other alternatives might be simpler, though.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexey Klyukin 2007-08-02 09:53:11 Re: GIT patch
Previous Message Pavan Deolasee 2007-08-02 07:33:08 Re: HOT patch - version 11