From: | Kenneth Marshall <ktm(at)rice(dot)edu> |
---|---|
To: | Gregory Stark <stark(at)enterprisedb(dot)com> |
Cc: | Tom Raney <twraney(at)comcast(dot)net>, pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: Hash index todo list item |
Date: | 2007-09-25 16:26:05 |
Message-ID: | 20070925162604.GO14440@it.is.rice.edu |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Sep 25, 2007 at 03:35:47PM +0100, Gregory Stark wrote:
> "Kenneth Marshall" <ktm(at)rice(dot)edu> writes:
>
> > On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote:
> >
> >> Using our implementation, build times and index sizes are
> >> comparable with btree index build times and index sizes.
> >...
> > That is super! (and timely)
>
> It is pretty super. I have a few comments to raise but don't take it to be too
> negative, it sounds like this is a big step forward towards making hash
> indexes valuable.
>
> Firstly in the graphs it seems the index size graph has an exponential x-axis
> but a linear y-axis. This makes it hard to see what I would expect to be
> pretty much linear growth. The curves look exponential which would mean linear
> growth but of course it's hard to tell.
>
> Also, the growth in the time chart looks pretty much linear. That seems weird
> since I would expect there would be a visible extra component since sort times
> are n-log(n). Perhaps you need to test still larger tables to see that though.
>
> In any case it's clear from the results you have there that the change is a
> positive one and fixes a fundamental problem with the hash index build code.
>
> Something else you should perhaps test is indexing something which is
> substantially larger than hash function output. A hash function is going to
> make the most sense when indexing something like strings for which you want to
> avoid the long comparison costs. Especially consider testing this on a UTF8
> locale with expensive comparisons (like a CJK locale for example).
>
> Note that the bottom line for the problems with hash indexes is that the
> current implementation doesn't offer any advantages over btree indexes. Hash
> indexes need to be not only as good of a choice as btree indexes but
> significantly better a significantly better choice at least for some specific
> circumstances.
>
> Also, if you're going to submit a patch you should check out a copy of the CVS
> HEAD and work from that. I don't think there are huge differences in the area
> of hash indexes though. But in most other areas you would be spending quite a
> bit of time dealing details which have changed since.
>
> Finally note that we're in the final throes of the 8.3 feature freeze.
> Normally any patch submitted now would be held until 8.3 is released and
> development on 8.4 is started. I could imagine an exception being made for
> hash indexes since they're so moribund currently but probably not. The flip
> side of that argument is that there's not much point in making an exception
> for something which will only be really useful once further work is done in
> the same area.
>
Although I am very excited about this patch, I do not see any real value
in including it in 8.3. As you mention, we need to to have a hash index
implementation that outperforms btree in some problem regime and that is
currently not the case. I have just recently started the process of
gathering ideas and having discussions on various approaches to making
hash indexes more performant and we have a number of ideas on which to
start our investigation. I do think that this patch will make the testing
and evaluation, that will be needed to truly improve the hash index, much
much easier.
Regards,
Ken
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2007-09-25 16:29:21 | Re: Turn off vacuum in pgbench? |
Previous Message | Andrew Dunstan | 2007-09-25 15:44:27 | Re: Reducing NUMERIC size for 8.3 |