Re: Hash index build patch has *worse* performance at small table sizes

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tom Raney <twraney(at)comcast(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash index build patch has *worse* performance at small table sizes
Date: 2008-06-21 22:34:47
Message-ID: 200806212234.m5LMYl114305@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Did we ever do anything about this?

---------------------------------------------------------------------------

Tom Lane wrote:
> I've been reviewing the hash index build patch submitted here:
> http://archives.postgresql.org/pgsql-patches/2007-10/msg00154.php
>
> Although it definitely helps on large indexes, it's actually
> counterproductive on not-so-large ones. The test case I'm using
> is random integers generated like this:
> create table foo as select (random() * N)::int as f1
> from generate_series(1,N);
> select count(*) from foo; -- force hint bit updates
> checkpoint;
> then timing
> create index fooi on foo using hash(f1);
>
> Using all-default configuration settings on some not-very-new hardware,
> at N = 1E6 I see
>
> 8.3.1: 30 sec
> With pre-expansion of index (CVS HEAD): 24 sec
> With sorting: 72 sec
> To build a btree index on same data: 34 sec
>
> Now this isn't amazingly surprising, because the original argument for
> doing sorting was to improve locality of access to the index during
> the build, and that only matters if you've got an index significantly
> bigger than memory. If the index fits in RAM then the sort is pure
> overhead.
>
> The obvious response to this is to use the sorting approach only when
> the estimated index size exceeds some threshold. One possible choice of
> threshold would be shared_buffers (or temp_buffers for a temp index)
> but I think that is probably too conservative, because in most scenarios
> the kernel's disk cache is available too. Plus you can't tweak that
> setting without a postmaster restart. I'm tempted to use
> effective_cache_size, which attempts to measure an appropriate number
> and can be set locally within the session doing the CREATE INDEX if
> necessary. Or we could invent a new GUC parameter, but that is probably
> overkill.
>
> Comments?
>
> regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-06-22 00:37:56 Re: Hash index build patch has *worse* performance at small table sizes
Previous Message Andrew Dunstan 2008-06-21 16:05:55 Re: Not valid dump [8.2.9, 8.3.1]