
Re: Hash index build performance tweak from sorting

From:	Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hash index build performance tweak from sorting
Date:	2022-08-01 15:37:22
Message-ID:	CANbhV-GBc5JoG0AneUGPZZW3o4OK5LjBGeKe_icpC3R1McrZWQ@mail.gmail.com
Lists:	pgsql-hackers

On Fri, 29 Jul 2022 at 13:49, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com> wrote:
>
> On Thu, 28 Jul 2022 at 19:50, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com> writes:
> > > Thanks for the nudge. New version attached.
> >
> > I also see a speed improvement from this:

> > ---
> > DROP TABLE IF EXISTS hash_speed;
> > CREATE unlogged TABLE hash_speed (x integer);
> > INSERT INTO hash_speed SELECT random()*10000000 FROM
> > generate_series(1,10000000) x;
> > vacuum hash_speed;
> > \timing on
> > CREATE INDEX ON hash_speed USING hash (x);
> > ---

> > Also, it seems like we've left some money on the table by not
> > exploiting downstream the knowledge that this sorting happened.
> > During an index build, it's no longer necessary for
> > _hash_pgaddtup to do _hash_binsearch, and therefore also not
> > _hash_get_indextuple_hashkey: we could just always append the new
> > tuple at the end. Perhaps checking it against the last existing
> > tuple is worth the trouble as a bug guard, but for sure we don't
> > need the log2(N) comparisons that _hash_binsearch will do.
>
> Hmm, I had that in an earlier version of the patch, not sure why it
> dropped out since I wrote it last year, but then I've got lots of
> future WIP patches in the area of hash indexes.

...

> > At this point the cfbot will start to bleat that the patch of
> > record doesn't apply, so I'm going to mark the CF entry committed.
> > If anyone wants to produce a follow-on patch, please make a
> > new entry.
>
> Will do. Thanks.

Using the above test case, I'm getting a further 4-7% improvement over
the already-committed code with the attached patch, which follows your
proposal.

The patch passes this information via a state object, which should help
avoid API churn in later patches.
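
For anyone following along without the patch open, here is a minimal
standalone sketch of the idea, not the patch itself. It assumes a toy
fixed-size page holding only hash keys; the names SketchPage,
SketchInsertState, sketch_binsearch and sketch_page_add are hypothetical
and are not PostgreSQL functions. When the state object says the input
is sorted, the new key is appended at the end after a cheap check
against the last existing key; otherwise it falls back to a binary
search for the insert position.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_CAPACITY 8

/* Hypothetical stand-in for an index page: hash keys in ascending order. */
typedef struct SketchPage
{
    uint32_t keys[PAGE_CAPACITY];
    size_t   nkeys;
} SketchPage;

/* Hypothetical state object; new flags can be added without changing callers. */
typedef struct SketchInsertState
{
    bool sorted;    /* caller promises keys arrive in ascending order */
} SketchInsertState;

/* Return the first offset whose key is >= the given key (lower bound). */
static size_t
sketch_binsearch(const SketchPage *page, uint32_t key)
{
    size_t lo = 0;
    size_t hi = page->nkeys;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (page->keys[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Add a key; return its offset, or (size_t) -1 if the page is full. */
static size_t
sketch_page_add(SketchPage *page, uint32_t key, const SketchInsertState *state)
{
    size_t off;

    if (page->nkeys >= PAGE_CAPACITY)
        return (size_t) -1;

    if (state != NULL && state->sorted)
    {
        /* Bug guard only: the new key must not sort before the last one. */
        assert(page->nkeys == 0 || page->keys[page->nkeys - 1] <= key);
        off = page->nkeys;      /* append, no binary search needed */
    }
    else
    {
        off = sketch_binsearch(page, key);
        /* Shift later keys up by one slot to open a gap at off. */
        for (size_t i = page->nkeys; i > off; i--)
            page->keys[i] = page->keys[i - 1];
    }

    page->keys[off] = key;
    page->nkeys++;
    return off;
}
```

In this sketch a sorted build would pass a state with sorted = true for
every key, while ordinary inserts would pass NULL (or sorted = false)
and keep the binary search path.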

Adding to CFapp again.

--
Simon Riggs http://www.EnterpriseDB.com/

Attachment	Content-Type	Size
hash_inserted_sorted.v2.patch	application/octet-stream	7.0 KB

In response to

Re: Hash index build performance tweak from sorting at 2022-07-29 12:49:01 from Simon Riggs

Responses

Re: Hash index build performance tweak from sorting at 2022-08-05 19:46:27 from David Zhang
Re: Hash index build performance tweak from sorting at 2022-09-21 01:31:52 from David Rowley
