Re: Hash index build performance tweak from sorting

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash index build performance tweak from sorting
Date: 2022-09-21 01:31:52
Message-ID: CAApHDvr-bFT5C3OyM17mDyNjY0D9bgqmH1TF1VZ54Tv1+khpug@mail.gmail.com
Lists: pgsql-hackers

On Tue, 2 Aug 2022 at 03:37, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com> wrote:
> Using the above test case, I'm getting a further 4-7% improvement on
> already committed code with the attached patch, which follows your
> proposal.
>
> The patch passes info via a state object, useful to avoid API churn in
> later patches.

Hi Simon,

I took this patch for a spin and saw a 2.5% performance increase using
the random INT test that Tom posted. The index took an average of
7227.47 milliseconds on master and 7045.05 with the patch applied.

On making a pass of the changes, I noted down a few things.

1. In _h_spoolinit() the HSpool is allocated with palloc and then
you're setting the istate field to a pointer to the HashInsertState
which is allocated on the stack by the only calling function
(hashbuild()). Looking at hashbuild(), the return value of
_h_spoolinit() is never made available outside the function, so there
does not appear to be an actual bug here; it just seems like a bug
waiting to happen. If _h_spoolinit()
is pallocing memory, then we really shouldn't be setting pointer
fields in that memory to point to something on the stack. It might be
nicer if the istate field in HSpool was a HashInsertStateData and
_h_spoolinit() just memcpy'd the contents of that parameter. That
would make HSpool 4 bytes smaller and save additional pointer
dereferences in _hash_doinsert().

2. There are quite a few casts that are not required. e.g:

_hash_doinsert(rel, itup, heapRel, (HashInsertState) &istate);
buildstate.spool = _h_spoolinit(heap, index, num_buckets,
                                (HashInsertState) &insertstate);
buildstate.istate = (HashInsertState) &insertstate;

This is just my opinion, but I don't really see the value in having a
typedef for a pointer to HashInsertStateData. I can understand that if
the struct was local to a .c file, but you've got the struct and
pointer typedef in the same header. I understand we often do this in
the code, but I feel like we do it less often in newer code. e.g we do
it in aset.c but not generation.c (which is much newer than aset.c).
My personal preference would be just to name the struct
HashInsertState and have no extra pointer typedefs.

3. Just a minor nitpick: lines should wrap at 80 chars. The patch
does this in some places but not others, seemingly just due to the
additional function parameters that have been added.

4. I added the following Assert to _hash_pgaddtup() as I expected the
itup_off to be set to the same thing before and after this change. I
see the Assert is failing in the regression tests.

Assert(PageGetMaxOffsetNumber(page) + 1 ==
       _hash_binsearch(page, _hash_get_indextuple_hashkey(itup)));

I think this is because _hash_binsearch() returns the offset of the
first tuple with the given hashkey, so if there are duplicate hashkey
values then it looks like PageAddItemExtended() will set needshuffle
and memmove() the existing item(s) up one slot. I don't know this
hash index building code very well, but I wonder if it's worth having
another version of _hash_binsearch() that can be used to make
_hash_pgaddtup() put any duplicate hashkeys after the existing ones
rather than before and shuffle the others up? It sounds like that
might speed up normal insertions when there are many duplicate values
to hash.

I wonder if this might be the reason the random INT test didn't come
out as good as your original test which had unique values. The unique
values test would do less shuffling during PageAddItemExtended(). If
so, that implies that skipping the binary search is only part of the
gains here and that not shuffling tuples accounts for quite a bit of
the gain you're seeing. If so, then it would be good to not have to
shuffle duplicate hashkey tuples up in the page during normal
insertions as well as when building the index.

In any case, it would be nice to have some way to assert that we don't
accidentally pass sorted==true to _hash_pgaddtup() when there's an
existing item on the page with a higher hash value. Maybe we could
just look at the hash value of the last tuple on the page and ensure
it's less than or equal to the current one?

5. I think it would be nicer to move the insertstate.sorted = false;
into the else branch in hashbuild(). However, you might have to do
that anyway if you were to do what I mentioned in #1.

David
