Re: Do we want a hashset type?

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Joel Jacobson <joel(at)compiler(dot)org>, jian he <jian(dot)universality(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Do we want a hashset type?
Date: 2023-06-10 15:46:00
Message-ID: 9ac829ac-7516-5746-8d50-e9c5e5240956@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 2023-06-09 Fr 07:56, Joel Jacobson wrote:
> On Fri, Jun 9, 2023, at 13:33, jian he wrote:
> > Hi, I am quite new about C.....
> > The following function I have 3 questions.
> > 1. 7691,4201, I assume they are just random prime ints?
>
> Yes, 7691 and 4201 are likely chosen as random prime numbers.
> In hash functions, prime numbers are often used to help in evenly
> distributing
> the hash values across the range and reduce the chance of collisions.
>
> > 2. I don't get the last return set, even the return type should be bool.
>
> Thanks, you found a mistake!
>
> The line
>
>     return set;
>
> is actually unreachable and should be removed.
> The function will always return either true or false within the while
> loop and
> never reach the final return statement.
>
> I've attached a new incremental patch with this fix.
>
> > 3. I don't understand 13 in hash = (hash + 13) % set->maxelements;
>
> The value 13 is used for linear probing [1] in handling hash collisions.
> Linear probing sequentially checks the next slot in the array when a
> collision
> occurs. 13, being a small prime number not near a power of 2, helps in
> uniformly
> distributing data and ensuring that all slots are probed, as it's
> relatively prime
> to the hash table size.
>
> Hm, I realise we actually don't ensure the hash table size and step
> size (13)
> are coprime. I've fixed that in the attached patch as well.
>
> [1] https://en.wikipedia.org/wiki/Linear_probing
>
>

Maybe you can post a full patch as well as incremental?

Stylistically I think you should reduce reliance on magic numbers (like
13). Probably need some #define's?

cheers

andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message jian he 2023-06-10 15:51:30 Re: Do we want a hashset type?
Previous Message Joe Conway 2023-06-10 15:19:46 Re: Cleaning up threading code