Re: Do we want a hashset type?

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Tomas Vondra" <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Do we want a hashset type?
Date: 2023-06-05 19:52:23
Message-ID: 9f370c3e-9a1b-4db0-ad93-06f14e7dcc37@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 5, 2023, at 01:44, Tomas Vondra wrote:
> Anyway, I hacked together a trivial set backed by an open addressing
> hash table:
>
> https://github.com/tvondra/hashset
>
> It's super-basic, providing just some bare minimum of functions, but
> hopefully good enough for experiments.
>
> - hashset data type - hash table in varlena
> - hashset_init - create new hashset
> - hashset_add / hashset_contains - add value / check membership
> - hashset_merge - merge two hashsets
> - hashset_count - count elements
> - hashset_to_array - return
> - hashset(int) aggregate
>
> This allows rewriting the recursive query like this:
>
> WITH RECURSIVE friends_of_friends AS (
...

Nice! I get similar results, 737 ms (hashset) vs 1809 ms (array_agg).

I think if you just add one more hashset function, it will be a win against roaringbitmap, which is 400 ms.

The missing function is an agg that takes hashset as input and returns hashset, similar to roaringbitmap's rb_or_agg().

With such a function, we could add an adjacent nodes hashset column to the `nodes` table, which would eliminate the need to scan the edges table for graph traversal:

We could then benchmark roaringbitmap against hashset querying the same table:

CREATE TABLE users AS
SELECT
from_node AS id,
rb_build_agg(to_node) AS friends_roaringbitmap,
hashset(to_node) AS friends_hashset
FROM edges
GROUP BY 1;

/Joel

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joe Conway 2023-06-05 20:08:24 Re: Let's make PostgreSQL multi-threaded
Previous Message Peter Geoghegan 2023-06-05 19:04:29 Re: Cleaning up nbtree after logical decoding on standby work