Re: Do we want a hashset type?

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Tomas Vondra" <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Do we want a hashset type?
Date: 2023-06-07 14:21:52
Message-ID: d8759507-7db8-4cae-b13e-21ae4e382b89@app.fastmail.com
Lists: pgsql-hackers

On Tue, Jun 6, 2023, at 13:20, Tomas Vondra wrote:
> it cuts the timing to about 50% on my laptop, so maybe it'll be ~300ms
> on your system. There's a bunch of opportunities for more improvements,
> as the hash table implementation is pretty naive/silly, the on-disk
> format is wasteful and so on.
>
> But before spending more time on that, it'd be interesting to know what
> would be a competitive timing. I mean, what would be "good enough"? What
> timings are achievable with graph databases?

Your hashset is now almost exactly as fast as the corresponding roaringbitmap query, +/- 1 ms on my machine.

I tested Neo4j and the results are surprising; it appears to be significantly *slower*.
However, I've probably misunderstood something; maybe I need to add an index.
Even so, it's interesting that it apparently isn't fast "by default".

The query I tested:
MATCH (user:User {id: '5867'})-[:FRIENDS_WITH*3..3]->(fof)
RETURN COUNT(DISTINCT fof)
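For anyone less familiar with Cypher, here's a minimal Python sketch of what the query counts. It assumes directed FRIENDS_WITH edges given as (start, end) pairs with integer ids (the query itself uses string ids like '5867'). The sketch expands one level at a time and keeps each level as a set, which is presumably similar in spirit to the set unions in the hashset/roaringbitmap queries, though that is an assumption. One real difference: Cypher's *3..3 never reuses a relationship within a single path, and this sketch does not enforce that, so counts can differ on graphs with cycles. The function name is hypothetical.

```python
from collections import defaultdict


def count_distinct_at_exact_depth(edges, start, depth=3):
    # Directed adjacency list: start id -> set of end ids.
    adjacency = defaultdict(set)
    for src, dst in edges:
        adjacency[src].add(dst)

    frontier = {start}
    for _ in range(depth):
        # Replace the frontier with the union of its neighbours. The set
        # drops duplicates at every level, so each distinct node is expanded
        # once per level instead of once per path that reaches it.
        # NOTE: relationship uniqueness (as in Cypher) is NOT enforced here.
        frontier = {nbr for node in frontier for nbr in adjacency.get(node, ())}

    # Distinct nodes reachable by a walk of exactly `depth` edges,
    # roughly what COUNT(DISTINCT fof) returns.
    return len(frontier)


edges = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (4, 6)]
print(count_distinct_at_exact_depth(edges, 1))  # 2  (nodes 5 and 6)
```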

Here is how I loaded the data into it:

% pwd
/Users/joel/Library/Application Support/Neo4j Desktop/Application/relate-data/dbmss/dbms-3837aa22-c830-4dcf-8668-ef8e302263c7

% head import/*
==> import/friendships.csv <==
1,13,FRIENDS_WITH
1,11,FRIENDS_WITH
1,6,FRIENDS_WITH
1,3,FRIENDS_WITH
1,4,FRIENDS_WITH
1,5,FRIENDS_WITH
1,15,FRIENDS_WITH
1,14,FRIENDS_WITH
1,7,FRIENDS_WITH
1,8,FRIENDS_WITH

==> import/friendships_header.csv <==
:START_ID(User),:END_ID(User),:TYPE

==> import/users.csv <==
1,User
2,User
3,User
4,User
5,User
6,User
7,User
8,User
9,User
10,User

==> import/users_header.csv <==
id:ID(User),:LABEL

% ./bin/neo4j-admin database import full --overwrite-destination --nodes=User=import/users_header.csv,import/users.csv --relationships=FRIENDS_WITH=import/friendships_header.csv,import/friendships.csv neo4j
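The script that produced the CSV files isn't shown. As a hypothetical sketch, files in the same layout (separate header files, header-less data files, one User label column and a FRIENDS_WITH type column) could be written like this. The function name and the in-memory inputs are assumptions, not what was actually used.

```python
import csv
from pathlib import Path


def write_neo4j_import_files(import_dir, user_ids, friendships):
    """Hypothetical helper: write users/friendships CSVs in the layout above."""
    import_dir = Path(import_dir)
    import_dir.mkdir(parents=True, exist_ok=True)

    # Header files contain only the column definitions.
    (import_dir / "users_header.csv").write_text("id:ID(User),:LABEL\n")
    (import_dir / "friendships_header.csv").write_text(
        ":START_ID(User),:END_ID(User),:TYPE\n"
    )

    # Data files have no header row; use "\n" rather than csv's default "\r\n".
    with open(import_dir / "users.csv", "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        for user_id in user_ids:
            writer.writerow([user_id, "User"])

    with open(import_dir / "friendships.csv", "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        for start_id, end_id in friendships:
            writer.writerow([start_id, end_id, "FRIENDS_WITH"])
```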

/Joel
