Do we want a hashset type?

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Do we want a hashset type?
Date: 2023-05-31 14:09:55
Message-ID: 432e0bf3-5a04-4a0b-8236-ff625008e46b@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I've been working with a social network start-up that uses PostgreSQL as their
only database. Recently, they became interested in graph databases, largely
because of an article [1] suggesting that a SQL database "just chokes" when it
encounters a depth-five friends-of-friends query (for a million users having
50 friends each).

The article didn't provide the complete SQL queries, so I had to buy the
referenced book to get the full picture. It turns out, the query was a simple
self-join, which, of course, isn't very efficient. When we rewrote the query
using a modern RECURSIVE CTE, it worked but still took quite some time.

Of course, there will always be a need for specific databases, and some queries
will run faster on them. But, if PostgreSQL could handle graph queries with a
Big-O runtime similar to graph databases, scalability wouldn't be such a big
worry.

Just like the addition of the JSON type to PostgreSQL helped reduce the hype
around NoSQL, maybe there's something simple that's missing in PostgreSQL that
could help us achieve the same Big-O class performance as graph databases for
some of these type of graph queries?

Looking into the key differences between PostgreSQL and graph databases,
it seems that one is how they store adjacent nodes. In SQL, a graph can be
represented as one table for the Nodes and another table for the Edges.
For a friends-of-friends query, we would need to query Edges to find adjacent
nodes, recursively.

Graph databases, on the other hand, keep adjacent nodes immediately accessible
by storing them with the node itself. This looks like a major difference in
terms of how the data is stored.

Could a hashset type help bridge this gap?

The idea would be to store adjacent nodes as a hashset column in a Nodes table.

Apache AGE is an option for users who really need a new graph query language.
But I believe there are more users who occasionally run into a graph problem and
would be glad if there was an efficient way to solve it in SQL without having
to bring in a new database.

/Joel

[1] https://neo4j.com/news/how-much-faster-is-a-graph-database-really/

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2023-05-31 14:12:42 Re: Refactor ssl tests to avoid using internal PostgreSQL::Test::Cluster methods
Previous Message Jacob Champion 2023-05-31 14:08:39 Re: Docs: Encourage strong server verification with SCRAM