Re: Do we want a hashset type?

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: jian he <jian(dot)universality(at)gmail(dot)com>, Joel Jacobson <joel(at)compiler(dot)org>
Cc: Tom Dunstan <pgsql(at)tomd(dot)cc>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Do we want a hashset type?
Date: 2023-06-25 18:56:30
Message-ID: 0db4941c-d954-617c-2bb9-a39ed11a0d63@enterprisedb.com
Lists: pgsql-hackers

On 6/25/23 15:32, jian he wrote:
>> Or maybe I just don't understand the proposal. Perhaps it'd be best if
>> jian wrote a patch illustrating the idea, and showing how it performs
>> compared to the current approach.
>
> Currently Joel's idea is an int4hashset, based on the code Tomas first wrote.
> It looks like a non-nested collection of unique int4 values. The external
> text format looks like {int4, int4, int4}.
> The structure looks like (header + capacity slots * int4).
> Within the capacity slots, some slots are empty, some have unique values.
>
> The textual int4hashset looks like a one-dimensional array,
> so I copied/imitated the src/backend/utils/adt/arrayfuncs.c code and wrote
> slightly more generic hashset input and output functions.
>
> See the attached C file.
> It works fine for non-null input and output for {int4hashset, int8hashset,
> timestamphashset, intervalhashset, uuidhashset}.
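
For readers following along, the layout described in the quoted text (a
header plus "capacity" int4 slots, some empty and some holding unique
values) can be sketched roughly as below. This is only an illustration
under stated assumptions, not the code from either patch: the names
sketch_int4set, sketch_set_create, sketch_set_probe, sketch_set_add,
sketch_set_contains and sketch_set_free are hypothetical, the separate
"used" flag array is just one possible way to mark empty slots, the
multiplicative hash with linear probing is a stand-in technique, and
resizing and deletion are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: a small header followed by "capacity" int4 slots. */
typedef struct sketch_int4set
{
    int32_t  capacity;   /* number of slots allocated */
    int32_t  nelements;  /* number of slots currently holding a value */
    bool    *used;       /* used[i] is true if slots[i] holds a value (assumption) */
    int32_t *slots;      /* the values themselves */
} sketch_int4set;

/* Allocate an empty set with a fixed number of slots; returns NULL on OOM. */
static sketch_int4set *
sketch_set_create(int32_t capacity)
{
    sketch_int4set *set;

    assert(capacity > 0);

    set = malloc(sizeof(*set));
    if (set == NULL)
        return NULL;

    set->capacity = capacity;
    set->nelements = 0;
    set->used = calloc((size_t) capacity, sizeof(bool));
    set->slots = calloc((size_t) capacity, sizeof(int32_t));
    if (set->used == NULL || set->slots == NULL)
    {
        free(set->used);
        free(set->slots);
        free(set);
        return NULL;
    }
    return set;
}

/*
 * Linear probing: return the slot that already holds "value", or the first
 * empty slot on its probe path, or -1 if every slot is taken by other values.
 */
static int32_t
sketch_set_probe(const sketch_int4set *set, int32_t value)
{
    uint32_t cap = (uint32_t) set->capacity;
    uint32_t start = ((uint32_t) value * 2654435761u) % cap;

    for (uint32_t i = 0; i < cap; i++)
    {
        uint32_t pos = (start + i) % cap;

        if (!set->used[pos] || set->slots[pos] == value)
            return (int32_t) pos;
    }
    return -1;
}

/* Returns true if the value was added, false if already present or set full. */
static bool
sketch_set_add(sketch_int4set *set, int32_t value)
{
    int32_t pos = sketch_set_probe(set, value);

    if (pos < 0 || set->used[pos])
        return false;

    set->used[pos] = true;
    set->slots[pos] = value;
    set->nelements++;
    return true;
}

/* Returns true if the value is stored in the set. */
static bool
sketch_set_contains(const sketch_int4set *set, int32_t value)
{
    int32_t pos = sketch_set_probe(set, value);

    return pos >= 0 && set->used[pos];
}

static void
sketch_set_free(sketch_int4set *set)
{
    free(set->used);
    free(set->slots);
    free(set);
}
```

Note that nothing in this sketch depends on array machinery: there is a
single flat slot area and no dimensions to track.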

So how do you define a table with a "set" column? I mean, with the
original patch we could have done

CREATE TABLE t (a int4hashset);

and then store / query this. How do you do that with this approach?

I've looked at the patch only very briefly - it's really difficult to
grok such patches - large, with half the comments possibly obsolete etc.
So what does reusing the array code give us, really?

I'm not against reusing some of the array code, but arrays seem to be
much more elaborate (multiple dimensions, ...) so the code needs to do
significantly more stuff in various cases.

When I previously suggested that maybe we should get "inspiration" from
the array code, I was mostly talking about (a) type polymorphism, i.e.
doing sets for arbitrary types, and (b) integrating this into grammar
(instead of using functions).

I don't see how copying arrayfuncs.c like this achieves either of these
things. It still hardcodes just a handful of selected data types, whereas
array polymorphism relies on the automatic creation of an array type for
every scalar type.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
