Re: Do we want a hashset type?

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Joel Jacobson <joel(at)compiler(dot)org>, jian he <jian(dot)universality(at)gmail(dot)com>
Cc: Tom Dunstan <pgsql(at)tomd(dot)cc>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Do we want a hashset type?
Date: 2023-06-19 11:50:31
Message-ID: fae7d987-564d-7739-8a59-bc7f5186ac78@dunslane.net
Lists: pgsql-hackers


On 2023-06-19 Mo 05:21, Tomas Vondra wrote:
>
> On 6/18/23 18:45, Andrew Dunstan wrote:
>> On 2023-06-16 Fr 20:38, Joel Jacobson wrote:
>>> New patch is attached, which will henceforth always be a complete patch,
>>> to avoid the hassle of having to assemble incremental patches.
>>
>> Cool, thanks.
>>
> It might still be convenient to keep it split into smaller, easier to
> review, parts. A patch that introduces basic functionality and then
> patches adding various "advanced" features.
>
>> A couple of random thoughts:
>>
>>
>> . It might be worth sending a version number with the send function
>> (cf. jsonb_send / jsonb_recv). That way we would not be tied forever
>> to some wire representation.
>>
>> . I think there are some important set operations missing: most notably
>> intersection, slightly less importantly asymmetric and symmetric
>> difference. I have no idea how easy these would be to add, but even for
>> your stated use I should have thought set intersection would be useful
>> ("Who is a member of both this set of friends and that set of friends?").
>>
>> . While supporting int4 only is OK for now, I think we would at least
>> want to support int8, and probably UUID since a number of systems I know
>> of use that as an object identifier.
>>
> I agree we should aim to support a wider range of data types. Could we
> have a polymorphic type, similar to what we do for arrays and ranges? In
> fact, CREATE TYPE allows specifying ELEMENT, so wouldn't it be possible
> to implement this as a special variant of an array? Would be better than
> having a set of functions for every supported data type.
>
> (Note: It might still be possible to have a special implementation for
> selected fixed-length data types, as it allows optimization at compile
> time. But that could be done later.)

Interesting idea. There's also the keyword SETOF that we could possibly
make use of.
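
To make the version-number suggestion quoted above concrete, here is a rough sketch in Python rather than backend C. Everything in it is an assumption for illustration: the names hashset_send, hashset_recv and WIRE_VERSION are hypothetical, and the layout (one version byte, a 4-byte count, then int4 elements in network byte order) is not the patch's actual format. The only point is that the receiver checks a leading version before trusting the rest of the message.

```python
import struct

# Hypothetical wire-format version; a later release could bump this
# and keep accepting older versions in hashset_recv.
WIRE_VERSION = 1


def hashset_send(values):
    """Serialize a set of int4 values: version byte, count, elements."""
    items = sorted(values)
    # "!" = network byte order, no padding; B = uint8, I = uint32, i = int32
    return struct.pack(f"!BI{len(items)}i", WIRE_VERSION, len(items), *items)


def hashset_recv(buf):
    """Parse the format written by hashset_send, rejecting unknown versions."""
    (version,) = struct.unpack_from("!B", buf, 0)
    if version != WIRE_VERSION:
        raise ValueError(f"unsupported hashset wire version {version}")
    (count,) = struct.unpack_from("!I", buf, 1)
    return set(struct.unpack_from(f"!{count}i", buf, 5))
```

A round trip such as hashset_recv(hashset_send({3, 1, 2})) gives back the same set, while a buffer starting with any other version byte is rejected instead of being misread.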

>
>
> The other thing I've been thinking about is the SQL syntax and what
> the SQL standard says about this.
>
> AFAICS the standard only defines arrays and multisets. Arrays are pretty
> much the thing we have, including the ARRAY[] constructor etc. Multisets
> are similar to the hashset discussed here, except that they track the
> number of occurrences of each value (which would be trivial in hashset).
>
> So if we want to make this a built-in feature, maybe we should aim to do
> the multiset thing, with the standard SQL syntax? Extending the grammar
> should not be hard, I think. I'm not sure whether we could reuse the
> underlying code (ArrayType, ARRAY_SUBLINK stuff, etc.) or if we'd need a
> lot of separate code doing that.
>
>

Yes, Multisets (a.k.a. bags and a large number of other names) would be
interesting. But I wouldn't like to abandon pure sets either. Maybe a
typmod indicating the allowed multiplicity of the type?
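
As a rough illustration of the multiplicity idea, here is a small Python model, not backend code. The class name BoundedBag and its max_multiplicity argument are hypothetical stand-ins for a type with a typmod, and silently ignoring an insert that would exceed the limit is just one possible choice of semantics. With a limit of 1 the container behaves as a pure set; with no limit it behaves as a multiset.

```python
from collections import Counter


class BoundedBag:
    """Toy container whose allowed multiplicity mimics a typmod."""

    def __init__(self, max_multiplicity=None):
        # None means unbounded (a plain multiset); 1 means a pure set.
        if max_multiplicity is not None and max_multiplicity < 1:
            raise ValueError("max_multiplicity must be at least 1")
        self.max_multiplicity = max_multiplicity
        self.counts = Counter()

    def add(self, value):
        # Assumption: an insert beyond the limit is ignored, not an error.
        limit = self.max_multiplicity
        if limit is None or self.counts[value] < limit:
            self.counts[value] += 1

    def multiplicity(self, value):
        # Counter returns 0 for values that were never added.
        return self.counts[value]
```

Adding the same value twice to BoundedBag(1) leaves its multiplicity at 1, whereas BoundedBag() would report 2.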

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
