Re: Do we want a hashset type?

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: Joel Jacobson <joel(at)compiler(dot)org>, Tom Dunstan <pgsql(at)tomd(dot)cc>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Do we want a hashset type?
Date: 2023-06-20 18:43:24
Message-ID: 7bf1b03d-c52a-eb2c-38c9-8019f4207387@enterprisedb.com
Lists: pgsql-hackers

On 6/20/23 20:08, jian he wrote:
> On Wed, Jun 21, 2023 at 12:25 AM Tomas Vondra
> ...
>> http://www.wiscorp.com/sqlmultisets.zip
>
>> Conceptually, a multiset is an unordered collection of elements, all of the same type, with
>> duplicates permitted. Unlike arrays, a multiset is an unbounded collection, with no declared maximum
>> cardinality. This does not mean that the user can insert elements in a multiset without limit, just
>> that the standard does not mandate that there should be a limit. This is analogous to tables, which
>> have no declared maximum number of rows.
>
> Postgres arrays don't have size limits.

Right. You can say int[5] but we don't enforce that limit (I haven't
checked why, but presumably because we had arrays before the standard
existed, and it was more like a list in LISP or something.)

> unordered means no need to use subscript?

Yeah - there's no obvious way to subscript the items when there's no
implicit ordering.

> So multiset is a more limited array type?
>

Yes and no - both are collection types, so there are similarities and
differences. Multiset does not need to keep the ordering, so in this
sense it's a relaxed version of array.
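
To make the comparison concrete, here is a small sketch in Python (not
PostgreSQL code; as an assumption for illustration, a list stands in for
an array and collections.Counter for a multiset). It shows that a
multiset keeps duplicates but ignores order, while an array compares
position by position and can be subscripted.

```python
from collections import Counter

# Assumption for illustration: a list models an array, a Counter models a multiset.
array_a = [1, 2, 2, 3]
array_b = [2, 1, 3, 2]

multiset_a = Counter(array_a)
multiset_b = Counter(array_b)

# Arrays are ordered, so the same elements in a different order are not equal.
arrays_equal = array_a == array_b

# Multisets ignore order but still track how often each element occurs.
multisets_equal = multiset_a == multiset_b

# This is a lookup by element value (multiplicity of 2), not by position.
count_of_two = multiset_a[2]

# Subscripting by position is only meaningful for the array.
second_element = array_a[1]
```

The Counter lookup is keyed by element, which is why a multiset has no
natural notion of "the n-th item".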

> null is fine. But personally I feel that, so far, the main feature of
> hashset is quickly aggregating unique values.
> I found that using hashset to count distinct (non-null) values is quite a bit faster.

True. That's related to fast membership checks.
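
As a rough illustration of why membership checks matter here, the sketch
below counts distinct non-null values in two ways in Python: with a hash
set, and with a sort-based pass shown only for contrast. The function
names and sample data are hypothetical, and this is not how the hashset
extension or PostgreSQL is implemented.

```python
def count_distinct_hashed(values):
    """Count distinct non-null values with a hash set (expected O(n))."""
    seen = set()
    for v in values:
        if v is not None:  # skip NULL-like entries
            seen.add(v)    # average O(1) membership check plus insert
    return len(seen)


def count_distinct_sorted(values):
    """Same result via sorting (O(n log n)), for comparison."""
    non_null = sorted(v for v in values if v is not None)
    distinct = 0
    for i, v in enumerate(non_null):
        # After sorting, a new distinct value starts wherever the
        # current element differs from its predecessor.
        if i == 0 or v != non_null[i - 1]:
            distinct += 1
    return distinct


# Hypothetical sample data with duplicates and NULL-like entries.
sample = [3, 1, None, 3, 2, 1, None, 2, 3]
```

Both functions agree on the result; the hashed version avoids the sort
because each duplicate is detected by a single lookup.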

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
