Re: Do we want a hashset type?

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Joel Jacobson <joel(at)compiler(dot)org>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tom Dunstan <pgsql(at)tomd(dot)cc>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Do we want a hashset type?
Date: 2023-06-15 04:29:14
Message-ID: CACJufxE=XCn950YfxDhY_0cu=15znnYejMJ5_EpCLzh1OJqbTw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 15, 2023 at 5:04 AM Joel Jacobson <joel(at)compiler(dot)org> wrote:

> On Wed, Jun 14, 2023, at 15:16, Tomas Vondra wrote:
> > On 6/14/23 14:57, Joel Jacobson wrote:
> >> Would it be feasible to teach the planner to utilize the internal hash table of
> >> hashset directly? In the case of arrays, the hash table construction is an
> ...
> > It's definitely something I'd leave out of v0, personally.
>
> OK, thanks for guidance, I'll stay away from it.
>
> I've been doing some preparatory work on this todo item:
>
> > 3) support for other types (now it only works with int32)
>
> I've renamed the type from "hashset" to "int4hashset",
> and the SQL-functions are now prefixed with "int4"
> when necessary. The overloaded functions with
> int4hashset as input parameters don't need to be prefixed,
> e.g. hashset_add(int4hashset, int).
>
> Other changes since last update (4e60615):
>
> * Support creation of empty hashset using '{}'::hashset
> * Introduced a new function hashset_capacity() to return the current capacity of a hashset.
> * Refactored hashset initialization:
>   - Replaced hashset_init(int) with int4hashset() to initialize an empty hashset with zero capacity.
>   - Added int4hashset_with_capacity(int) to initialize a hashset with a specified capacity.
> * Improved README.md and testing
>
> As a next step, I'm planning on adding int8 support.
>
> Looks and sounds good?
>
> /Joel

I am not sure the following results are correct.
with cte as (
    select hashset(x) as x,
           hashset_capacity(hashset(x)),
           hashset_count(hashset(x))
    from generate_series(1,10) g(x))
select *,
       '|' as delim,
       hashset_add(x, 11111::int),
       hashset_capacity(hashset_add(x, 11111::int)),
       hashset_count(hashset_add(x, 11111::int))
from cte \gx

results:
-[ RECORD 1 ]----+-----------------------------
x                | {8,1,10,3,9,4,6,2,11111,5,7}
hashset_capacity | 64
hashset_count    | 10
delim            | |
hashset_add      | {8,1,10,3,9,4,6,2,11111,5,7}
hashset_capacity | 64
hashset_count    | 11

but:
with cte as (select '{1,2}'::int4hashset as x)
select x, hashset_add(x, 3::int) from cte;

returns
   x   | hashset_add
-------+-------------
 {1,2} | {3,1,2}
(1 row)
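
The last, simpler query seems more sensible to me: there, x is still {1,2} after hashset_add(x, 3). In the first query, the x column already shows 11111 even though the CTE's hashset_count is 10, so it looks as if hashset_add modified x in place.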
