Re: Do we want a hashset type?

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Joel Jacobson <joel(at)compiler(dot)org>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tom Dunstan <pgsql(at)tomd(dot)cc>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Do we want a hashset type?
Date: 2023-06-28 06:26:52
Message-ID: CACJufxG5OM61djHJ54QkZa6MTytsSN6wkcALT_hOFp9J+=ge3A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 27, 2023 at 4:27 PM Joel Jacobson <joel(at)compiler(dot)org> wrote:
>
> On Tue, Jun 27, 2023, at 04:35, jian he wrote:
> > in SQLMultiSets.pdf(previously thread) I found a related explanation
> > on page 45, 46.
> > /home/jian/hashset/0001-make-int4hashset_contains-strict-and-header-file-change.patch
> > (CASE WHEN OP1 IS NULL OR OP2 IS NULL THEN NULL ELSE MULTISET ( SELECT
> > T1.V FROM UNNEST (OP1) AS T1 (V) INTERSECT SQ SELECT T2.V FROM UNNEST
> > (OP2) AS T2 (V) ) END)
> >
> > CASE WHEN OP1 IS NULL OR OP2 IS NULL THEN NULL ELSE MULTISET ( SELECT
> > T1.V FROM UNNEST (OP1) AS T1 (V) UNION SQ SELECT T2.V FROM UNNEST
> > (OP2) AS T2 (V) ) END
> >
> > (CASE WHEN OP1 IS NULL OR OP2 IS NULL THEN NULL ELSE MULTISET ( SELECT
> > T1.V FROM UNNEST (OP1) AS T1 (V) EXCEPT SQ SELECT T2.V FROM UNNEST
> > (OP2) AS T2 (V) ) END)
>
> Thanks! This was exactly what I was looking for, I knew I've seen it but failed to find it.
>
> Attached is a new incremental patch as well as a full patch, since this is a substantial change:
>
> Align null semantics with SQL:2023 array and multiset standards
>
> * Introduced a new boolean field, null_element, in the int4hashset_t type.
>
> * Rename hashset_count() to hashset_cardinality().
>
> * Rename hashset_merge() to hashset_union().
>
> * Rename hashset_equals() to hashset_eq().
>
> * Rename hashset_neq() to hashset_ne().
>
> * Add hashset_to_sorted_array().
>
> * Handle null semantics to work as in arrays and multisets.
>
> * Update int4hashset_add() to allow creating a new set if none exists.
>
> * Use more portable int32 typedef instead of int32_t.
>
> This also adds a thorough test suite in array-and-multiset-semantics.sql,
> which aims to test all relevant combinations of operations and values.
>
> Makefile | 2 +-
> README.md | 6 ++--
> hashset--0.0.1.sql | 37 +++++++++++---------
> hashset-api.c | 208 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------------
> hashset.c | 12 ++++++-
> hashset.h | 11 +++---
> test/expected/array-and-multiset-semantics.out | 365 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> test/expected/basic.out | 12 +++----
> test/expected/reported_bugs.out | 6 ++--
> test/expected/strict.out | 114 ------------------------------------------------------------
> test/expected/table.out | 8 ++---
> test/sql/array-and-multiset-semantics.sql | 232 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> test/sql/basic.sql | 4 +--
> test/sql/benchmark.sql | 14 ++++----
> test/sql/reported_bugs.sql | 6 ++--
> test/sql/strict.sql | 32 -----------------
> test/sql/table.sql | 2 +-
> 17 files changed, 823 insertions(+), 248 deletions(-)
>
> /Joel

Hi there.
I changed the function hashset_contains to strict.
also change the way to return an empty array.

in benchmark.sql, would it be ok to use EXPLAIN to demonstrate that
int4hashset can speed distinct aggregate and distinct counts?
like the following:

explain(analyze, costs off, timing off, buffers)
SELECT array_agg(DISTINCT i) FROM benchmark_input_100k \watch c=3

explain(analyze, costs off, timing off, buffers)
SELECT hashset_agg(i) FROM benchmark_input_100k \watch c=3

explain(costs off,timing off, analyze,buffers)
select count(distinct rnd) from benchmark_input_100k \watch c=3

explain(costs off,timing off, analyze,buffers)
SELECT hashset_cardinality(x) FROM (SELECT hashset_agg(rnd) FROM
benchmark_input_100k) sub(x) \watch c=3

Attachment Content-Type Size
0001-make-int4hashset_contains-strict-and-header-file-change.patch text/x-patch 2.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2023-06-28 06:31:54 RE: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Previous Message Daniel Gustafsson 2023-06-28 06:26:39 Re: sslinfo extension - add notbefore and notafter timestamps