Re: Do we want a hashset type?

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Tomas Vondra" <tomas(dot)vondra(at)enterprisedb(dot)com>, "jian he" <jian(dot)universality(at)gmail(dot)com>
Cc: "Tom Dunstan" <pgsql(at)tomd(dot)cc>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Do we want a hashset type?
Date: 2023-06-25 09:42:52
Message-ID: 183b5dbc-0647-43a3-a02e-0a3abbf8a1ac@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jun 24, 2023, at 21:16, Joel Jacobson wrote:
> New version of int4hashset_contains() that should follow the same
> General Rules as MULTISET's MEMBER OF (8.16 <member predicate>).
...
> SELECT hashset_contains('{}'::int4hashset, NULL::int); -- false
...
> SELECT hashset_contains('{null}'::int4hashset, NULL::int); -- null

When it comes to SQL, the general rule of thumb is that expressions and functions
handling null usually return the null value. This is why it might feel a bit out
of the ordinary to return False when checking if an empty set contains NULL.

However, that's my understanding of the General Rules on page 553 of
ISO/IEC 9075-2:2023(E). Rule 3 Case a) specifically states:

"If N is 0 (zero), then the <member predicate> is False.",

where N is the cardinality, and for an empty set, that's 0 (zero).

Rule 3 Case b) goes on to say:

"If at least one of XV and MV is the null value, then the
<member predicate> is Unknown."

But since b) follows a), and the condition for a) already matches, b) is out of
the running. This leads me to believe that the result of:

SELECT hashset_contains('{}'::int4hashset, NULL::int);

would be False, according to the General Rules.

Now, this is based on the assumption that the Case conditions are evaluated in
sequence, stopping at the first match. Does that assumption hold water?

Applying the same rules, we'd have to return Unknown (which we represent as
null) for:

SELECT hashset_contains('{null}'::int4hashset, NULL::int);

Here, since the cardinality N is 1, Case a) doesn't apply, but Case b) does
since XV is null.

Looking ahead, we're entertaining the possibility of a future SET SQL-syntax
feature and wondering how our hashset type could be adapted to be compatible and
reusable for such a development. It's a common prediction that any future SET
syntax feature would probably operate on Three-Valued Logic. Therefore, it's key
for our hashset to handle null values, whether storing, identifying, or adding
them.

But here's my two cents, and remember it's just a personal viewpoint. I'm not so
sure that the hashset type functions need to mirror the corresponding MULTISET
language constructs exactly. In my book, our hashset catalog functions could
take a more clear-cut route with null handling, as long as our data structure is
prepared to handle null values.

Think about this possibility:

hashset_contains_null(int4hashset) -> boolean
hashset_add_null(int4hashset) -> int4hashset
hashset_contains(..., NULL) -> ERROR
hashset_add(..., NULL) -> ERROR

In my mind, this explicit null handling could simplify things, clear up any
potential confusion, and at the same time pave the way for compatibility with
any future SET SQL-syntax feature.

Thoughts?

/Joel

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2023-06-25 12:05:09 Row pattern recognition
Previous Message Michael Banck 2023-06-25 09:10:00 Re: Stampede of the JIT compilers