Re: Do we want a hashset type?

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Joel Jacobson <joel(at)compiler(dot)org>, jian he <jian(dot)universality(at)gmail(dot)com>
Cc: Tom Dunstan <pgsql(at)tomd(dot)cc>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Do we want a hashset type?
Date: 2023-06-20 16:25:52
Message-ID: e06d23b1-ba86-3637-c9bc-8afc6b96a5ee@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 6/20/23 16:56, Joel Jacobson wrote:
> On Tue, Jun 20, 2023, at 14:10, Tomas Vondra wrote:
>> On 6/20/23 12:59, Joel Jacobson wrote:
>>> On Mon, Jun 19, 2023, at 02:00, jian he wrote:
>>>> select hashset_contains('{1,2}'::int4hashset,NULL::int);
>>>> should return null?
>>>
>>> I agree, it should.
>>>
>>> I've now changed all functions except int4hashset() (the init function)
>>> and the aggregate functions to be STRICT.
>>
>> I don't think this is correct / consistent with what we do elsewhere.
>> IMHO it's perfectly fine to have a hashset containing a NULL value,
>
> The reference to consistency with what we do elsewhere might not be entirely
> applicable in this context, since the set feature we're designing is a new beast
> in the SQL landscape.
>

I don't see how it's new, considering relational algebra is pretty much
based on (multi)sets, and the three-valued logic with NULL values is
pretty well established part of that.

> I think adhering to the theoretical purity of sets by excluding NULLs aligns us
> with set theory, simplifies our code, and parallels set implementations in other
> languages.
>

I don't see how that would be more theoretically pure, really. The
three-valued logic is a well established part of relational algebra, so
not respecting that is more a violation of the purity.

> I think we have an opportunity here to innovate and potentially influence a
> future set concept in the SQL standard.
>

I doubt this going to influence what the SQL standard says, especially
because it already defined the behavior for MULTISETS (of which the sets
are a special case, pretty much). So this has 0% chance of success.

> However, I see how one could argue against this reasoning, on the basis that
> PostgreSQL users might be more familiar with and expect NULLs can exist
> everywhere in all data structures.
>

Right, it's what we already do for similar cases, and if you have NULLS
in the data, you better be aware of the behavior. Granted, some people
are surprised by three-valued logic, but using a different behavior for
some new features would just increase the confusion.

> A different perspective is to look at what use-cases we can foresee.
>
> I've been trying hard, but I can't find compelling use-cases where a NULL element
> in a set would offer a more natural SQL query than handling NULLs within SQL and
> keeping the set NULL-free.
>

IMO if you have NULL values in the data, you better be aware of it and
handle the case accordingly (e.g. by filtering them out when building
the set). If you don't have NULLs in the data, there's no issue.

And in the graph case, I don't see why you'd have any NULLs, considering
we're dealing with adjacent nodes, and if there's adjacent node, it's ID
is not NULL.

> Does anyone else have a strong realistic example where including NULLs in the
> set would simplify the SQL query?
>

I'm sure there are cases where you have NULLs in the dat aand need to
filter them out, but that's just natural consequence of having NULLs. If
you have them you better know what NULLs do ...

It's too early to make any strong statements, but it's going to be hard
to convince me we should handle NULLs differently from what we already
do elsewhere.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2023-06-20 16:54:56 Re: Why does pg_bsd_indent need to be installed?
Previous Message Joel Jacobson 2023-06-20 16:20:33 Re: Do we want a hashset type?