Re: Do we want a hashset type?

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "jian he" <jian(dot)universality(at)gmail(dot)com>
Cc: "Tomas Vondra" <tomas(dot)vondra(at)enterprisedb(dot)com>, "Tom Dunstan" <pgsql(at)tomd(dot)cc>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Do we want a hashset type?
Date: 2023-06-15 21:05:38
Message-ID: 34881a04-d5ee-4b63-9548-374b350c87d6@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 15, 2023, at 06:29, jian he wrote:
> I am not sure the following results are correct.
> with cte as (
> select hashset(x) as x
> ,hashset_capacity(hashset(x))
> ,hashset_count(hashset(x))
> from generate_series(1,10) g(x))
> select *
> ,'|' as delim
> , hashset_add(x,11111::int)
> ,hashset_capacity(hashset_add(x,11111::int))
> ,hashset_count(hashset_add(x,11111::int))
> from cte \gx
>
>
> results:
> -[ RECORD 1 ]----+-----------------------------
> x | {8,1,10,3,9,4,6,2,11111,5,7}
> hashset_capacity | 64
> hashset_count | 10
> delim | |
> hashset_add | {8,1,10,3,9,4,6,2,11111,5,7}
> hashset_capacity | 64
> hashset_count | 11

Nice catch, you found a bug!

Fixed in attached patch:

---
Ensure hashset_add and hashset_merge operate on copied data

Previously, the hashset_add() and hashset_merge() functions were
modifying the original hashset in-place. This was leading to unexpected
results because the original data in the hashset was being altered.

This commit introduces the macro PG_GETARG_INT4HASHSET_COPY(), ensuring
a copy of the hashset is created and modified, leaving the original
hashset untouched.

This adjustment ensures hashset_add() and hashset_merge() operate
correctly on the copied hashset and prevent modification of the
original data.

A new regression test file `reported_bugs.sql` has been added to
validate the proper functionality of these changes. Future reported
bugs and their corresponding tests will also be added to this file.
---

I wonder if this function:

static int4hashset_t *
int4hashset_copy(int4hashset_t *src)
{
return src;
}

...that was previously named hashset_copy(),
should be implemented to actually copy the struct,
instead of just returning the input?

It is being used by int4hashset_agg_combine() like this:

/* copy the hashset into the right long-lived memory context */
oldcontext = MemoryContextSwitchTo(aggcontext);
src = int4hashset_copy(src);
MemoryContextSwitchTo(oldcontext);

/Joel

Attachment Content-Type Size
hashset-0.0.1-da84659.patch application/octet-stream 4.6 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2023-06-15 22:15:36 Re: subscription/033_run_as_table_owner is not listed in the meson.build
Previous Message Tom Lane 2023-06-15 20:36:33 Re: Memory leak in incremental sort re-scan