Re: [PATCH] Support empty ranges with bounds information

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Mark Dilger" <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Postgres hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Isaac Morland" <isaac(dot)morland(at)gmail(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>
Subject: Re: [PATCH] Support empty ranges with bounds information
Date: 2021-03-02 20:51:29
Message-ID: 7fae5633-5b27-4c87-bfdb-ac32549f2ae4@www.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 2, 2021, at 21:40, Mark Dilger wrote:
>
>
> > On Mar 2, 2021, at 12:04 PM, Joel Jacobson <joel(at)compiler(dot)org> wrote:
> >
> > On Tue, Mar 2, 2021, at 20:57, Mark Dilger wrote:
> >> I didn't phrase that clearly enough. I'm thinking about whether you include the bounds information in the hash function. The current implementation of hash_range(PG_FUNCTION_ARGS) is going to hash the lower and upper bounds, since you didn't change it to do otherwise, so "equal" values won't always hash the same. I haven't tested this out, but it seems you could get a different set of rows depending on whether the planner selects a hash join.
> >
> > I think this issue is solved by the empty-ranges-with-bounds-information-v2.patch I just sent,
> > since with it, there are no semantic changes at all, lower() and upper() works like before.
>
> There are semantic differences, because hash_range() doesn't call lower() and upper(), it uses RANGE_HAS_LBOUND and RANGE_HAS_UBOUND, the behavior of which you have changed. I created a regression test and expected results and checked after applying your patch, and your patch breaks the hash function behavior. Notice that before your patch, all three ranges hashed to the same value, but not so after:
>
>
> @@ -1,18 +1,18 @@
> select hash_range('[a,a)'::textrange);
> hash_range
> ------------
> - 484847245
> + -590102690
> (1 row)
>
> select hash_range('[b,b)'::textrange);
> hash_range
> ------------
> - 484847245
> + 281562732
> (1 row)
>
> select hash_range('[c,c)'::textrange);
> - hash_range
> -------------
> - 484847245
> + hash_range
> +-------------
> + -1887445565
> (1 row)
>
>
> You might change how hash_range() works to get all "equal" values to hash the same value, but that just gets back to the problem that non-equal things appear to be equal. I guess that's your two-warty-feet preference, but not everyone is going to be in agreement on that.

Yikes. Here be dragons. I think I want my wart free foot back please.

Many thanks for explaining. I think I’ll abandon this patch. I guess implementing an entirely new range type could be an acceptable solution, but that’s too big of a project for me to manage on my own. If any more experienced hackers are interested in such a project, I would love to help if I can.

>
> —
> Mark Dilger
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
>
>

Kind regards,

Joel

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2021-03-02 20:57:57 Re: buildfarm windows checks / tap tests on windows
Previous Message Mark Dilger 2021-03-02 20:40:17 Re: [PATCH] Support empty ranges with bounds information