Re: [PATCH] Support empty ranges with bounds information

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Joel Jacobson <joel(at)compiler(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Isaac Morland <isaac(dot)morland(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: [PATCH] Support empty ranges with bounds information
Date: 2021-03-02 20:40:17
Message-ID: 1C23E6FC-9EDD-43DC-8C00-A753236F5F5D@enterprisedb.com
Lists: pgsql-hackers

> On Mar 2, 2021, at 12:04 PM, Joel Jacobson <joel(at)compiler(dot)org> wrote:
>
> On Tue, Mar 2, 2021, at 20:57, Mark Dilger wrote:
>> I didn't phrase that clearly enough. I'm thinking about whether you include the bounds information in the hash function. The current implementation of hash_range(PG_FUNCTION_ARGS) is going to hash the lower and upper bounds, since you didn't change it to do otherwise, so "equal" values won't always hash the same. I haven't tested this out, but it seems you could get a different set of rows depending on whether the planner selects a hash join.
>
> I think this issue is solved by the empty-ranges-with-bounds-information-v2.patch I just sent,
> since with it, there are no semantic changes at all, lower() and upper() works like before.

There are semantic differences, because hash_range() doesn't call lower() and upper(); it uses RANGE_HAS_LBOUND and RANGE_HAS_UBOUND, the behavior of which you have changed. I created a regression test with expected results, reran it after applying your patch, and your patch breaks the hash function behavior. Notice that before your patch all three ranges hashed to the same value, but after it they do not:

@@ -1,18 +1,18 @@
select hash_range('[a,a)'::textrange);
hash_range
------------
- 484847245
+ -590102690
(1 row)

select hash_range('[b,b)'::textrange);
hash_range
------------
- 484847245
+ 281562732
(1 row)

select hash_range('[c,c)'::textrange);
- hash_range
-------------
- 484847245
+ hash_range
+-------------
+ -1887445565
(1 row)

You might change how hash_range() works so that all "equal" values hash to the same value, but that just gets back to the problem that non-equal things appear to be equal. I guess that's your two-warty-feet preference, but not everyone is going to agree with that.
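
To make the hash join concern concrete, here is a toy model in Python. It is only a sketch under stated assumptions: TextRange, range_eq, hash_with_bounds, hash_ignoring_empty_bounds and both join functions are hypothetical names invented for illustration, and zlib.crc32 stands in for the real hash. None of this is PostgreSQL's code. It shows that when "equal" values hash differently, a hash join and a nested loop join can return different rows.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRange:
    """Toy half-open range [lower, upper) over strings (hypothetical type)."""
    lower: str
    upper: str

    @property
    def is_empty(self) -> bool:
        # [a,a) contains nothing, but this toy still remembers its bounds.
        return self.lower >= self.upper


def range_eq(a: TextRange, b: TextRange) -> bool:
    # Assumption: all empty ranges compare equal, whatever bounds they carry.
    if a.is_empty or b.is_empty:
        return a.is_empty and b.is_empty
    return (a.lower, a.upper) == (b.lower, b.upper)


def hash_with_bounds(r: TextRange) -> int:
    # Always hashes the stored bounds, even for empty ranges.
    return zlib.crc32(f"{r.lower}\x00{r.upper}".encode())


def hash_ignoring_empty_bounds(r: TextRange) -> int:
    # Every empty range hashes to one fixed value, consistent with range_eq.
    return 0 if r.is_empty else hash_with_bounds(r)


def nested_loop_join(left, right):
    # Compares every pair directly, so it never consults a hash function.
    return [(l, r) for l in left for r in right if range_eq(l, r)]


def hash_join(left, right, hash_fn):
    # Build buckets from the right side, then probe with the left side.
    buckets = {}
    for r in right:
        buckets.setdefault(hash_fn(r), []).append(r)
    return [(l, r)
            for l in left
            for r in buckets.get(hash_fn(l), [])
            if range_eq(l, r)]


left = [TextRange("a", "a")]
right = [TextRange("b", "b"), TextRange("c", "c")]

print(len(nested_loop_join(left, right)))
print(len(hash_join(left, right, hash_with_bounds)))
print(len(hash_join(left, right, hash_ignoring_empty_bounds)))
```

In this toy, the bounds-sensitive hash puts the three empty ranges in different buckets, so the hash join never compares them and misses matches that the nested loop join finds. Making the hash ignore the bounds of empty ranges restores agreement between the two join strategies, but only because range_eq already treats the non-identical values as equal, which is the other wart.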


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
