Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence
Date: 2019-12-24 12:29:23
Message-ID: 422df318-ac5d-3b7e-508e-2d5042e540e7@postgrespro.ru
Lists: pgsql-hackers

19.12.2019 18:19, Antonin Houska wrote:
> Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru> wrote:
>
>> I attached new version with pg_opclass documentation update.
>>
>> One more thing I am uncertain about is array_ops. Arrays may contain bitwise
>> and non-bitwise element types.
>> What is the correct value of opcisbitwise for array_ops itself?
> How about setting opcisbitwise to false for the array_ops opclass and checking
> opcisbitwise of the element type whenever we need to know whether the array is
> "bitwise equal"? When checking array_eq(), I wondered whether the existence of
> the "expanded array" format is a problem, but it does not seem to be: the
> conversion of an "expanded" value to a "flat" value and then back to "expanded"
> should not change the array contents.
>
> Anyway, in the current version of the patch I see that array_ops opclasses
> have opcisbitwise=true. It should be false even if you don't use the approach
> of checking the element type.
>
> Besides that, I think that record_ops is similar to array_ops and therefore it
> should not set opcisbitwise to true.
>
> I also remember that, when thinking about the problem in the context of the
> aggregate push down patch, I considered some of the geometric types
> problematic. For example, box_eq() uses this expression
>
> #define FPeq(A,B) (fabs((A) - (B)) <= EPSILON)
>
> so equality does not imply bitwise equality here. Maybe you should only set
> the flag for btree opclasses for now.

Thank you for pointing out the issue with geometric opclasses.
If I understand it correctly, regular float types are not bitwise either.
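
To illustrate the point, here is a minimal C sketch (not code from the
patch) showing how equality can hold while the stored bytes differ. The
FPeq macro is the one quoted above; the EPSILON value and the two helper
functions are assumptions made for this example, and plain C "==" stands
in for the float equality operator.

```c
#include <math.h>
#include <stdbool.h>
#include <string.h>

/* Assumed tolerance for this sketch; the quoted macro only names EPSILON. */
#define EPSILON 1.0E-06
#define FPeq(A,B) (fabs((A) - (B)) <= EPSILON)

/* Hypothetical helper: true when both doubles have identical bytes. */
bool
float8_bitwise_equal(double a, double b)
{
    return memcmp(&a, &b, sizeof(double)) == 0;
}

/* Hypothetical helper: plain C equality, used here as the "operator". */
bool
float8_op_equal(double a, double b)
{
    return a == b;
}
```

On an IEEE 754 platform, float8_op_equal(0.0, -0.0) is true while
float8_bitwise_equal(0.0, -0.0) is false, because the two zeros differ in
the sign bit. Likewise FPeq(1.0, 1.0000001) is true although the two
values are stored differently.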

I updated the patchset.
The first patch now contains only infrastructure changes
and the second one sets opcisbitwise for btree opclasses in pg_opclass.dat.

I've tried to be conservative and only mark types that are 100% bitwise safe.
See the attached v4-Opclass-isbitwise.out file.

Non-atomic types, such as record, range, json and enum, depend on their
element types.
Text can be considered bitwise (i.e. texteq uses memcmp) only when
specific collation conditions are satisfied.

We could make the 'opcisbitwise' parameter an enum (or char) instead of a
boolean, to distinguish "always bitwise", "never bitwise" and "maybe bitwise".
However, I doubt it would be helpful in any real use case.
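
For the sake of discussion, a three-state flag could look roughly like the
sketch below. This is not part of the attached patches: the constant
names, the character values and the function are all hypothetical, and
'context_is_bitwise' stands for whatever extra check the "maybe" case
would need (for example the element type's opclass, or the collation).

```c
#include <stdbool.h>

/* Hypothetical values for a char-typed opcisbitwise column. */
#define OPCISBITWISE_ALWAYS 'a'
#define OPCISBITWISE_NEVER  'n'
#define OPCISBITWISE_MAYBE  'm'

/*
 * Hypothetical helper: decide whether opclass equality can be treated
 * as bitwise equality.  Unknown flag values are treated as "never".
 */
bool
opclass_equality_is_bitwise(char opcisbitwise, bool context_is_bitwise)
{
    switch (opcisbitwise)
    {
        case OPCISBITWISE_ALWAYS:
            return true;
        case OPCISBITWISE_MAYBE:
            /* Defer to the caller-supplied, context-dependent check. */
            return context_is_bitwise;
        case OPCISBITWISE_NEVER:
        default:
            return false;
    }
}
```

Every caller would have to supply the context-dependent check for the
"maybe" case, which is the extra complexity a plain boolean avoids.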

What do you think?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
v4-Opclass-bitwise-equality-0001.patch text/x-patch 13.0 KB
v4-Opclass-bitwise-equality-0002.patch text/x-patch 8.0 KB
v4-Opclass-isbitwise.out text/plain 796 bytes
