Re: record identical operator

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: record identical operator
Date: 2013-09-14 18:58:32
Message-ID: 20130914185832.GA2291@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2013-09-14 11:25:52 -0700, Kevin Grittner wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>
> > what I am talking about is that
> > e.g.: SELECT (ARRAY[1,2,3,NULL])[1:3] = ARRAY[1,2,3];
> > obviously should be true.
>
> The patch does not change the behavior of the = operator for any
> type under any circumstances.

Yes, sure. I wasn't thinking you would.

> > But both arrays don't have the same binary representation since
> > the former has a null bitmap, the latter not. So, if you had a
> > composite type like (int4[]) and would compare that without
> > invoking operators you'd return something false in some cases
> > because of the null bitmaps.
>
> Not for the = operator.  The new "identical" operator would find
> them to not be identical, though.

Yep. And I think that's a problem if exposed to SQL. People won't
understand the hazards and end up using it because its faster or
somesuch.

> Since the new operator is only for the record type, I need to wrap
> the values in your example:

Yes.

> The REFRESH causes them to match again, and later REFRESH runs
> won't see a need to do any work there unless the on-disk
> representation changes again.

Yes, I understand that the matview code itself will just perform
superflous work. We use such comparisons in other parts of the code
similarly.

> As far as I can see, we have four choices:
>
> (1)  Never update values that are "equal", even if they appear
> different to the users, as was demonstrated with the citext
> example.

I think, introducing a noticeable amount of infrastructure for this just
because of citext is a bad idea.
At some point we need to replace citext with proper case-insensitive
collation support - then it really might become necessary.

> (2)  Require every data type which can be used in a matview to
> implement some new operator or function for "identical".  Perhaps
> that could be mitigated to only implementat it if equal values can
> have user-visible differences.

That basically would require adding a new member to btree opclasses that
btrees don't need themselves... Hm.

> (3)  Embed special cases into record identical tests for types
> known to allow multiple on-disk representations which have no
> user-visible differences.

I think this is a complete nogo. a) I don't forsee we know of all these
cases b) it wouldn't be extensible.

Oh. Now that I've read further, I see you feel the same. Good ;)

> (4)  Base the need to update a matview column on whether its
> on-disk representation is identical to what a new run of the
> defining query would generate.  If this causes performance problems
> for use of a given type in a matview, one possible solution would
> be to modify that particular type to use a canonical format when
> storing a value into a record.  For example, storing an array which
> has a bitmap of null values even though there are no nulls in the
> array could strip the bitmap as it is stored to the record.

If matview refreshs weren't using plain SQL and thus wouldn't require
exposing that operator to SQL I wouldn't have a problem with this...

There's the ungodly ugly choice of having an matview_equal function (or
operator) that checks if we're doing a refresh atm...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2013-09-14 19:03:52 Re: git apply vs patch -p1
Previous Message Marko Tiikkaja 2013-09-14 18:52:40 Re: Assertions in PL/PgSQL