Re: record identical operator - Review

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Steve Singer <steve(at)ssinger(dot)info>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: record identical operator - Review
Date: 2013-09-26 21:50:08
Message-ID: 20130926215008.GB7113@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 19, 2013 at 06:31:38PM -0400, Steve Singer wrote:
> On 09/12/2013 06:27 PM, Kevin Grittner wrote:
> >Attached is a patch for a bit of infrastructure I believe to be
> >necessary for correct behavior of REFRESH MATERIALIZED VIEW
> >CONCURRENTLY as well as incremental maintenance of matviews.
>
> Here is attempt at a review of the v1 patch.
>
> There has been a heated discussion on list about if we even want
> this type of operator. I will try to summarize it here
>
> The arguments against it are
> * that a record_image_identical call on two records that returns
> false can still return true under the equality operator, and the
> records might (or might not) appear to be the same to users
> * Once we expose this as an operator via SQL someone will find it
> and use it and then complain when we change things such as the
> internal representation of a datatype which might
> break there queries
> * The differences between = and record_image_identical might not be
> understood by everywhere who finds the operator leading to confusion
> * Using a criteria other than equality (the = operator provided by
> the datatype) might cause problems if we later provide 'on change'
> triggers for materialized views
>
> The arguments for this patch are
> * We want the materialized view to return the same value as would be
> returned if the query were executed directly. This not only means
> that the values should be the same according to a datatypes =
> operator but that they should appear the same 'to the eyeball'.
> * Supporting the materialized view refresh check via SQL makes a lot
> of sense but doing that requires exposing something via SQL
> * If we adequately document what we mean by record_image_identical
> and the operator we pick for this then users shouldn't be surprised
> at what they get if they use this

This is a good summary. I think there are a few things that make this
issue difficult to decide:

1. We have to use an operator to give the RMVC (REFRESH MATERIALIZED
VIEW CONCURRENTLY) SPI query the ability to optimize this query. If we
could do this with an SQL or C function, there would be less concern
about the confusion caused by adding this capability.

Question: If we are comparing based on some primary key, why do we need
this to optimize? Certainly the primary key index will be used, no?

2. Agregates are already non-deterministic, so some feel that adding
this feature doesn't improve much. The counter-argument is that without
the binary row comparison ability, rows could be returned that could
_never_ have been produced by the base data, which is more of a user
surprise than non-deterministic rows.

3. Our type casting and operators are already complex, and adding
another set of operators only compounds that.

4. There are legitimate cases where tool makers might want the ability
to compare rows binarily, e.g. for replication, and adding these
operators would help with that.

I think we need to see a patch from Kevin that shows all the row
comparisons documented and we can see the impact of the user-visibile
part.

One interesting approach would be to only allow the operator to be
called from SPI queries.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2013-09-26 21:56:29 Re: GIN improvements part 1: additional information
Previous Message Robert Haas 2013-09-26 21:18:43 Re: FW: REVIEW: Allow formatting in log_line_prefix