Re: record identical operator

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: record identical operator
Date: 2013-09-26 15:18:21
Message-ID: CA+TgmoascbWNupoS9izktO8fQSpKe3G0n=afAV1Fg1Qf1p0RDQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 24, 2013 at 3:22 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> * Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
>> Now I admit that's an arguable point. We could certainly define an
>> intermediate notion of equality that is more equal than whatever =
>> does, but not as equal as exact binary equality.
>
> I suggested it up-thread and don't recall seeing a response, so here it
> is again- passing the data through the binary-out function for the type
> and comparing *that* would allow us to change the internal binary
> representation of data types and would be something which we could at
> least explain to users as to why X isn't the same as Y according to this
> binary operator.

Sorry, I missed that suggestion.

I'm wary of inventing a completely new way of doing this. I don't
think that there's any guarantee that the send/recv functions won't
expose exactly the same implementation details as a direct check for
binary equality. For example, array_send() seems to directly reveal
the presence or absence of a NULL bitmap. Even if there were no such
anomalies today, it feels fragile to rely on a fairly-unrelated
concept to have exactly the semantics we want here, and it will surely
be much slower. Binary equality has existing precedent and is used in
numerous places in the code for good reason. Users might be confused
about the use of those semantics in those places also, but AFAICT
nobody is.
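
To illustrate the concern about send/recv output, here is a minimal
sketch in Python. It uses a made-up wire format, not PostgreSQL's
actual array_send() layout: toy_array_send and force_bitmap are
hypothetical names invented for this example. It shows how a
serializer can emit different bytes for logically equal arrays
depending on whether a null bitmap is present.

```python
import struct

def toy_array_send(values, force_bitmap=False):
    """Serialize a list of ints/None in a made-up wire format.

    Layout (hypothetical, for illustration only):
      1 byte   flag: is a null bitmap present?
      4 bytes  element count
      N bytes  optional bitmap, one bit per element (1 = not null)
      4 bytes  per non-null element
    """
    has_bitmap = force_bitmap or any(v is None for v in values)
    out = struct.pack("!BI", int(has_bitmap), len(values))
    if has_bitmap:
        bits = 0
        for i, v in enumerate(values):
            if v is not None:
                bits |= 1 << i
        out += bits.to_bytes((len(values) + 7) // 8, "little")
    for v in values:
        if v is not None:
            out += struct.pack("!i", v)
    return out

# Same logical contents, with and without a (redundant) null bitmap.
without_bitmap = toy_array_send([1, 2, 3])
with_bitmap = toy_array_send([1, 2, 3], force_bitmap=True)

# The serialized forms differ even though the values are equal.
serialized_forms_match = (without_bitmap == with_bitmap)
```

Under these assumptions, comparing serialized output is no more
abstract than comparing the stored bytes: the bitmap detail leaks
through either way.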

I think the best thing I can say about this proposal is that it would
clearly be better than what we're doing now, which is just plain
wrong. I don't think it's the best proposal, however.

>> It is obviously true, and unavoidable, that if the query can produce
>> more than one result set depending on the query plan or other factors,
>> then the materialized view contents can match only one of those
>> possible result sets. But you are arguing that it should be allowed
>> to produce some OTHER result set, that a direct execution of the query
>> could *never* have produced, and I can't see how that can be right.
>
> I agree that the matview shouldn't produce things which *can't* exist
> through an execution of the query. I don't intend to argue against that
> but I realize that's a fallout of not accepting the patch to implement
> the binary comparison operators. My complaint is more generally that if
> this approach needs such then there's going to be problems down the
> road. No, I can't predict exactly what they are and perhaps I'm all wet
> here, but these kinds of binary-equality operations are something I've
> tried to avoid since discovering what happens when you try to compare
> a couple of floating point numbers.

So, I get that.

But what I think is that the problem that's coming up here is almost
the flip side of that. If you are working with types that are a
little bit imprecise, such as floats or citext or box, you want to use
comparison strategies that have a little bit of tolerance for error.
In the case of box, this is actually built into the comparison
operator. In the case of floats, it's not; you as the application
programmer have to deal with the fact that comparisons are imprecise -
like by avoiding equality comparisons.
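
As a small illustration of the float case, here is a sketch in Python
rather than SQL. The tolerance value is an arbitrary choice for the
example, not something PostgreSQL applies on your behalf.

```python
import math

total = 0.1 + 0.2

# Exact equality fails because neither side is exactly representable.
exact_equal = (total == 0.3)

# The application supplies its own tolerance instead.
close_enough = math.isclose(total, 0.3, rel_tol=1e-9)
```

Here exact_equal is False and close_enough is True, which is the
kind of tolerance the application has to provide for itself.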

On the other hand, if you are *replicating* those data types, then you
don't want that tolerance. If you're checking whether two boxes are
equal, you may indeed want the small amount of fuzziness that our
comparison operators allow. But if you're copying a box or a float
from one table to another, or from one database to another, you want
the values copied exactly, including all of those low-order bits that
tend to foul up your comparisons. That's why float8out() normally
doesn't display any extra_float_digits - because you as the user
shouldn't be relying on them - but pg_dump does back them up because
not doing so would allow errors to propagate. Similarly here.
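
The display-versus-copy distinction can be sketched in Python. The
digit counts (15 and 17) are chosen for illustration as the usual
"safe to display" and "round-trips exactly" precisions for IEEE 754
doubles; this is an analogy, not a description of the float8out() or
pg_dump code paths.

```python
import struct

x = 0.1 + 0.2

# Display form: hides the low-order bits users shouldn't rely on.
display_text = "%.15g" % x

# Copy form: enough digits to reproduce the value exactly.
copy_text = "%.17g" % x

def bits(value):
    """Return the raw 8-byte IEEE 754 encoding of a float."""
    return struct.pack("<d", value)

# Reading back the display form yields a different value...
display_round_trips = bits(float(display_text)) == bits(x)

# ...while the copy form reproduces the original bit for bit.
copy_round_trips = bits(float(copy_text)) == bits(x)
```

With these values display_round_trips is False and copy_round_trips
is True: the shorter form is fine for reading, but copying through it
silently changes the stored value.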

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
