Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server

From: Shachar Shemesh <shachar(at)shemesh(dot)biz>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: oledb-devel(at)pgfoundry(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Date: 2007-05-21 05:04:41
Message-ID: 46512869.6000108@shemesh.biz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greg Smith wrote:
> On Sun, 20 May 2007, Shachar Shemesh wrote:
>
>> This is not data given to store. It's data being exported.
>
> Data being exported has a funny way of turning around and being stored
> in the database again. It's kind of nice to know the damage done
> during that round trip is minimized.
I agree. All I'm asking, and have not received an answer yet, is whether
assuring that we don't have any SEMANTIC damage is enough.

In other words, if I can assure that data exported and then imported
will always, under all circumstances, compare the same to the original,
would that be enough of a requirement? In other words, if I offer a
format that is assured of preserving both mantissa and exponent
precision and range, as well as all extra attributes (+/-Infinity and
NaN), but does not guarantee that the semantically identical constructs
are told apart (+0 vs. -0, and the different NaNs), would that format be
acceptable?
>
>> Tom seems to think this is not a goal (though, aside from his disbelief
>> that such a goal is attainable, I have heard no arguments against it).
>
> If Tom thinks it's not attainable, the best way to convince him
> otherwise would be demonstrate that it's not.
Granted. That's why I've been quite. I'm pulling my sources for the ARM
FP format details, to make sure what I have in mind would work.
> One reason people use text formats for cross-platform exchanges is
> that getting portable binary compatibility for things like floating
> point numbers is much harder than you seem to think it is.
I'll just point out that none of the things that Tom seems to be
concerned about are preserved over text format.
>
> Stepping back for a second, your fundamental argument seem to be based
> on the idea that doing conversions to text is such a performance issue
> in a driver that it's worth going through these considerable
> contortions to avoid it.
Converting to text adds a CPU overhead in both client and server, as
well as a network transmission overhead. Even if it's not determental to
performance, I'm wondering why insist on paying it.

You are right that I offered no concrete implementation. I'll do it now,
but it is dependent on an important question - what is the range for the
ARM floating point. Not having either an ARM to test it on, nor the
floating point specs, it may be that a simpler implementation is
possible. I offer this implementation up because I see people think I'm
talking up my ass.

A 64 bit IEEE float can distinguish between almost all 2^64 distinct
floats. It loses two combinations for the + and - infinity, one
combination for the dual zero notation, and we also lose all of the
NaNs, which means (2^mantissa)-2 combinations. Over all, an n bit IEEE
float with m bits of mantissa will be able to represent 2^n - 2^m - 1
actual floating point numbers.

That means that if we take a general signed floating point number, of
which representation we know nothing but the fact it is n bits wide, and
that it has a mantissa and an exponent, and we want to encode it as an
IEEE number of the same width with mantissa size m and exponent of size
e=n-m-1, we will have at most 2^m+1 unrepresentable numbers.

In a nutshell, what I suggest is that we export floating points in
binary form in IEEE format, and add a status word to it. The status word
with dictate how many bits of mantissa there are in the IEEE format,
what the exponent bias is, as well as add between one and two bits to
the actual number, in case the number of floats the exported platform
has is larger than the number of floats that can be represented in IEEE
with the same word length.

The nice thing about this format is that exporting from an IEEE platform
is as easy as exporting the binary image of the float, plus a status
word that is a constant. Virtually no overhead. Importing from an IEEE
platform to an IEEE platform is, likewise, as easy as comparing the
status word to your own constant, and if they match, just copy the
binary. This maintains all of Tom's strict round trip requirements. In
fact, for export/import on the same IEEE platform no data conversion of
any kind takes place at all.

There are questions that need to be answered. For example, what happens
if you try to import a NaN into a platform that has no such concept?
You'd have to put in a NULL or something similar. Similarly, how do you
import Infinity. These, however, are questions that should be answered
the same way for text imports, so there is nothing binary specific here.

I hope that, at least, presents a workable plan. As I said before, I'm
waiting for the specs for ARM's floating point before I can move
forward. If, as I suspect, ARM's range is even more limited, then I may
try and suggest a more compact export representation pending question of
whether we have any other platform that is non-IEEE, and what is the
situation there.

Shachar

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message db 2007-05-21 05:58:04 Re: UTF8MatchText
Previous Message Greg Smith 2007-05-21 00:07:10 Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server