Re: Re: Abbreviated keys for Numeric

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Re: Abbreviated keys for Numeric
Date: 2015-02-01 00:07:23
Message-ID: CAM3SWZQqChYW-tDVLrFN2ww8tZXceeqtY=KuJvHuT5cdUo950g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 26, 2015 at 3:35 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> I am not seriously suggesting pursuing abbreviation for float8 in the
> near term - numeric is clearly what we should concentrate on. It's
> interesting that abbreviation of float8 could potentially make sense,
> though.

Note that in the IEEE 754 standard, the exponent does not have a sign.
Rather, an exponent bias is subtracted from it (127 for single
precision floats, and 1023 for double precision floats). This, and the
bit sequence of the mantissa allows floats to be compared and sorted
correctly even when interpreting them as integers. The exception is
NaN, but then we have an exception to that exception.

This is a really old idea, actually. I first saw it in a paper written
in the 1960s, long before math coprocessors became standard. Haven't
really thrashed this out enough, but I offhand I guess it would work.

The other problem is that positive IEEE floating-point numbers sort
like integers with the same bits, and negative IEEE floating-point
numbers sort in the reverse order of integers with the same bits. So
we'd probably end up with an encoding scheme that accounted for that,
and forget about tie-breakers (or have a NOOP "return 0" tie-breaker).
An example of the problem:

postgres=# create table foo (a float8);
CREATE TABLE
postgres=# insert into foo values (1), (2), (3), (-1), (-2), (-3);
INSERT 0 6
postgres=# select * from foo order by a;
a
----
-1
-2
-3
1
2
3
(6 rows)

The reason that this conversion usually doesn't occur in library
sorting routines is because it only helps significantly on x86, has
additional memory overhead, and ordinarily requires that we convert
back when we're done sorting. The costs/benefit analysis for tuplesort
would be much more favorable than a generic float sorting case, given
that we pretty much have datum1 storage as a sunk costs anyway, and
given that we don't need to convert back the datum1 representation,
and given that the encoding process would be dirt cheap and occur at a
time when we were likely totally bottlenecked on memory bandwidth.

I don't want to get bogged down on this - the numeric abbreviation
patch *is* still much more compelling - but maybe abbreviation of
float8 isn't a red herring after all.
--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-02-01 00:23:37 Re: parallel mode and parallel contexts
Previous Message Marco Nenciarini 2015-01-31 23:47:24 File based Incremental backup v9