Re: POC: rational number type (fractions)

From: Joe Nelson <joe(at)begriffs(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: POC: rational number type (fractions)
Date: 2020-02-22 01:24:47
Message-ID: 20200222012447.GG42864@begriffs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jeff Davis wrote:
> The decision between an extension and a core type is a tricky one. To
> put an extension in core, usually it's good to show either that it
> satisfies something in the SQL standard, or that there is some
> specific technical advantage (like integration with the syntax or the
> type system).

I don't see any references to "rational" in the SQL standard (fifth ed,
2016) and the only reference to "fraction" is for fractions of a second
in datetime. Doesn't look like SQL includes rational numbers.

Also I don't believe I'm getting extra abilities by putting this in core vs
using an extension. Perhaps there's a syntax change that would make rationals
more pleasant to deal with than how they in this patch, but I haven't imagined
what it would be, and it's not backed by a standard.

> Integrating it in core just to make it easier to use is a double-edge
> sword. It does make it easier in some environments; but it also
> removes pressure to make those environments offer better support for
> the extension ecosystem, ultimately weakening extensions overall.

Makes sense. We petitioned RDS to allow the pg_rational extension, [0]
but they didn't respond. Not sure how that process is supposed to work.

0: https://github.com/begriffs/pg_rational/issues/7

> In this case I believe it could be a candidate for in-core, but it's
> borderline. The reasons it makes sense to me are:
>
> 1. It seems that there's "only one way to do it".

There may be more than one way to do it actually. For instance the choice
between a fixed size type with limits on the fractions it can represent, vs one
that can grow to hold any fraction. I chose the first option, to make the type
the same size as float8. My reasoning was that there was would be no space
overhead for choosing rational over float.

Also there's the choice of whether to always store fractions in normal
form (lowest terms). This patch allows fractions to be in non-normal
form after arithmetic, and only normalizes as needed when an arithmetic
operation would overflow. I wanted to cut down on how many times gcd is
called. However this choice means that hash indices have to normalize
because they hash the bit pattern, while btree indices can compare
rationals without regard to normal form.

This patch represents each rational as a separate numerator and denominator. I
did some research today to see if there are any another ways, and looks like
there's a technique from the 70s called "quote notation." [1] It appears that
quote notation makes addition and subtraction faster, but that the operations
can less predictable performance in the worst case scenarios with doing
arbitrary precision. So there's more than one way to do it.

1: https://en.wikipedia.org/wiki/Quote_notation

> 2. I don't expect this will be much of a maintenance burden.

True, rational numbers aren't going to change anytime soon.

> Keep in mind that if you do want this to be in core, the data format
> has to be very stable to maintain pg_upgrade compatibility.

The format is currently [int32 numerator][int32 denominator] packed together,
where the denominator is made positive whenever possible (not possible when
it's -INT_MAX). The quote notation alternative would arrange things very
differently.

> Patch comments:
>
> * Please include docs.

Of course, if we determine the patch is desirable. The included tests
should help demonstrate how it works for the purposes of review.

> * I'm worried about the use of int32. Does that cover all of the
> reasonable use cases of rational?

I could imagine having two types, a rational8 for the current
implementation, and an arbitrary precision rational. Perhaps...

> * what's the philosophy regarding NULL and rational? Why are NULL
> arguments returning non-NULL answers?

The rational_intermediate(x, y) function picks a fraction between x and
y, and NULL was meant as a signal that one of the sides is unbounded.
So rational_intermediate(x, NULL) means find a fraction larger than x,
and rational_intermediate(NULL, y) means find one smaller than y.

The use case is a query for a spot "immediately after position 2:"

SELECT rational_intermediate(2, min(pos))
FROM todos
WHERE pos > 2;

If there are already todos positioned after 2 then it'll find a spot
between 2 and the min. However if there are no todos after 2 then min()
will return NULL and we'll simply find a position *somewhere* after 2.

> * Is rational_intermediate() well-defined, or can it just choose any
> rational between the two arguments?

It chooses the mediant [2] of x and y in lowest terms by walking a
Stern-Brocot tree. I found that this keeps the terms of the fraction
much smaller than taking the average of x and y. This was an advantage
over calculating with floats because I don't know how to take the
mediant of floats, and repeatedly taking the average of floats eats up
precision pretty quickly.

2: https://en.wikipedia.org/wiki/Mediant_(mathematics)

> * Can you discuss how cross-type comparisons and conversions should be
> handled (e.g. int8, numeric, float8)?

Good point, I don't have tests for that. Would implicit casts do the
trick? So '1/2'::rational < 1 would cast 1 to '1/1' and compare? I have
currently included these casts: integer -> rational, float8 <->
rational. Don't have one for numeric yet.

> Regards,

Thank you for taking the time to raise those questions.

--
Joe Nelson https://begriffs.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2020-02-22 02:14:58 Re: Transactions involving multiple postgres foreign servers, take 2
Previous Message Michael Leonhard 2020-02-22 01:08:47 Make java client lib accept same connection strings as psql