Re: Bug in numeric multiplication

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bug in numeric multiplication
Date: 2015-11-18 22:19:43
Message-ID: 3056.1447885183@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> I'm kind of stuck on that too. I did some experimentation by tracking
> maximum values of outercarry in the regression tests (including
> numeric_big) and did not see it get larger than a couple hundred thousand,
> i.e., more or less INT_MAX/NBASE. But I don't have an argument as to why
> that would be the limit.

A bit of progress on this: I've found a test case, namely

select sqrt(99999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999.0099999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999);

If one inserts no-overflow Asserts into div_var_fast, this case will trip
them; specifically, this one:

/*
 * The dividend digit we are about to replace might still be nonzero.
 * Fold it into the next digit position. We don't need to worry about
 * overflow here since this should nearly cancel with the subtraction
 * of the divisor.
 */
+ Assert(Abs(div[qi]) <= INT_MAX/NBASE);
div[qi + 1] += div[qi] * NBASE;

Unfortunately, there isn't any SQL-visible misbehavior in this sqrt() case,
because the loop in sqrt_var is more or less self-correcting for minor
errors (and this example produces bogus results only in very low-order
digits). Most of the other calls to div_var_fast give it inputs that are
even harder to control than sqrt_var's, so finding something that did
produce visibly wrong results might take a whole lot of trial and error.

Still, this proves that we are onto a real problem.
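The self-correcting behavior attributed to sqrt_var can be illustrated with ordinary doubles. This hypothetical sketch assumes sqrt_var refines its result with a Newton-style step x = (x + arg/x) / 2 whose division goes through div_var_fast; sloppy_div and newton_sqrt are illustrative names, and the small relative error eps stands in for bogus low-order digits coming out of the division. Because each Newton step pulls x back toward the root, the errors do not accumulate: the result stays within roughly eps/2 (relative) of the true square root instead of drifting.

```c
#include <assert.h>

/* Hypothetical stand-in for an inexact division: a / b with relative error eps */
static double
sloppy_div(double a, double b, double eps)
{
    return (a / b) * (1.0 + eps);
}

/*
 * Newton iteration x <- (x + a / x) / 2 for sqrt(a), where every division
 * is perturbed by +eps or -eps (alternating).  Near the root each step
 * lands at about sqrt(a) * (1 + e/2), so the perturbations stay bounded.
 */
static double
newton_sqrt(double a, double eps, int iterations)
{
    double      x;

    assert(a > 0.0);
    x = (a > 1.0) ? a : 1.0;    /* crude starting guess, >= sqrt(a) */

    for (int i = 0; i < iterations; i++)
    {
        double      e = (i % 2 == 0) ? eps : -eps;

        x = (x + sloppy_div(a, x, e)) / 2.0;
    }
    return x;
}
```

With eps = 1e-9, the result for sqrt(2) or sqrt(1e6) is still correct to about nine significant digits after the loop settles.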

> Another issue here is that with outercarry added into the qdigit
> computation, it's no longer clear what the bounds of qdigit itself are,

I concluded that that particular issue is a red herring: qdigit should
always be a fairly accurate estimate of the next output digit, so it
cannot fall very far outside the range 0..NBASE-1. Testing confirms that
the range seen in the regression tests is exactly -1 to 10000, which is
what you'd expect since there can be some roundoff error.
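As an illustration of how roundoff alone can push a digit estimate just outside the exact value, here is a hypothetical sketch that estimates a quotient digit by multiplying an approximate dividend by a rounded reciprocal of the divisor and taking the floor. The names are illustrative, and as we understand it the real div_var_fast also builds its floating-point operands from only a few leading digits, which this sketch does not model. In IEEE double arithmetic 1.0 / 49.0 rounds slightly low, so 49.0 * (1.0 / 49.0) comes out just under 1.0 and the estimate is 0 rather than 1. For exact integer quotients k below NBASE, the estimate is always k or k - 1.

```c
#include <assert.h>

/*
 * Hypothetical sketch: estimate floor(fdividend / fdivisor) using a
 * precomputed reciprocal, as a reciprocal-based division loop might.
 * The rounded reciprocal can make the product fall just short of an
 * exact integer, so the estimate may be one too small.
 */
static int
estimate_qdigit(double fdividend, double fdivisor)
{
    double      fdivisorinverse;
    double      fquotient;
    int         q;

    assert(fdivisor != 0.0);
    fdivisorinverse = 1.0 / fdivisor;
    fquotient = fdividend * fdivisorinverse;

    q = (int) fquotient;        /* truncates toward zero */
    if (fquotient < 0.0 && (double) q != fquotient)
        q--;                    /* round toward minus infinity instead */
    return q;
}
```

How an estimate that is off by one gets compensated in later digit positions is specific to div_var_fast and is not modeled here.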

Also, after further thought I've been able to construct an argument why
outercarry stays bounded. See what you think of the comments in the
attached patch, which incorporates your ideas about postponing the div[qi]
calculation.

regards, tom lane

Attachment: div-var-fast-fix-again.patch (Content-Type: text/x-diff, 6.9 KB)
