[PATCH] Use 128-bit math to accelerate numeric division, when 8 < divisor digits <= 16

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: [PATCH] Use 128-bit math to accelerate numeric division, when 8 < divisor digits <= 16
Date: 2023-01-22 13:41:01
Message-ID: b7a5893d-af18-4c0b-8918-96de5f1bbf39@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On platforms where we support 128bit integers, we could accelerate division
when the number of digits in the divisor is larger than 8 and less than or
equal to 16 digits, i.e. when the divisor that fits in a 64-bit integer but would
not fit in a 32-bit integer.

This patch adds div_var_int64(), which is similar to the existing div_var_int(),
but accepts a 64-bit divisor instead of a 32-bit divisor.

The new function is used within div_var() and div_var_fast().

To measure the effect, we need a volatile wrapper function for numeric_div(),
to avoid it being cached since it's immutable:

CREATE OR REPLACE FUNCTION numeric_div_volatile(numeric,numeric)
RETURNS numeric LANGUAGE internal AS 'numeric_div';

We can then use generate_series() to measure the execution time for lots of
executions. This does not account for the overhead of generate_series() and
count(), but that's okay since the overhead is the same in both measurements,
so the relative difference is still correct.

--
-- Division when the divisor is 8 digits should be unchanged:
--
EXPLAIN ANALYZE
SELECT count(numeric_div_volatile(
repeat('1',131071)::numeric,
repeat('3',8)::numeric
)) FROM generate_series(1,1e4);
-- Execution Time: 1633.722 ms (HEAD)
-- Execution Time: 1680.228 ms (div_var_int64.patch)

--
-- Division when the divisor is 9 digits should be faster:
--
EXPLAIN ANALYZE
SELECT count(numeric_div_volatile(
repeat('1',131071)::numeric,
repeat('3',9)::numeric
)) FROM generate_series(1,1e4);
-- Execution Time: 5444.755 ms (HEAD)
-- Execution Time: 1604.967 ms (div_var_int64.patch)

--
-- Division when the divisor is 16 digits should also be faster:
--
EXPLAIN ANALYZE
SELECT count(numeric_div_volatile(
repeat('1',131071)::numeric,
repeat('3',16)::numeric
)) FROM generate_series(1,1e4);
-- Execution Time: 6072.683 ms (HEAD)
-- Execution Time: 3215.686 ms (div_var_int64.patch)

--
-- Division when the divisor is 17 digits should be unchanged:
--
EXPLAIN ANALYZE
SELECT count(numeric_div_volatile(
repeat('1',131071)::numeric,
repeat('3',17)::numeric
)) FROM generate_series(1,1e4);
-- Execution Time: 6948.150 ms (HEAD)
-- Execution Time: 7010.544 ms (div_var_int64.patch)

--
-- Same tests as above, but with a single digit dividend,
-- and 1e7 executions instead of just 1e4.
--

--
-- Division when the divisor is 8 digits should be unchanged:
--
EXPLAIN ANALYZE
SELECT count(numeric_div_volatile(
1,
repeat('3',8)::numeric
)) FROM generate_series(1,1e7);
-- Execution Time: 1827.567 ms (HEAD)
-- Execution Time: 1828.029 ms (div_var_int64.patch)

--
-- Division when the divisor is 9 digits should be faster:
--
EXPLAIN ANALYZE
SELECT count(numeric_div_volatile(
1,
repeat('3',9)::numeric
)) FROM generate_series(1,1e7);
-- Execution Time: 2314.851 ms (HEAD)
-- Execution Time: 1886.170 ms (div_var_int64.patch)

--
-- Division when the divisor is 16 digits should also be faster:
--
EXPLAIN ANALYZE
SELECT count(numeric_div_volatile(
1,
repeat('3',16)::numeric
)) FROM generate_series(1,1e7);
-- Execution Time: 2244.009 ms (HEAD)
-- Execution Time: 1968.148 ms (div_var_int64.patch)

--
-- Division when the divisor is 17 digits should be unchanged:
--
EXPLAIN ANALYZE
SELECT count(numeric_div_volatile(
1,
repeat('3',17)::numeric
)) FROM generate_series(1,1e7);
-- Execution Time: 2334.896 ms (HEAD)
-- Execution Time: 2338.141 ms (div_var_int64.patch)

The graph below shows the effect on execution time for numeric_div(),
and also looks at numeric_mod() since it's a heavy user of numeric_div().

The graph was produced by generating 100 random numeric integer values
for each combination of number of dividend/divisor digits between 1 to 20.

In total, that's 20*20*100*2=80000 test values.

As expected, the ceiling for the fast short division is lifted from 8 to 16 divisor digits,
and speedups for modulus is noticed in the same region.

The graph was produced using results from pg-timeit [1] and R for the plotting.

[1] https://github.com/joelonsql/pg-timeit

/Joel

Attachment Content-Type Size
div_var_int64.patch application/octet-stream 6.2 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Karl O. Pinc 2023-01-22 14:09:03 Re: Doc: Rework contrib appendix -- informative titles, tweaked sentences
Previous Message Takamichi Osumi (Fujitsu) 2023-01-22 12:42:00 RE: Time delayed LR (WAS Re: logical replication restrictions)