Re: Missing free_var() at end of accum_sum_final()?

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Dean Rasheed" <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: "Michael Paquier" <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org, "Andres Freund" <andres(at)anarazel(dot)de>
Subject: Re: Missing free_var() at end of accum_sum_final()?
Date: 2023-02-20 22:16:54
Message-ID: 2432830b-f63b-4cb4-a744-2e12503621b1@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

My apologies, it seems my email didn't reach the list, probably due to the
benchmark plot images being to large. Here is the email again, but with
URLs to the images instead, and benchmark updated with results for the
0005-fixed-buf.patch.

--

On Mon, Feb 20, 2023, at 12:32, Dean Rasheed wrote:
> TBH, I'm yet to be convinced that any of this is actually worthwhile.
> We might shave a few percent off some simple numeric operations, but I
> doubt it will make much difference to more complex computations. I'd
> need to see some more realistic test results, or some real evidence of
> palloc/pfree causing significant overhead in a numeric computation.

Thanks for testing! Good point, I agree with your conclusion;
the small change to alloc_var() suggested in 0003-fixed-buf.patch probably
didn't provide enough speedups to motivate the change.

In the new attached patch, Andres fixed buffer idea has been implemented
throughout the entire numeric.c code base.

CHANGES:
--------

* Instead of a bool, we now have a buf_len struct field, to keep track of the
allocated capacity, which can be different from the numbers of digits
currently used. buf_len == 0 indicates no memory is allocated, and fixed_buf
is possibly used instead.

* alloc_var() now reuses the allocated buffer if there is one big enough.

* free_var() and zero_var() only call digitbuf_free() if they need to.

* set_var_from_var() now also uses the fixed buffer and also reuses the
allocated buffer if there is one and it's big enough.

* To allow the NumericVar's on the stack to be used without having to allocate
digits buf, we have to change callers e.g. add_var(), div_var(), etc, to ensure
the result variable isn't the same as one of the operand NumericVars.
This wasn't necessary before, since we always allocated a new digits buf for the
result, which could then be assigned to the digits field when done with
computations. However, this prevented us from relying solely on the existing
NumericVar stack variables, so we needed a few new temporary NumericVar stack
variables, to hold intermediate results, and set_var_from_var() to copy the
operand into the temp var.

Assert()'s have been added to such functions, add_abs(), sub_abs(), div_var(),
div_var_fast(), div_var_int64() and div_var_int64(), that enforce result being
a different object than the two operands.

Here is one example from ceil_var() on this from the new 0004-fixed-buf.patch:

HEAD:

if (var->sign == NUMERIC_POS && cmp_var(var, &tmp) != 0)
add_var(&tmp, &const_one, &tmp);
set_var_from_var(&tmp, result);

The add_var() is a problem since &tmp is both the first operand and the result.
Funnily enough, the fix in this particular case, and in floor_var(),
is simpler and should be faster:

0005-fixed-buf.patch:

if (var->sign == NUMERIC_POS && cmp_var(var, &tmp) != 0)
add_var(&tmp, &const_one, result);
else
set_var_from_var(&tmp, result);

This is avoids the set_var_from_var() if the if-branch is taken, as the result
is written directly to result.

Another example from sqrt_var():

HEAD:

add_var(&q_var, &a1_var, &q_var);

0005-fixed-buf.patch:

set_var_from_var(&q_var, &tmp_var);
add_var(&tmp_var, &a1_var, &q_var);

The extra set_var_from_var() seems to be a net-win in many cases,
except for the generate_series() example which doesn't have any
NumericVar's on the stack, except for the first iteration.

BENCHMARK:
----------

SELECT max(a + b + '17'::numeric + c)
FROM
(SELECT generate_series(1::numeric, 1000::numeric)) aa(a),
(SELECT generate_series(1::numeric, 100::numeric)) bb(b),
(SELECT generate_series(1::numeric, 10::numeric)) cc(c);

HEAD:

Time: 158.698 ms
Time: 142.486 ms
Time: 141.443 ms
Time: 142.044 ms
Time: 141.651 ms

0005-fixed-buf.patch:

Time: 156.371 ms
Time: 149.857 ms
Time: 150.674 ms
Time: 150.409 ms
Time: 150.461 ms

SELECT setseed(0.1234);
CREATE TABLE t (n1 numeric, n2 numeric, n3 numeric, n4 numeric);
INSERT INTO t (n1, n2, n3, n4)
SELECT
round(random()::numeric,2),
round(random()::numeric*10,2),
round(random()::numeric*100,2),
round(random()::numeric*1000,2)
FROM generate_series(1,1e7);
CHECKPOINT;
SELECT SUM(n1+n2+n3+n4) FROM t;

HEAD:

Time: 758.489 ms
Time: 646.794 ms
Time: 643.237 ms
Time: 642.620 ms
Time: 646.218 ms

0005-fixed-buf.patch:

Time: 748.093 ms
Time: 628.799 ms
Time: 629.853 ms
Time: 629.166 ms
Time: 627.768 ms

CREATE TABLE ledger (amount numeric);
INSERT INTO ledger (amount)
SELECT generate_series(-100000.00,100000,0.01);
CHECKPOINT;
SELECT SUM(amount*1.25 + 0.5) FROM ledger;

HEAD:

Time: 1113.080 ms (00:01.113)
Time: 931.998 ms
Time: 931.009 ms
Time: 932.476 ms
Time: 933.509 ms

0005-fixed-buf.patch:

Time: 1067.298 ms (00:01.067)
Time: 883.972 ms
Time: 880.165 ms
Time: 882.465 ms
Time: 893.646 ms

CREATE TEMP TABLE foo(c1 numeric, c2 numeric, c3 numeric, c4 numeric,
c5 numeric, c6 numeric, c7 numeric, c8 numeric,
c9 numeric, c10 numeric);

INSERT INTO foo
SELECT trim_scale(round(random()::numeric*1e4, 4)),
trim_scale(round(random()::numeric*1e4, 4)),
trim_scale(round(random()::numeric*1e4, 4)),
trim_scale(round(random()::numeric*1e4, 4)),
trim_scale(round(random()::numeric*1e4, 4)),
trim_scale(round(random()::numeric*1e4, 4)),
trim_scale(round(random()::numeric*1e4, 4)),
trim_scale(round(random()::numeric*1e4, 4)),
trim_scale(round(random()::numeric*1e4, 4)),
trim_scale(round(random()::numeric*1e4, 4))
FROM generate_series(1,10000000);
COPY foo TO '/tmp/random-numerics.csv';

TRUNCATE foo; COPY foo FROM '/tmp/random-numerics.csv';

HEAD:

Time: 8515.644 ms (00:08.516)
Time: 8405.150 ms (00:08.405)
Time: 8399.067 ms (00:08.399)
Time: 8678.949 ms (00:08.679)
Time: 8388.152 ms (00:08.388)

0005-fixed-buf.patch:

Time: 8255.290 ms (00:08.255)
Time: 7986.409 ms (00:07.986)
Time: 8005.748 ms (00:08.006)
Time: 8004.352 ms (00:08.004)
Time: 8160.537 ms (00:08.161)

FIXED_BUF_LEN:
--------------

In 0005-fixed-buf.patch, the new FIXED_BUF_LEN def has been set to 8.

I've benchmarked other values as well, 2, 4, 16, 32, but 8 seems to be the sweet
spot. div_var() would benefit from FIXED_BUF_LEN 16 though.

Here is the test results using different FIXED_BUF_LEN values.

Pardon the images, but it's difficult to textify the two dimensional results,
as we need to vary the digit length of both operands.

This first image shows the execution time for numeric_add() with
operands up to 20 decdigits:

https://gist.githubusercontent.com/joelonsql/59bd2642d577fe4aaf2fb8b1ab7f67c4/raw/d99d8fdc6f34dbed255f310dfc998a429117301a/up-to-20-digits-add.pdf.png

The color scale is execution_time, where more reddish/hotter means slower,
and more blueish/cooler means faster.

As we can see, it gets notably cooler already with FIXED_BUF_LEN 4,
up to 8 decdigits, but it also gets a bit hotter for larger numbers.
With FIXED_BUF_LEN 8, it's cooler both for small numbers,
but it's also cooler for larger numbers.

If instead looking at numeric_div(),
we can see how FIXED_BUF_LEN 16 would be an improvement:

https://gist.githubusercontent.com/joelonsql/59bd2642d577fe4aaf2fb8b1ab7f67c4/raw/d99d8fdc6f34dbed255f310dfc998a429117301a/up-to-20-digits-div.pdf.png

The plot for numeric_mul() is a bit more difficult to read,
but we can see some improvement already at FIXED_BUF_LEN 4:

https://gist.githubusercontent.com/joelonsql/59bd2642d577fe4aaf2fb8b1ab7f67c4/raw/d99d8fdc6f34dbed255f310dfc998a429117301a/up-to-20-digits-mul.pdf.png

Note the scales are different for all these three plots.

In the plot below, the scale is the same for all three operators,
which can be nice to understand their relative execution time:

https://gist.githubusercontent.com/joelonsql/59bd2642d577fe4aaf2fb8b1ab7f67c4/raw/d99d8fdc6f34dbed255f310dfc998a429117301a/up-to-20-digits-overview.pdf.png

In these plots we have only studied operands with up to 20 decdigits.

For completion, here is plots up to 200 decdigits:

https://gist.githubusercontent.com/joelonsql/59bd2642d577fe4aaf2fb8b1ab7f67c4/raw/d99d8fdc6f34dbed255f310dfc998a429117301a/up-to-200-digits-overview.pdf.png

As we can see, there is no significant observable difference, as expected,
since the fixed buffer only improves moderately big numbers.

And finally, here is a plot of up to 131072 decdigits:

https://gist.githubusercontent.com/joelonsql/59bd2642d577fe4aaf2fb8b1ab7f67c4/raw/d99d8fdc6f34dbed255f310dfc998a429117301a/full-range-overview.pdf.png

Some additional plots can be viewed at the end of this gist:
https://gist.github.com/joelonsql/59bd2642d577fe4aaf2fb8b1ab7f67c4

It would be nice to avoid the additional tmp vars, as it glutters the interface.
One idea maybe could be to use an additional struct field for the writing of the result.
Or at least add helper-functions to avoid an extra line of code for the affected places in the code.

/Joel

Attachment Content-Type Size
0005-fixed-buf.patch application/octet-stream 28.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-02-20 22:30:56 Re: Silent overflow of interval type
Previous Message Peter Smith 2023-02-20 22:01:10 Re: Time delayed LR (WAS Re: logical replication restrictions)