Re: Reduce palloc's in numeric operations.

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reduce palloc's in numeric operations.
Date: 2012-09-19 12:20:10
Message-ID: 5059B87A.2070305@vmware.com
Lists: pgsql-hackers

On 14.09.2012 11:25, Kyotaro HORIGUCHI wrote:
> Hello, I propose reducing palloc's in numeric operations.
>
> The numeric operations are slow by nature, but usually that is not
> a problem for on-disk operations. The slowdown is more noticeable,
> however, for in-memory operations.
>
> I inspected them and found some very short-term pallocs. These
> pallocs are used as temporary storage for the digits of unpacked
> numerics.
>
> The formats of numeric digits in the packed and unpacked forms are the
> same. So we can eliminate some of the pallocs by using the digits of
> the packed numeric in place to build the unpacked one.
>
> In this patch, I added a new function, set_var_from_num_nocopy(), to
> do this, and made use of it for operands which won't be modified.

You have to be careful to really not modify the operands. numeric_out() and
numeric_out_sci() are wrong; they call get_str_from_var(), which
modifies the argument. Same with numeric_exp(): it passes the argument
to numericvar_to_double_no_overflow(), which passes it to
get_str_from_var(). numericvar_to_int8() also modifies its argument, so
all the functions that use that, directly or indirectly, must make a copy.

Perhaps get_str_from_var(), and the other functions that currently
scribble on their arguments, should be modified to not do so. They could
easily make a copy of the argument within the function. Then the callers
could safely use set_var_from_num_nocopy(). The performance would be the
same, since you would have the same number of pallocs, but you would get
rid of the surprising argument-modifying behavior of those functions.
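
To illustrate the idea, here is a minimal sketch in plain C. It is not the
real numeric.c code: DemoVar and the demo_* functions are hypothetical
stand-ins for NumericVar and its helpers, malloc/free stand in for
palloc/pfree, and the "scribbling" is reduced to zeroing trailing digits.
The caller borrows the packed digits without copying, and the callee that
needs scratch space makes its own private copy:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an unpacked numeric variable. */
typedef struct DemoVar
{
    int      ndigits;   /* number of digits in the array */
    int16_t *digits;    /* digit array; may alias caller-owned storage */
    int16_t *buf;       /* owned allocation, or NULL if digits is borrowed */
} DemoVar;

/* Borrow the packed digits in place: no allocation, so treat as read-only. */
static void
demo_set_var_nocopy(const int16_t *packed, int ndigits, DemoVar *dest)
{
    dest->ndigits = ndigits;
    dest->digits = (int16_t *) packed;  /* aliases the packed datum */
    dest->buf = NULL;                   /* nothing to free */
}

/* Make a private, writable copy of src. Returns 0 on success, -1 on OOM. */
static int
demo_copy_var(const DemoVar *src, DemoVar *dest)
{
    size_t nbytes = (size_t) src->ndigits * sizeof(int16_t);

    dest->buf = malloc(nbytes > 0 ? nbytes : 1);
    if (dest->buf == NULL)
        return -1;
    memcpy(dest->buf, src->digits, nbytes);
    dest->digits = dest->buf;
    dest->ndigits = src->ndigits;
    return 0;
}

/*
 * A callee that needs to modify digits while it works. It copies its
 * argument first, so the caller's (possibly borrowed) digits are untouched.
 * Returns the sum of the first 'keep' digits, or -1 on allocation failure.
 */
static long
demo_sum_leading_digits(const DemoVar *var, int keep)
{
    DemoVar tmp;
    long    sum = 0;
    int     i;

    if (keep < 0)
        keep = 0;
    if (demo_copy_var(var, &tmp) != 0)
        return -1;

    /* Scribble on the private copy only: zero everything past 'keep'. */
    for (i = keep; i < tmp.ndigits; i++)
        tmp.digits[i] = 0;
    for (i = 0; i < tmp.ndigits; i++)
        sum += tmp.digits[i];

    free(tmp.buf);
    return sum;
}
```

With this shape, the one allocation simply moves from the caller into the
callee, and taking the argument as const documents that the function no
longer modifies it.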

> The performance gain seems quite moderate....
>
> 'SELECT SUM(numeric_column) FROM on_memory_table' for ten million
> rows of about 8-digit numerics runs in 3480 ms, against the original
> 3930 ms. That is an 11% gain. 'SELECT SUM(int_column) FROM
> on_memory_table' needed 1570 ms.
>
> Similarly, there is an 8% gain for numerics of about 30 - 50 digits.
> Performance of avg(numeric), in contrast, showed no gain.
>
> Do you think this is worth doing?

Yes, I think this is worthwhile. I'm seeing an even bigger gain, with
smaller numerics. I created a table with this:

CREATE TABLE numtest AS SELECT a::numeric AS col FROM generate_series(1,
10000000) a;

And repeated this query with \timing:

SELECT SUM(col) FROM numtest;

The execution time of that query fell from about 5300 ms to 4300 ms, i.e.
about 20%.

- Heikki
