In PG12, query with float calculations is slower than PG11

From: keisuke kuroda <keisuke(dot)kuroda(dot)3862(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: In PG12, query with float calculations is slower than PG11
Date: 2020-02-06 05:25:03
Message-ID: CANDwggLe1Gc1OrRqvPfGE=kM9K0FSfia0hbeFCEmwabhLz95AA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I am testing performance both PG12 and PG11.
I found the case of performance degradation in PG12.

Amit Langote help me to analyze and to create patch.
Thanks!

* environment

CentOS Linux release 7.6.1810 (Core)
postgresql 12.1
postgresql 11.6

* postgresql.conf

shared_buffers = 2048MB
max_parallel_workers_per_gather = 0
work_mem = '64MB'
jit = off

* test case

CREATE TABLE realtest(a real, b real, c real, d real, e real);
INSERT INTO realtest SELECT i,i,i,i,i FROM generate_series(0,10000000) AS i;

EXPLAIN (ANALYZE on, VERBOSE on, BUFFERS on)
select (2 * a) , (2 * b) , (2 * c), (2 * d), (2 * e)
from realtest;

* result

PG12.1 5878.389 ms
PG11.6 4533.554 ms

** PostgreSQL 12.1

pgbench=# EXPLAIN (ANALYZE on, VERBOSE on, BUFFERS on)
select (2 * a) , (2 * b) , (2 * c), (2 * d), (2 * e)
from realtest;

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on public.realtest (cost=0.00..288697.59 rows=10000115 width=40)
(actual time=0.040..5195.328 rows=10000001 loops=1)
Output: ('2'::double precision * a), ('2'::double precision * b),
('2'::double precision * c), ('2'::double precision * d), ('2'::double
precision * e)
Buffers: shared hit=63695
Planning Time: 0.051 ms
Execution Time: 5878.389 ms
(5 行)

Samples: 6K of event 'cpu-clock', Event count (approx.): 1577750000
Overhead Command Shared Object Symbol
25.48% postgres postgres [.] ExecInterpExpr
★18.65% postgres libc-2.17.so [.] __isinf
14.36% postgres postgres [.] float84mul
8.54% postgres [vdso] [.] __vdso_clock_gettime
4.02% postgres postgres [.] ExecScan
3.69% postgres postgres [.] tts_buffer_heap_getsomeattrs
2.63% postgres libc-2.17.so [.] __clock_gettime
2.55% postgres postgres [.] HeapTupleSatisfiesVisibility
2.00% postgres postgres [.] heapgettup_pagemode

** PostgreSQL 11.6

pgbench=# EXPLAIN (ANALYZE on, VERBOSE on, BUFFERS on)
select (2 * a) , (2 * b) , (2 * c), (2 * d), (2 * e)
from realtest;

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on public.realtest (cost=0.00..288697.59 rows=10000115 width=40)
(actual time=0.012..3845.480 rows=10000001 loops=1)
Output: ('2'::double precision * a), ('2'::double precision * b),
('2'::double precision * c), ('2'::double precision * d), ('2'::double
precision * e)
Buffers: shared hit=63695
Planning Time: 0.033 ms
Execution Time: 4533.554 ms
(5 行)

Samples: 4K of event 'cpu-clock', Event count (approx.): 1192000000
Overhead Command Shared Object Symbol
32.30% postgres postgres [.] ExecInterpExpr
14.95% postgres postgres [.] float84mul
10.57% postgres [vdso] [.] __vdso_clock_gettime
★ 6.84% postgres libc-2.17.so [.] __isinf
3.96% postgres postgres [.] ExecScan
3.50% postgres libc-2.17.so [.] __clock_gettime
3.31% postgres postgres [.] heap_getnext
3.08% postgres postgres [.] HeapTupleSatisfiesMVCC
2.77% postgres postgres [.] slot_deform_tuple
2.37% postgres postgres [.] ExecProcNodeInstr
2.08% postgres postgres [.] standard_ExecutorRun

* cause

Obviously, even in common cases where no overflow occurs,
you can tell that PG 12 is performing isinf() 3 times on every call of
float8_mul() once for each of val1, val2, result where as PG 11
is performing only once for result.

That's because check_float8_val() (in PG 12) is a function
whose arguments must be evaluated before
it is called (it is inline, but that's irrelevant),
whereas CHECKFLOATVAL() (in PG11) is a macro
whose arguments are only substituted into its body.

By the way, this change of float8mul() implementation is
mostly due to the following commit in PG 12 development cycle:
commit 6bf0bc842bd75877e31727eb559c6a69e237f831

Especially, the following diff:

@@ -894,13 +746,8 @@ float8mul(PG_FUNCTION_ARGS) {
float8 arg1 = PG_GETARG_FLOAT8(0);
float8 arg2 = PG_GETARG_FLOAT8(1);
- float8 result;
-
- result = arg1 * arg2;

- CHECKFLOATVAL(result, isinf(arg1) || isinf(arg2),
- arg1 == 0 || arg2 == 0);
- PG_RETURN_FLOAT8(result);
+ PG_RETURN_FLOAT8(float8_mul(arg1, arg2));
}

* patch

This patch uses MACRO which was used by PG11.
I tried attached patch, which can be applied to PG 12 source and performed
a benchmark:

PG12.1 5878.389 ms
PG11.6 4533.554 ms

PG12.1 + Patch 4679.162 ms

** PostgreSQL 12.1 + Patch

postgres=# EXPLAIN (ANALYZE on, VERBOSE on, BUFFERS on)
select (2 * a) , (2 * b) , (2 * c), (2 * d), (2 * e)
from realtest;

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on public.realtest (cost=0.00..307328.38 rows=10828150 width=40)
(actual time=0.012..4009.012 rows=10000001 loops=1)
Output: ('2'::double precision * a), ('2'::double precision * b),
('2'::double precision * c), ('2'::double precision * d), ('2'::double
precision * e)
Buffers: shared hit=63695
Planning Time: 0.038 ms
Execution Time: 4679.162 ms
(5 rows)

Samples: 5K of event 'cpu-clock', Event count (approx.): 1376750000
Overhead Command Shared Object Symbol
31.43% postgres postgres [.] ExecInterpExpr
14.24% postgres postgres [.] float84mul
10.40% postgres [vdso] [.] __vdso_clock_gettime
★ 5.41% postgres libc-2.17.so [.] __isinf
4.63% postgres postgres [.] tts_buffer_heap_getsomeattrs
4.03% postgres postgres [.] ExecScan
3.54% postgres libc-2.17.so [.] __clock_gettime
3.12% postgres postgres [.] HeapTupleSatisfiesVisibility
2.36% postgres postgres [.] heap_getnextslot
2.16% postgres postgres [.] heapgettup_pagemode
2.09% postgres postgres [.] standard_ExecutorRun
2.07% postgres postgres [.] SeqNext
2.03% postgres postgres [.] ExecProcNodeInstr
2.03% postgres postgres [.] tts_virtual_clear

PG 12 is still slower compared to PG 11, but the __isinf() situation is
better with the patch.

Best Regards,
Keisuke Kuroda

Attachment Content-Type Size
check-float-val.patch application/octet-stream 10.5 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2020-02-06 05:32:40 Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
Previous Message Andres Freund 2020-02-06 05:19:09 Re: Make ringbuffer threshold and ringbuffer sizes configurable?