patch: reduce overhead of execution of CALL statement in no atomic mode from PL/pgSQL

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: patch: reduce overhead of execution of CALL statement in no atomic mode from PL/pgSQL
Date: 2020-11-02 12:08:19
Message-ID: CAFj8pRCLPdDAETvR7Po7gC5y_ibkn_-bOzbeJb39WHms01194Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

here is another patch related to using CALL statement inside PL/pgSQL code.

A repeated using of CALL statement is expensive. How much?

I wrote synthetic test:

CREATE TABLE foo(a int, b int, c int);

CREATE OR REPLACE PROCEDURE public.simple_proc3(a integer, b integer, c
integer, cnt int, OUT r boolean)
AS $$
BEGIN
INSERT INTO foo VALUES(a, b, c);
IF cnt % 10000 = 0 THEN
COMMIT;
r := true;
ELSE
r := false;
END IF;
END;
$$ LANGUAGE plpgsql;

DO $$
DECLARE a int; b int; c int; r boolean;
BEGIN
TRUNCATE foo;
FOR i IN 1..10000000
LOOP
a := (random() * 100)::int;
b := (random() * 100)::int;
c := (random() * 100)::int;
CALL simple_proc3(a, b, c, i, r);
IF r THEN
RAISE NOTICE 'committed at % row', i;
END IF;
END LOOP;
END;
$$;

I try to insert 10M rows with commit after inserting 10K rows. Execution
time on master is ~ 6 minutes 368251,691 ms (06:08,252)

DO $$
DECLARE a int; b int; c int; r boolean;
BEGIN
TRUNCATE foo;
FOR i IN 1..10000000
LOOP
a := (random() * 100)::int;
b := (random() * 100)::int;
c := (random() * 100)::int;
INSERT INTO foo VALUES(a, b, c);
IF i % 10000 = 0 THEN
COMMIT;
r := true;
ELSE
r := false;
END IF;
IF r THEN
RAISE NOTICE 'committed at % row', i;
END IF;
END LOOP;
END;
$$;

When I try to remove CALL statement then same work needs less to 2 minutes
99109,511 ms (01:39,110). So this code is three times slower with calling
one procedure. There are two significant parts of overhead:

a) internal implementation of CALL statement that doesn't use plan cache
well, and it does lot of expensive operations over pg_proc catalogue,

b) wrapper in PL/pgSQL that repeatedly repearse expression string.

Overhead of PL/pgSQL can be reduced by using plan cache after fixing issue
with resource owner. I did it, and I introduced "local_resowner" for
holding references of plans for CALL statement expressions.

After patching the execution time is reduced to 4 minutes Time: 245852,846
ms (04:05,853). Still the overhead is significant, but it is 30% speedup.

The best case for this patch is about 2x times better performance

CREATE OR REPLACE PROCEDURE public.simple_proc2(a integer, b integer, c
integer, cnt int, OUT r boolean)
AS $$
BEGIN
END;
$$ LANGUAGE plpgsql;

DO $$
DECLARE a int; r boolean;
BEGIN
FOR i IN 1..10000000
LOOP
CALL simple_proc2((random()*100)::int, (random()*100)::int,
(random()*100)::int, i, r);
END LOOP;
END;
$$;

Time: 184667,970 ms (03:04,668), master: Time: 417463,457 ms (06:57,463)

On second hand, the worst case is about 10% (probably this overhead can be
reduced by creating "local_resowner" only when it is necessary)

CREATE OR REPLACE FUNCTION simple_fx2(a int)
RETURNS int AS $$
BEGIN
RETURN a + a;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;

DO $$
DECLARE a int;
BEGIN
FOR i IN 1..10000000
LOOP
a := simple_fx2(i);
END LOOP;
END;
$$;

Time: 5434,808 ms (00:05,435) , master: Time: 4632,762 ms (00:04,633)

Comments, notes, ideas?

Regards

Pavel

Attachment Content-Type Size
plpgsql-using-local-resowner-for-call-plans.patch text/x-patch 31.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Erik Rijkers 2020-11-02 12:24:12 Re: Split copy.c
Previous Message Heikki Linnakangas 2020-11-02 11:52:48 Re: hash_array_extended() needs to pass down collation