Re: calling procedures is slow and consumes extra much memory against calling function

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: calling procedures is slow and consumes extra much memory against calling function
Date: 2020-07-16 19:08:09
Message-ID: CAFj8pRAS4fZS=-+uq+_0CXQBdKm1qDhGMF896F13brjgL46fmQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

I am sending another patch that tries to allow CachedPlans for CALL
statements. I think this patch is very accurate, but it is not nice,
because it is smudging very precious reference counting for CachedPlans.

Current issue:
==========

I found a problem with repeated CALL statements from DO command. For every
execution of a CALL statement a plan is created that is released at the
time of the end of DO block.

create or replace procedure insert_into_foo(i int)
as $$
begin
insert into foo values(i, i || 'Ahoj');
if i % 1000 = 0 then raise notice '%', i;
--commit;
end if;
end;
$$ language plpgsql;

and DO

do $$
begin
for i in 1..500000
loop
call insert_into_foo(i);
end loop;
end
$$;

Requires about 2.5GB RAM (execution time is 18 sec). The problem is "long
transaction" with 500M iteration of CALL statement.

If I try to remove a comment before COMMIT - then I get 500 transactions.
But still it needs 2.5GB memory.

The reason for this behaviour is disabling plan cache for CALL statements
executed in atomic mode.

So I wrote patch 1, that releases the not saved plan immediately. This
patch is very simple, and fixes memory issues. It is a little bit faster
(14 sec), and Postgres consumes about 200KB.

Patch 1 is simple, clean, nice but execution of CALL statements is slow due
repeated planning.

I tried to fix this issue another way - by little bit different work with
plan cache reference counting. Current design expects only statements
wrapped inside transactions. It is not designed for new possibilities in
CALL statements, when more transactions can be finished inside one
statement. Principally - cached plans should not be reused in different
transactions (simple expressions are an exception). So if we try to use
cached plans for CALL statements, there is no clean responsibility who has
to close a cached plan. It can be SPI (when execution stays in the same
transaction), or resource owner (when transaction is finished inside
execution of SPI).

The Peter wrote a comment about it

<--><--><-->/*
<--><--><--> * Don't save the plan if not in atomic context. Otherwise,
<--><--><--> * transaction ends would cause errors about plancache leaks.
<--><--><--> *

This comment is not fully accurate. If we try to save the plan, then
execution (with finished transaction inside) ends by segfault. Cached plan
is released on transaction end (by resource owner) and related memory
context is released. But next time this structure is accessed. There is
only a warning about unclosed plan cache (it maybe depends on other things).

I wrote a patch 2 that marks CALL statement related plans as "fragile". In
this case the plan is cached every time. There is a special mark "fragile",
that blocks immediately releasing related memory context, and it blocks
warnings and errors because for this case we expect closing plan cache by
resource owner or by SPI statement.

It reduces well CPU and memory overhead - execution time (in one big
transaction is only 8sec) - memory overhead is +/- 0

Patch2 is not too clear, and too readable although I think so it is more
correct. It better fixes SPI behaviour against new state - possibility to
commit, rollback inside procedures (inside SPI call).

All regress tests passed.

Regards

Pavel

Attachment Content-Type Size
plpgsql-stmt_call-fix-2.patch text/x-patch 7.7 KB
plpgsql-stmt_call-fix-1.patch text/x-patch 2.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-07-16 19:29:45 NaN divided by zero should yield NaN
Previous Message Tom Lane 2020-07-16 18:58:13 Wrong results from in_range() tests with infinite offset