Reducing memory consumption for pending inval messages

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Reducing memory consumption for pending inval messages
Date: 2021-05-30 17:22:56
Message-ID: 2380555.1622395376@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I got interested in $SUBJECT as a result of the thread at [1].
It turns out that the existing implementation in inval.c is quite
inefficient when a lot of individual commands each register just
a few invalidations --- but a few invalidations per command is
pretty typical. As an example, consider

DO $do$
BEGIN
FOR i IN 1..200000000 LOOP
execute 'create function foo' || i || '() returns int language sql as $$select 1$$';
if (i % 100000 = 0) then
raise notice '% loops done', i;
end if;
END LOOP;
END
$do$;

Each CREATE FUNCTION registers three invalidation events, which
minimally would require 48 bytes ... but the current code actually
eats about 2kB per iteration, because we allocate a pair of new
"chunks" for each command. The chunks themselves are intended
to hold 32 entries which'd take 512 bytes --- but there's some
overhead, causing aset.c to round up to 1024 bytes. Ouch.

It gets worse though. If you wrap the commands in subtransactions:

DO $do$
BEGIN
FOR i IN 1..200000000 LOOP
begin
execute 'create function foo' || i || '() returns int language sql as $$select 1$$';
if (i % 100000 = 0) then
raise notice '% loops done', i;
end if;
exception when division_by_zero then null;
end;
END LOOP;
END
$do$;

the space consumption balloons to about 8kB per iteration, because the
chunks are allocated in the per-subtransaction CurTransactionContext,
which is given 8kB right off the bat. In common cases this'll be the
*only* allocation in that context.

We can do a lot better, by exploiting what we know about the usage
patterns of invalidation requests. New requests are always added to
the latest sublist, and the only management actions are (1) merge
latest sublist into next-to-latest sublist, or (2) drop latest
sublist, if a subtransaction aborts. This means we could perfectly
well keep all the requests in a single, densely packed array in
TopTransactionContext, and replace the "list" control structures
with indexes into that array. The attached patch does that.

I don't see any particular speed differential with this (unsurprising,
since the other actions that an inval event logs and then triggers
will surely swamp inval.c's management overhead). But the space
consumption decreases gratifyingly.

There is one notable new assumption I had to make for this. At end
of a subtransaction, we have to merge its inval events into the
"PriorCmd" list of the parent subtransaction. (It has to be the
PriorCmd list, not the CurrentCmd list, because these events have
already been processed locally; we don't want to do that again.)
This means the parent's CurrentCmd list has to be empty at that
instant, else we'd be trying to merge sublists that aren't adjacent
in the array. As far as I can tell, this is always true: the patch's
check for it doesn't trigger in a check-world run. And there's an
argument that it must be true for semantic consistency (see comments
in patch). So if that check ever fails, it probably means there is a
missing CommandCounterIncrement somewhere. Still, this could use more
review and testing.

BTW, I noted with some amusement that this comment in
xactGetCommittedInvalidationMessages:

* ... Maintain the order that they
* would be processed in by AtEOXact_Inval(), to ensure emulated behaviour
* in redo is as similar as possible to original. We want the same bugs,
* if any, not new ones.

is making a claim that the existing code there actually does not
satisfy. In particular it fails to maintain the correct ordering of
catcache vs. relcache events. The patch fixes that, but I wonder
whether there is anything we need to do in the back branches. I'm
inclined to think that it doesn't matter beyond the small efficiency
risk inherent in doing (some) relcache flushes before catcache
flushes. The code already says that the order of events within any
one list isn't supposed to matter.

Anyway, I'll add this to the next CF.

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/88986113-6b01-452b-89d0-9492b6a79e33%40www.fastmail.com

Attachment Content-Type Size
revamp-inval-list-management-1.patch text/x-diff 28.3 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2021-05-30 17:24:18 Re: list of extended statistics on psql (\dX)
Previous Message Omar Kilani 2021-05-30 16:23:32 Re: Clear empty space in a page.