Re: Rethinking MemoryContext creation

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Rethinking MemoryContext creation
Date: 2017-12-13 02:05:00
Message-ID: 324.1513130700@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
>> On 2017-12-12 14:50:37 -0500, Robert Haas wrote:
>>> Another idea I have is that perhaps we could arrange to reuse contexts
>>> instead of destroying them and recreating them.

>> I'm a bit doubtful that's going to help, maintaining that list isn't
>> going to be free, and the lifetime and number of those contexts aren't
>> always going to match up.

> Actually, this seems like a really promising idea to me. To the extent
> that an empty context has standard parameters, they're all
> interchangeable, so you could imagine that AllocSetDelete shoves it
> onto a freelist after resetting it, instead of just giving it back to
> libc. I'm slightly worried about creating allocation islands that
> way, but that problem is probably surmountable, if it's real at all.

> However, that seems like a different patch from what I'm working on
> here.

Oh, wait a minute. If we think of going that direction, then actually
what I did in v3 is counterproductive: we're better off making contexts
as much alike as possible, so that they can be recycled more easily.
So the attached goes back to v2's design where all aset.c contexts
are allocated together with their keeper block, and instead provides
context freelists more or less per Robert's suggestion. This proves
to be a bit faster than v3 on all cases I tried, and is likely to be
considerably better on cases that don't match v3's expectations about
which contexts will do what.

The details of the freelist management are ripe for bikeshedding, no doubt.
I set it up so that there are separate freelists for "default" and "small"
contexts, and we allow up to 100 contexts per freelist (so max space
wastage of less than 1MB). If a freelist gets too full, we simply release
everything in it back to libc, then add the context we were about to add.
The thought here is that a query that made 101 transient contexts probably
did not stop at 102; more likely, a bunch of additional deletions are
coming, so there's no value in sweating over exactly which contexts to
keep. Flushing all the previously-deleted contexts should help to reduce
the process memory map in the typical case where a lot of contexts get
released in roughly reverse order of allocation. There might be a better
way to do that --- I toyed with trying to release contexts with higher
memory addresses, for example. But I'm not sure it's worth spending a
lot of time on, unless someone can show a pathological case for this way.

I am pretty happy with this now and think it could be committed.
If anyone wants to research the freelist management issue some more,
they're welcome to, but I don't think that question needs to hold up
committing the basic improvement. (The hardest part would likely
be finding suitable test cases, anyway. Most queries don't get
anywhere near using 100 transient contexts.)

BTW, while I didn't do it here, I am very strongly tempted to replace
the test-and-elog validation of context size parameters with Asserts.
When we originally designed this code we thought people might be
calculating those parameters on the fly, but in nigh twenty years nobody
has ever done so, so the runtime cost doesn't really seem worth it.
I got some results while I was rearranging code that suggest that
skipping those tests could be worthwhile in practice, now that there's
a common short path through AllocSetContextCreate.

regards, tom lane

Attachment Content-Type Size
rethink-memory-context-creation-4.patch text/x-diff 71.6 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-12-13 02:10:48 Re: [HACKERS] pg_upgrade failed with error - ERROR: column "a" in child table must be marked NOT NULL
Previous Message David G. Johnston 2017-12-13 01:57:17 Re: proposal: alternative psql commands quit and exit