Re: Modernizing our GUC infrastructure

From: Junwang Zhao <zhjwpku(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Modernizing our GUC infrastructure
Date: 2022-09-06 02:45:47
Message-ID: CAEG8a3+Qf+iaqdtZoLqaLv7i1HYtR0ofn_6vhWDg4ciVhrJvLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Tom,

@@ -5836,74 +5865,106 @@ build_guc_variables(void)
}

/*
- * Create table with 20% slack
+ * Create hash table with 20% slack
*/
size_vars = num_vars + num_vars / 4;

Should we change 20% to 25%, I thought that might be
a typo.

On Tue, Sep 6, 2022 at 6:28 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Attached is a patch series that attempts to modernize our GUC
> infrastructure, in particular removing the performance bottlenecks
> it has when there are lots of GUC variables. I wrote this because
> I am starting to question the schema-variables patch [1] --- that's
> getting to be quite a large patch and I grow less and less sure
> that it's solving a problem our users want solved. I think what
> people actually want is better support of the existing mechanism
> for ad-hoc session variables, namely abusing custom GUCs for that
> purpose. One of the big reasons we have been resistant to formally
> supporting that is fear of the poor performance guc.c would have
> with lots of variables. But we already have quite a lot of them:
>
> regression=# select count(*) from pg_settings;
> count
> -------
> 353
> (1 row)
>
> and more are getting added all the time. I think this patch series
> could likely be justified just in terms of positive effect on core
> performance, never mind user-added GUCs.
>
> 0001 and 0002 below are concerned with converting guc.c to store its
> data in a dedicated memory context (GUCMemoryContext) instead of using
> raw malloc(). This is not directly a performance improvement, and
> I'm prepared to drop the idea if there's a lot of pushback, but I
> think it would be a good thing to do. The only hard reason for using
> malloc() there was the lack of ability to avoid throwing elog(ERROR)
> on out-of-memory in palloc(). But mcxt.c grew that ability years ago.
> Switching to a dedicated context would greatly improve visibility and
> accountability of GUC-related allocations. Also, the 0003 patch will
> switch guc.c to relying on a palloc-based hashtable, and it seems a
> bit silly to have part of the data structure in palloc and part in
> malloc. However 0002 is a bit invasive, in that it forces code
> changes in GUC check callbacks, if they want to reallocate the new
> value or create an "extra" data structure. My feeling is that not
> enough external modules use those facilities for this to pose a big
> problem. However, the ones that are subject to it will have a
> non-fun time tracking down why their code is crashing. (The recent
> context-header changes mean that you get a very obscure failure when
> trying to pfree() a malloc'd chunk -- for me, that typically ends
> in an assertion failure in generation.c. Can we make that less
> confusing?)
>
> 0003 replaces guc.c's bsearch-a-sorted-array lookup infrastructure
> with a dynahash hash table. (I also looked at simplehash, but it
> has no support for not elog'ing on OOM, and it increases the size
> of guc.o by 10KB or so.) This fixes the worse-than-O(N^2) time
> needed to create N new GUCs, as in
>
> do $$
> begin
> for i in 1..10000 loop
> perform set_config('foo.bar' || i::text, i::text, false);
> end loop;
> end $$;
>
> On my machine, this takes about 4700 ms in HEAD, versus 23 ms
> with this patch. However, the places that were linearly scanning
> the array now need to use hash_seq_search, so some other things
> like transaction shutdown (AtEOXact_GUC) get slower.
>
> To address that, 0004 adds some auxiliary lists that link together
> just the variables that are interesting for specific purposes.
> This is helpful even without considering the possibility of a
> lot of user-added GUCs: in a typical session, for example, not
> many of those 353 GUCs have non-default values, and even fewer
> get modified in any one transaction (typically, anyway).
>
> As an example of the speedup from 0004, these DO loops:
>
> create or replace function func_with_set(int) returns int
> strict immutable language plpgsql as
> $$ begin return $1; end $$
> set enable_seqscan = false;
>
> do $$
> begin
> for i in 1..100000 loop
> perform func_with_set(i);
> end loop;
> end $$;
>
> do $$
> begin
> for i in 1..100000 loop
> begin
> perform func_with_set(i);
> exception when others then raise;
> end;
> end loop;
> end $$;
>
> take about 260 and 320 ms respectively for me, in HEAD with
> just the stock set of variables. But after creating 10000
> GUCs with the previous DO loop, they're up to about 3200 ms.
> 0004 brings that back down to being indistinguishable from the
> speed with few GUCs.
>
> So I think this is good cleanup in its own right, plus it
> removes one major objection to considering user-defined GUCs
> as a supported feature.
>
> regards, tom lane
>
> [1] https://www.postgresql.org/message-id/flat/CAFj8pRD053CY_N4%3D6SvPe7ke6xPbh%3DK50LUAOwjC3jm8Me9Obg%40mail.gmail.com
>

--
Regards
Junwang Zhao

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-09-06 02:48:37 Re: Modernizing our GUC infrastructure
Previous Message Tom Lane 2022-09-06 02:43:14 Re: Reducing the chunk header sizes on all memory context types