Re: catalog access with reset GUCs during parallel worker startup

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: catalog access with reset GUCs during parallel worker startup
Date: 2022-02-23 02:51:19
Message-ID: 20220223025119.7aj3tpzwsas5ayvm@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I think we need to do something about this, because afaics this has data
corruption risks, albeit with a narrow window.

E.g. consider what happens if the value of wal_sync_method is reset to the
boot value and one of the catalog accesses in GUC check hooks causes WAL
writes (e.g. due to pruning, buffer replacement). One backend doing
fdatasync() and the rest O_DSYNC or vice versa won't work lead to reliable
durability.

On 2022-02-09 21:47:09 -0500, Robert Haas wrote:
> I remember trying that, and if I remember correctly, it broke with
> core GUCs, without any need for extensions in the picture. I don't
> remember the details too well, unfortunately, but I think it had to do
> with the behavior of some of the check and/or assign hooks.
>
> It's probably time to try it again, because (a) maybe things are
> different now, or (b) maybe I did it wrong, and in any event (c) I
> can't really remember why it didn't work and we probably want to know
> that.

The tests fail:
+ERROR: invalid value for parameter "session_authorization": "andres"
+CONTEXT: while setting parameter "session_authorization" to "andres"
+parallel worker
+while scanning relation "public.pvactst"

but that's easily worked around. In fact, there's already code to do so, it
just is lacking awareness of session_authorization:

static bool
can_skip_gucvar(struct config_generic *gconf)
...
* Role must be handled specially because its current value can be an
* invalid value (for instance, if someone dropped the role since we set
* it). So if we tried to serialize it normally, we might get a failure.
* We skip it here, and use another mechanism to ensure the worker has the
* right value.

Don't those reasons apply just as well to session_authorization as they do to
role?

If I skip session_authorization in can_skip_gucvar() and move
RestoreGUCState() out of the transaction, all tests pass.

Unsurprisingly we get bogus results for parallel accesses to
session_authorization:

postgres[3312622][1]=> SELECT current_setting('session_authorization');
┌─────────────────┐
│ current_setting │
├─────────────────┤
│ frak │
└─────────────────┘
(1 row)

postgres[3312622][1]=> SET force_parallel_mode = on;
SET
postgres[3312622][1]=> SELECT current_setting('session_authorization');
┌─────────────────┐
│ current_setting │
├─────────────────┤
│ andres │
└─────────────────┘
(1 row)

but that could be remedied just already done for role.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Imseih (AWS), Sami 2022-02-23 02:58:07 Re: [BUG] Panic due to incorrect missingContrecPtr after promotion
Previous Message Masahiko Sawada 2022-02-23 02:46:14 Re: Design of pg_stat_subscription_workers vs pgstats