Re: Allow non-superuser to cancel superuser tasks.

From: Kirill Reshke <reshkekirill(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, "Leung, Anthony" <antholeu(at)amazon(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allow non-superuser to cancel superuser tasks.
Date: 2024-04-11 11:55:59
Message-ID: CALdSSPiOLADNe1ZS-1dnQXWVXgGAC6=3TBKABu9C3_97YGOoMg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Posting updated version of this patch with comments above addressed.

1) pg_signal_autovacuum -> pg_signal_autovacuum_worker, as there seems
to be no objections to that.

2)
There are comments on how to write if statement:

> In core, do_autovacuum() is only called in a process without
> a role specified

> It's feeling more natural here to check that we have a superuser-owned
> backend first, and then do a lookup of the process type.

> I figured since there's no reason to rely on that behavior, we might as
> well do a bit of future-proofing in case autovacuum workers are ever not
> run as InvalidOid.

I have combined them into this:

if ((!OidIsValid(proc->roleId) || superuser_arg(proc->roleId))
&& pgstat_get_backend_type(GetNumberFromPGProc(proc)) == B_AUTOVAC_WORKER)

This is both future-proofing and natural, I suppose. Downside of this
is double checking condition (!OidIsValid(proc->roleId) ||
superuser_arg(proc->roleId)), but i think that is ok for the sake of
simplicity.

3) pg_signal_autovacuum_worker Oid changed to random one: 8916

4)

> An invalid BackendType is not false, but B_INVALID.
fixed, thanks

5)

>>>> There is pg_read_all_stats as well, so I don't see a big issue in
>>>> requiring to be a member of this role as well for the sake of what's
>>>> proposing here.
>>>
>>> Well, that tells you quite a bit more than just which PIDs correspond to
>>> autovacuum workers, but maybe that's good enough for now.
>>
>> That may be a good initial compromise, for now.

>Sounds good to me. I will update the documentation.

@Anthony if you feel that documentation update adds much value here,
please do. Given that we know autovacuum worker PIDs from
pg_stat_progress_vacuum, I don't know how to reflect something about
pg_stat_autovac_worker in doc, and if it is worth it.

6)
> + INJECTION_POINT("autovacuum-start");
> Perhaps autovacuum-worker-start is more suited here

fixed, thanks

7)

> +# Copyright (c) 2022-2024, PostgreSQL Global Development Group
> [...]
> +# Copyright (c) 2024-2024, PostgreSQL Global Development Group

> These need to be cleaned up.

> +# Makefile for src/test/recovery
> +#
> +# src/test/recovery/Makefile

> This is incorrect, twice.

Cleaned up, thanks!

8)

> Not sure that there is a huge point in checking after a role that
> holds pg_signal_backend.
Ok. Removed.

Then:

> +like($psql_err, qr/ERROR: permission denied to terminate ...
> Checking only the ERRROR, and not the DETAIL should be sufficient
> here.

After removing the pg_signal_backend test case we have only one place
where errors check is done. So, I think we should keep DETAIL here to
ensure detail is correct (it differs from regular backend case).

9)
> +# Test signaling for pg_signal_autovacuum role.
> This needs a better documentation:

Updated. Hope now the test documentation helps to understand it.

10)

> +ok($terminate_with_pg_signal_av == 0, "Terminating autovacuum worker should succeed with pg_signal_autovacuum role");
> Is that enough for the validation?

Added:
ok($node->log_contains(qr/FATAL: terminating autovacuum process due to
administrator command/, $offset),
"Autovacuum terminates when role is granted with pg_signal_autovacuum_worker");

11) references to `passcheck` extension removed. errors messages rephrased.

12) injection_point_detach added.

13)
> + INSERT INTO tab_int SELECT * FROM generate_series(1, 1000000);
> A good chunk of the test would be spent on that, but you don't need
> that many tuples to trigger an autovacuum worker as the short naptime
> is able to do it. I would recommend to reduce that to a minimum.

+1
Single tuple works.

14)

v3 suffers from segfault:
2024-04-11 11:28:31.116 UTC [147437] 001_signal_autovacuum.pl LOG:
statement: SELECT pg_terminate_backend(147427);
2024-04-11 11:28:31.116 UTC [147427] FATAL: terminating autovacuum
process due to administrator command
2024-04-11 11:28:31.116 UTC [147410] LOG: server process (PID 147427)
was terminated by signal 11: Segmentation fault
2024-04-11 11:28:31.116 UTC [147410] LOG: terminating any other
active server processes
2024-04-11 11:28:31.117 UTC [147410] LOG: shutting down because
restart_after_crash is off
2024-04-11 11:28:31.121 UTC [147410] LOG: database system is shut down

The test doesn't fail because pg_terminate_backend actually meets his
point: autovac is killed. But while dying, autovac also receives
segfault. Thats because of injections points:

(gdb) bt
#0 0x000056361c3379ea in tas (lock=0x7fbcb9632224 <error: Cannot
access memory at address 0x7fbcb9632224>) at
../../../../src/include/storage/s_lock.h:228
#1 ConditionVariableCancelSleep () at condition_variable.c:238
#2 0x000056361c337e4b in ConditionVariableBroadcast
(cv=0x7fbcb66f498c) at condition_variable.c:310
#3 0x000056361c330a40 in CleanupProcSignalState (status=<optimized
out>, arg=<optimized out>) at procsignal.c:240
#4 0x000056361c328801 in shmem_exit (code=code(at)entry=1) at ipc.c:276
#5 0x000056361c3288fc in proc_exit_prepare (code=code(at)entry=1) at ipc.c:198
#6 0x000056361c3289bf in proc_exit (code=code(at)entry=1) at ipc.c:111
#7 0x000056361c49ffa8 in errfinish (filename=<optimized out>,
lineno=<optimized out>, funcname=0x56361c654370 <__func__.16>
"ProcessInterrupts") at elog.c:592
#8 0x000056361bf7191e in ProcessInterrupts () at postgres.c:3264
#9 0x000056361c3378d7 in ConditionVariableTimedSleep
(cv=0x7fbcb9632224, timeout=timeout(at)entry=-1,
wait_event_info=117440513) at condition_variable.c:196
#10 0x000056361c337d0b in ConditionVariableTimedSleep
(wait_event_info=<optimized out>, timeout=-1, cv=<optimized out>) at
condition_variable.c:135
#11 ConditionVariableSleep (cv=<optimized out>,
wait_event_info=<optimized out>) at condition_variable.c:98
#12 0x00000000b96347d0 in ?? ()
#13 0x3a3f1d9baa4f5500 in ?? ()
#14 0x000056361cc6cbd0 in ?? ()
#15 0x000056361ccac300 in ?? ()
#16 0x000056361c62be63 in ?? ()
#17 0x00007fbcb96347d0 in ?? () at injection_points.c:201 from
/home/reshke/postgres/tmp_install/home/reshke/postgres/pgbin/lib/injection_points.so
#18 0x00007fffe4122b10 in ?? ()
#19 0x00007fffe4122b70 in ?? ()
#20 0x0000000000000000 in ?? ()

discovered because of
# Release injection point.
$node->safe_psql('postgres',
"SELECT injection_point_detach('autovacuum-worker-start');");
added

v4 also suffers from that. i will try to fix that.

Attachment Content-Type Size
v4-0001-Introduce-pg_signal_autovacuum-role-to-allow-non-.patch application/octet-stream 6.7 KB
v4-0002-Add-tap-test-for-pg_signal_autovacuum-role.patch application/octet-stream 7.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Lakhin 2024-04-11 12:00:00 Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Previous Message Zhijie Hou (Fujitsu) 2024-04-11 11:33:55 RE: Synchronizing slots from primary to standby