Re: Printing backtrace of postgres processes

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Printing backtrace of postgres processes
Date: 2021-02-03 06:41:15
Message-ID: CALDaNm3aBt1hw+Q=achUxfQgyu3b6DmdxewvMVx5DEikNayAFA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks Bharath for your comments.

On Mon, Feb 1, 2021 at 6:14 AM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Fri, Jan 29, 2021 at 7:10 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > > 4) How about following
> > > + errmsg("must be a superuser to print backtrace
> > > of backend process")));
> > > instead of
> > > + errmsg("must be a superuser to print backtrace
> > > of superuser query process")));
> > >
> >
> > Here the message should include superuser, we cannot remove it. Non
> > super user can log non super user provided if user has permissions for
> > it.
> >
> > > 5) How about following
> > > errmsg("must be a member of the role whose backed
> > > process's backtrace is being printed or member of
> > > pg_signal_backend")));
> > > instead of
> > > + errmsg("must be a member of the role whose
> > > backtrace is being logged or member of pg_signal_backend")));
> > >
> >
> > Modified it.
>
> Maybe I'm confused here to understand the difference between
> SIGNAL_BACKEND_NOSUPERUSER and SIGNAL_BACKEND_NOPERMISSION macros and
> corresponding error messages. Some clarification/use case to know in
> which scenarios we hit those error messages might help me. Did we try
> to add test cases that show up these error messages for
> pg_print_backtrace? If not, can we add?

I have tested this manually:

I have tested it manually, Here is the test I did:
Create 2 users:
create user test password 'test(at)123';
create user test1 password 'test(at)123';

Test1: Test print backtrace of a different user's session:
./psql -d postgres -U test
psql (14devel)
Type "help" for help.
postgres=> select pg_backend_pid();
pg_backend_pid
----------------
30142
(1 row)
------------------------------------------
./psql -d postgres -U test1
psql (14devel)
Type "help" for help.
postgres=> select pg_print_backtrace(30142);
ERROR: must be a member of the role whose backtrace is being logged
or member of pg_signal_backend

The above will be successful after:
grant pg_signal_backend to test1;

Test1: Test for non super user trying to print backtrace of a super
user's session:
./psql -d postgres
psql (14devel)
Type "help" for help.
postgres=# select pg_backend_pid();
pg_backend_pid
----------------
30211
(1 row)

./psql -d postgres -U test1
psql (14devel)
Type "help" for help.
postgres=> select pg_print_backtrace(30211);
ERROR: must be a superuser to print backtrace of superuser process
I have not added any tests for this as we required 2 active sessions
and I did not see any existing framework for this.
This test should help in relating the behavior.

> > Attached v5 patch has the fixes for the same.
> > Thoughts?
>
> Thanks. Here are some comments on v5 patch:
>
> 1) typo - it's "process" not "porcess" + a backend porcess. For example:
>

Modified.

> 2) select * from pg_print_backtrace(NULL);
> postgres=# select proname, proisstrict from pg_proc where proname =
> 'pg_print_backtrace';
> proname | proisstrict
> --------------------+-------------
> pg_print_backtrace | t
>
> See the documentation:
> "proisstrict bool
>
> Function returns null if any call argument is null. In that case the
> function won't actually be called at all. Functions that are not
> “strict” must be prepared to handle null inputs."
> So below PG_ARGISNUL check is not needed, you can remove that, because
> pg_print_backtrace will not get called with null input.
> int bt_pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
>

Modified.

> 3) Can we just set results = true instead of PG_RETURN_BOOL(true); so
> that it will be returned from PG_RETURN_BOOL(result); just for
> consistency?
> if (!SendProcSignal(bt_pid, PROCSIG_BACKTRACE_PRINT,
> InvalidBackendId))
> PG_RETURN_BOOL(true);
> else
> ereport(WARNING,
> (errmsg("failed to send signal to postmaster: %m")));
> }
>
> PG_RETURN_BOOL(result);
>

I felt existing is better as it will reduce one instruction of setting
first and then returning. There are only 2 places from where we
return. I felt we could directly return true or false.

> 4) Below is what happens if I request for a backtrace of the
> postmaster process? 1388210 is pid of postmaster.
> postgres=# select * from pg_print_backtrace(1388210);
> WARNING: PID 1388210 is not a PostgreSQL server process
> pg_print_backtrace
> --------------------
> f
>
> Does it make sense to have a postmaster's backtrace? If yes, can we
> have something like below in sigusr1_handler()?
> if (CheckPostmasterSignal(PMSIGNAL_BACKTRACE_EMIT))
> {
> LogBackTrace();
> }
>

We had a discussion about this in [1] earlier and felt including this
is not very useful.

> 5) Can we have PROCSIG_PRINT_BACKTRACE instead of
> PROCSIG_BACKTRACE_PRINT, just for readability and consistency with
> function name?
>

PROCSIG_PRINT_BACKTRACE is better, I have modified it.

> 6) I think it's not the postmaster that prints backtrace of the
> requested backend and we don't send SIGUSR1 with
> PMSIGNAL_BACKTRACE_EMIT to postmaster, I think it's the backends, upon
> receiving SIGUSR1 with PMSIGNAL_BACKTRACE_EMIT signal, log their own
> backtrace. Am I missing anything here? If I'm correct, we need to
> change the below description and other places wherever we refer to
> this description.
>
> The idea here is to implement & expose pg_print_backtrace function, internally
> what this function does is, the connected backend will send SIGUSR1 signal by
> setting PMSIGNAL_BACKTRACE_EMIT to the postmaster process. Once the process
> receives this signal it will log the backtrace of the process.
>

Modified.

> 7) Can we get the parallel worker's backtrace? IIUC it's possible.
>

Yes we can get the parallel worker's backtrace. As this design is
driven based on CHECK_FOR_INTERRUPTS, it will be printed when
CHECK_FOR_INTERRUPTS is called.

> 8) What happens if a user runs pg_print_backtrace() on Windows or
> MacOS or some other platform? Do you want to say something about other
> platforms where gdb/addr2line doesn't exist?
> +</programlisting>
> + You can get the file name and line number by using gdb/addr2line in
> + linux platforms, as a prerequisite users must ensure gdb/addr2line is
> + already installed:
> +<programlisting>
>

I have added the following:
This feature will be available, if the installer was generated on a
platform which had backtrace capturing capability. If not available it
will return false by throwing the following warning "WARNING:
backtrace generation is not supported by this installation"

> 9) Can't we reuse set_backtrace with just adding a flag to
> set_backtrace(ErrorData *edata, int num_skip, bool
> is_print_backtrace_function), making it a non-static function and call
> set_backtrace(NULL, 0, true)?
>
> void
> set_backtrace(ErrorData *edata, int num_skip, bool is_print_backtrace_function)
> {
> StringInfoData errtrace;
> -------
> -------
> if (is_print_backtrace_function)
> elog(LOG_SERVER_ONLY, "current backtrace:%s", errtrace.data);
> else
> edata->backtrace = errtrace.data;
> }
>
> I think it will be good if we do this, because we can avoid duplicate
> code in set_backtrace and LogBackTrace.
>

Yes it is better, I have modified it to use the same function.

> 10) I think it's "pg_signal_backend" instead of "pg_print_backtrace"?
> + backtrace is being printed or the calling role has been granted
> + <literal>pg_print_backtrace</literal>, however only superusers can
>

Modified.

> 11) In the documentation added, isn't it good if we talk a bit about
> in which scenarios users can use this function? For instance,
> something like "Use pg_print_backtrace to know exactly where it's
> currently waiting when a backend process hangs."?
>

Modified to include more details.

[1] - https:://www.postgresql.org/message-id/1778088.1606026308%40sss.pgh.pa.us
Attached v6 patch with the fixes.
Thoughts?

Regards,
Vignesh

Attachment Content-Type Size
v6-0001-Print-backtrace-of-specified-postgres-process.patch text/x-patch 21.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2021-02-03 06:51:53 Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Previous Message Michael Paquier 2021-02-03 06:37:39 Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly