Re: RFC: Logging plan of the running query

From: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Étienne BERSAC <etienne(dot)bersac(at)dalibo(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, jtc331(at)gmail(dot)com, ashutosh(dot)bapat(dot)oss(at)gmail(dot)com, rafaelthca(at)gmail(dot)com
Subject: Re: RFC: Logging plan of the running query
Date: 2024-02-13 02:30:53
Message-ID: 3d5fd6d955944b5e8e4b8d1917f14885@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2024-02-12 09:00, jian he wrote:

Thanks for you comments.

> On Mon, Jan 29, 2024 at 9:02 PM torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
> wrote:
>>
>> Hi,
>>
>> Updated the patch to fix typos and move
>> ProcessLogQueryPlanInterruptActive from errfinish() to
>> AbortTransaction.
>>
>
> + <row>
> + <entry role="func_table_entry"><para role="func_signature">
> + <indexterm>
> + <primary>pg_log_query_plan</primary>
> + </indexterm>
> + <function>pg_log_query_plan</function> (
> <parameter>pid</parameter> <type>integer</type> )
> + <returnvalue>boolean</returnvalue>
> + </para>
> + <para>
> + Requests to log the plan of the query currently running on the
> + backend with specified process ID.
> + It will be logged at <literal>LOG</literal> message level and
> + will appear in the server log based on the log
> + configuration set (See <xref
> linkend="runtime-config-logging"/>
> + for more information), but will not be sent to the client
> + regardless of <xref linkend="guc-client-min-messages"/>.
> + </para></entry>
> + </row>
> it would be better to explain the meaning of return value TRUE/FALSE?

Yeah, but I've noticed that this should be located in 'Table Server
Signaling Functions' not 'Table Control Data Functions'.
Since 'Table Server Signaling Functions' describes the return code as
below, just relocation seems fine.

Each of these functions returns true if the signal was successfully
sent and false if sending the signal failed.

> +# logging plan of the running query on the specified backend
> +{ oid => '8000', descr => 'log plan of the running query on the
> specified backend',
> + proname => 'pg_log_query_plan',
> + provolatile => 'v', prorettype => 'bool',
> you can add
> `proargnames => '{pid}'`

Hmm, pg_log_query_plan() can take one argument, I'm not sure how much
sense it makes.
Other functions which take one argument such as pg_cancel_backend() does
not have proargnames.

> + if (proc == NULL)
> + {
> + /*
> + * This is just a warning so a loop-through-resultset will not abort
> + * if one backend terminated on its own during the run.
> + */
> + ereport(WARNING,
> + (errmsg("PID %d is not a PostgreSQL backend process", pid)));
> + PG_RETURN_BOOL(false);
> + }
> +
> + be_status = pgstat_get_beentry_by_backend_id(proc->backendId);
> + if (be_status->st_backendType != B_BACKEND)
> + {
> + ereport(WARNING,
> + (errmsg("PID %d is not a PostgreSQL client backend process", pid)));
> + PG_RETURN_BOOL(false);
> + }
> +
> + if (SendProcSignal(pid, PROCSIG_LOG_QUERY_PLAN, proc->backendId) < 0)
> + {
> + /* Again, just a warning to allow loops */
> + ereport(WARNING,
> + (errmsg("could not send signal to process %d: %m", pid)));
> + PG_RETURN_BOOL(false);
> + }
>
> I found out that pg_log_query_plan's comments look like
> pg_log_backend_memory_contexts.
> pg_log_backend_memory_contexts will iterate through many memory
> contexts.
> but pg_log_query_plan for one specific pid will only output one plan?
> so I am a little bit confused by the comments.

These "loop" mean backend can run pg_log_query_plan() repeatedly even
when failing sending signals.
pg_signal_backend() also have such comments.

> + /*
> + * Ensure no page lock is held on this process.
> + *
> + * If page lock is held at the time of the interrupt, we can't acquire
> any
> + * other heavyweight lock, which might be necessary for explaining the
> plan
> + * when retrieving column names.
> + *
> + * This may be overkill, but since page locks are held for a short
> duration
> + * we check all the LocalLock entries and when finding even one, give
> up
> + * logging the plan.
> + */
> + hash_seq_init(&status, GetLockMethodLocalHash());
> + while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
> + {
> + if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)
> maybe not that self evident, the above comments still not explained
> why we need to ensure only
> PAGE lock was held on this process?

This is for preventing assertion error and it seems not necessary
anymore as described in [1].
I'm going remove them.

>
> In the commit message, can you add all the discussion links?
> My gmail account doesn't have a previous discussion history.

Sure.

> I am not sure this
> (https://www.postgresql.org/message-id/flat/d68c3ae31672664876b22d2dcbb526d2%40postgrespro.ru)
> is the only discussion link?

This is the original one:
https://www.postgresql.org/message-id/cf8501bcd95ba4d727cbba886ba9eea8%40oss.nttdata.com

> I found a bug:
> src8=# select *, pg_sleep(10) from tenk1 for update;
> 2024-02-11 15:54:17.944 CST [48602] LOG: query plan running on
> backend with PID 48602 is:
> Query Text: select *, pg_sleep(10) from tenk1 for update;
> LockRows (cost=0.00..570.00 rows=10000 width=254)
> Output: unique1, unique2, two, four, ten, twenty, hundred,
> thousand, twothousand, fivethous, tenthous, odd, even, stringu1,
> stringu2, string4, (pg_sleep('10'::double precision)), ctid
> -> Seq Scan on public.tenk1 (cost=0.00..470.00 rows=10000
> width=254)
> Output: unique1, unique2, two, four, ten, twenty,
> hundred, thousand, twothousand, fivethous, tenthous, odd, even,
> stringu1, stringu2, string4, pg_sleep('10'::double precision), ctid
>
> another session (PID) executes `SELECT pg_log_query_plan(48602);` in
> the meantime.
> pg_log_query_plan returns true successfully, but PID 48602 was stuck.

Hmm, it's not simply sleeping, is it?
I'm concerned a bit this because estimated rows of tenk1 is 10000.
If so, the query will take 10000 * 10 seconds.

> I have problem using git apply:
> error: patch failed: src/include/commands/explain.h:94
> error: src/include/commands/explain.h: patch does not apply
>
> `patch -p1 < /v35-0001-Add-function-to-log-the-plan-of-the-query.patch`
> works

I'll update the patch including other points such as removing ensuring
no page lock code.

[1]
https://www.postgresql.org/message-id/1b2b247530f3ff3afab4ddc2df222e8b%40oss.nttdata.com

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2024-02-13 03:19:25 Re: Documentation to upgrade logical replication cluster
Previous Message Thomas Munro 2024-02-13 02:07:17 Re: [PATCH] Add native windows on arm64 support