Re: RFC: Logging plan of the running query

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: torikoshia(at)oss(dot)nttdata(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: RFC: Logging plan of the running query
Date: 2022-02-01 08:27:57
Message-ID: 20220201.172757.1480996662235658750.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Tue, 1 Feb 2022 01:51:11 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>
>
> On 2022/01/28 17:45, torikoshia wrote:
> >> There is the case where the request to log a query plan is skipped
> >> even while the target backend is running a query. If this happens,
> >> users can just retry pg_log_query_plan(). These things should be
> >> documented?
> > Agreed.
> > Added following:
> >   +        Note that there is the case where the request to log a
> > query
> >   +        plan is skipped even while the target backend is running a
> >   +        query due to lock conflict avoidance.
> >   +        If this happens, users can just retry pg_log_query_plan().
>
> This may cause users to misunderstand that pg_log_query_plan() fails
> while the target backend is holding *any* locks? Isn't it better to
> mention "page-level locks", instead? So how about the following?
>
> --------------------------
> Note that the request to log the query plan will be ignored if it's
> received during a short period while the target backend is holding a
> page-level lock.
> --------------------------
>
>
> >> +            ereport(LOG_SERVER_ONLY,
> >> +                errmsg("backend with PID %d is holding a page
> >> lock. Try again",
> >> +                    MyProcPid));
> >>
> >> It seems more proper to output this message in DETAIL or HINT,
> >> instead. So how about something like the following messages?
> >>
> >> LOG: could not log the query plan
> >> DETAIL: query plan cannot be logged while page level lock is being
> >> held
> >> HINT: Try pg_log_query_plan() after a few ....
> > Agreed.
> > I felt the HINT message 'after a few ...' is difficult to describe,
> > and wrote as below.
> > | HINT: Retrying pg_log_query_plan() might succeed since the lock
> > | duration of page level locks are usually short
> > How do you think?
>
> Or we don't need HINT message?
>
>
> + errmsg("could not log the query plan"),
> + errdetail("query plan cannot be logged while page level lock is
> being held"),
>
> In detail message, the first word of sentences should be
> capitalized. How about "Cannot log the query plan while holding
> page-level lock.", instead?
>
>
> Thanks for updating the patch! Here are some review comments.
>
> + <row>
> + <entry role="func_table_entry"><para role="func_signature">
> + <indexterm>
> + <primary>pg_log_query_plan</primary>
> + </indexterm>
>
> This entry is placed before one for
> pg_log_backend_memory_contexts(). But it should be *after* that since
> those entries seem to be placed in alphabetical order in the table?
>
>
> + Requests to log the plan of the query currently running on the
> + backend with specified process ID along with the untruncated
> + query string.
>
> Other descriptions about logging of query string seem not to mention
> something like "untruncated query string". For example, auto_explain,
> log_statement, etc. Why do we need to mention "along with the
> untruncated query string" specially for pg_log_query_plan()?
>
>
> + Note that nested statements (statements executed inside a function)
> are not
> + considered for logging. Only the plan of the most deeply nested
> query is logged.
>
> Now the plan of even nested statement can be logged. So this
> description needs to be updated?
>
>
> @@ -440,6 +450,7 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
> MemoryContextSwitchTo(oldcontext);
> + ActiveQueryDesc = NULL;
>
> ActiveQueryDesc seems unnecessary. Why does ActiveQueryDesc need to be
> reset to NULL in standard_ExecutorFinish()?
>
>
> Currently even during ProcessLogQueryPlanInterrupt(),
> pg_log_query_plan() can be call and another
> ProcessLogQueryPlanInterrupt() can be executed. So repeatable
> re-entrances to ProcessLogQueryPlanInterrupt() could cause "stack
> depth limit exceeded" error. To avoid this, shouldn't we make
> ProcessLogQueryPlanInterrupt() do nothing and return immediately, if
> it's called during another ProcessLogQueryPlanInterrupt()?
>
> pg_log_backend_memory_contexts() also might have the same issue.

Good catch.

By the way, I'm anxious about the following part and I'd like to
remove it.

+ * Ensure no page lock is held on this process.

It seems to me what is wrong is ginInsertCleanup(), not this feature.
As I read the comment for the assertion, I don't believe we want to
allow CFI while holding a page lock. And AFAICS the function is the
only point where doing that. (It is the alone user of LockPage()...)

This is the assertion.

lock.c: 902
> /*
> * We don't acquire any other heavyweight lock while holding the page lock
> * except for relation extension.
> */
> Assert(!IsPageLockHeld ||
> (locktag->locktag_type == LOCKTAG_RELATION_EXTEND));

It is added by a recent commit 72e78d831ab5550c39f2dcc7cc5d44c406ec3dc2.

> * Similar to relation extension, page locks are also held for a short
> * duration, so imposing such a restriction won't hurt.

I don't believe a path involving vacuum_delay_point() calls is
short-duration'ed.

https://www.postgresql.org/message-id/CAH2-WznzCPUKnOV%2Bre30_rHLCkqQWm2JTSVjTCAei9LySTc2pw%40mail.gmail.com

> One thing that really bothers me about commit e2c79e14 is that
> LockPage() is called, not LockBuffer(). GIN had no LockPage() calls
> before that commit, and is now the only code in the entire system that
> calls LockPage()/ConditionalLockPage() (the hash am no longer uses
> page heavyweight locks following recent work there).

I agree to the discussion. Can't we use other mechanism here to get
rid of the Lockpage()?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Borisov 2022-02-01 08:38:01 Re: Make mesage at end-of-recovery less scary.
Previous Message Greg Nancarrow 2022-02-01 07:04:29 Re: row filtering for logical replication